# Dual 3090 AI Inference Workstation

## Videos

## Hardware

## Software

## Configuration

### Install the NVIDIA drivers and CUDA dependencies

Follow the official guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

Reboot once the install finishes.
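
A quick sanity check after the reboot (this assumes the CUDA toolkit's `bin` directory is on your `PATH`):

```bash
nvidia-smi       # should list both RTX 3090s
nvcc --version   # should report the installed CUDA toolkit version, e.g. 12.5
```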

### Disable Xorg binding to the NVIDIA cards

You will find that Xorg places processes on your GPUs rather than using the iGPU on your CPU. This can cause out-of-memory errors when running AI workloads.

To prevent this, comment out the NVIDIA section of the Xorg configuration found within /etc/X11/xorg.conf.d/. Xorg will then no longer see the NVIDIA driver and therefore won't use it for window management:

```
#Section "OutputClass"
#    Identifier "nvidia"
#    MatchDriver "nvidia-drm"
#    Driver "nvidia"
#    Option "AllowEmptyInitialConfiguration"
#    Option "PrimaryGPU" "no"
#    Option "SLI" "Auto"
#    Option "BaseMosaic" "on"
#EndSection

Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
EndSection
```
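
After editing, restart your display manager (or reboot) and confirm Xorg is gone from the GPU process list. A minimal sketch, assuming GDM is your display manager (substitute sddm, lightdm, etc. as appropriate):

```bash
sudo systemctl restart gdm   # assumption: GDM; use your actual display manager
nvidia-smi                   # the process list should no longer show Xorg on either 3090
```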

### Install llama.cpp

Clone llama.cpp:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
```

Reference: https://github.com/ggerganov/llama.cpp

Compile llama.cpp with NVIDIA support:

```bash
export PATH=$PATH:/usr/local/cuda-12.5/bin
make LLAMA_CUDA=1
```
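
Note: newer llama.cpp checkouts have dropped the Makefile build in favour of CMake and renamed the flag to `GGML_CUDA`. If `make LLAMA_CUDA=1` fails on your checkout, this should be equivalent (the binaries, including `llama-server`, then land in `build/bin/` instead of the repository root):

```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```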

### Load the models

Download the Mixtral 8x7B Instruct GGUF quant:
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
Reference: https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/tree/main

Download the DolphinCoder StarCoder2 7B quant:
https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf?download=true
Reference: https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/tree/main

Place the models into the llama.cpp models folder.
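
For example, downloading both quants straight into the models folder (the `resolve` URLs below are derived from the links above):

```bash
cd llama.cpp/models
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
wget https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf
```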

### Start the llama.cpp server instances

Start the instruct server:

```bash
./llama-server --port 8080 -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 99
```

Start the autocomplete server:

```bash
./llama-server --port 8081 -m models/dolphincoder-starcoder2-7b-Q6_K.gguf -ngl 99
```
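
Once both servers have loaded their models, you can verify them with llama.cpp's built-in `/health` endpoint and run a quick completion smoke test:

```bash
curl http://localhost:8080/health   # returns {"status":"ok"} once the model is loaded
curl http://localhost:8081/health

# quick smoke test against the instruct server
curl http://localhost:8080/completion -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 32}'
```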

### Configure VS Code

Install VS Code:
https://code.visualstudio.com/

Install the VS Code extension "Continue":
https://github.com/continuedev/continue
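
The extension can also be installed from the command line (assuming `Continue.continue` is its Marketplace ID):

```bash
code --install-extension Continue.continue
```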

Configure Continue:
Open the Continue configuration (config.json) using the VS Code Command Palette.
Reference: https://docs.continue.dev/reference/Model%20Providers/llamacpp

```json
{
  "models": [
    {
      "title": "Mixtral 8x7B",
      "provider": "llama.cpp",
      "model": "mixtral-8x7b",
      "apiBase": "http://localhost:8080",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses. If asked to write something like a function, comment, or docblock, wrap it in code ticks for easy copy-paste."
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Dolphin Starcoder2",
    "provider": "llama.cpp",
    "model": "starcoder2:7b",
    "apiBase": "http://localhost:8081",
    "useCopyBuffer": false,
    "maxPromptTokens": 4000,
    "prefixPercentage": 0.5,
    "multilineCompletions": "always",
    "debounceDelay": 150
  },
  "allowAnonymousTelemetry": false
}
```