# Dual 3090 AI Inference Workstation

## Videos

## Hardware

## Software

## Configuration

### Install the NVIDIA drivers and CUDA dependencies

Follow the official guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

Reboot once the install finishes.
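
A quick sanity check after the reboot (this assumes the CUDA toolkit's `bin` directory is on your `PATH`):

```bash
nvidia-smi       # should list both RTX 3090s
nvcc --version   # should report the installed CUDA toolkit version, e.g. 12.5
```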

### Disable Xorg binding to the NVIDIA cards

You will find that Xorg places processes on your GPUs rather than using the iGPU on your CPU. This can cause out-of-memory errors when running AI workloads.

To prevent this, comment out the NVIDIA section of the Xorg configuration found within /etc/X11/xorg.conf.d/. Xorg will then no longer see the NVIDIA driver and therefore won't use it for window management:

```
#Section "OutputClass"
#    Identifier "nvidia"
#    MatchDriver "nvidia-drm"
#    Driver "nvidia"
#    Option "AllowEmptyInitialConfiguration"
#    Option "PrimaryGPU" "no"
#    Option "SLI" "Auto"
#    Option "BaseMosaic" "on"
#EndSection

Section "OutputClass"
    Identifier "intel"
    MatchDriver "i915"
    Driver "modesetting"
EndSection
```
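
After editing, restart your display manager (or reboot) and confirm Xorg is gone from the GPU process list. A minimal sketch, assuming GDM is your display manager (substitute sddm, lightdm, etc. as appropriate):

```bash
sudo systemctl restart gdm   # assumption: GDM; use your actual display manager
nvidia-smi                   # the process list should no longer show Xorg on either 3090
```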

### Install llama.cpp

Clone llama.cpp:

```bash
git clone https://github.com/ggerganov/llama.cpp.git
```

Reference: https://github.com/ggerganov/llama.cpp

Compile llama.cpp with NVIDIA support:

```bash
export PATH=$PATH:/usr/local/cuda-12.5/bin
make LLAMA_CUDA=1
```
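
Note: newer llama.cpp checkouts have dropped the Makefile build in favour of CMake and renamed the flag to `GGML_CUDA`. If `make LLAMA_CUDA=1` fails on your checkout, this should be equivalent (the binaries, including `llama-server`, then land in `build/bin/` instead of the repository root):

```bash
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```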

### Load the models

Download the Mixtral 8x7B Instruct GGUF quant:
https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/blob/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
Reference: https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/tree/main

Download the DolphinCoder StarCoder2 7B quant:
https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf?download=true
Reference: https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/tree/main

Place the models into the llama.cpp models folder.
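
For example, downloading both quants straight into the models folder (the `resolve` URLs below are derived from the links above):

```bash
cd llama.cpp/models
wget https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF/resolve/main/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
wget https://huggingface.co/bartowski/dolphincoder-starcoder2-7b-GGUF/resolve/main/dolphincoder-starcoder2-7b-Q6_K.gguf
```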

### Start the llama.cpp server instances

Start the instruct server:

```bash
./llama-server --port 8080 -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -ngl 99
```

Start the autocomplete server:

```bash
./llama-server --port 8081 -m models/dolphincoder-starcoder2-7b-Q6_K.gguf -ngl 99
```
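
Once both servers have loaded their models, you can verify them with llama.cpp's built-in `/health` endpoint and run a quick completion smoke test:

```bash
curl http://localhost:8080/health   # returns {"status":"ok"} once the model is loaded
curl http://localhost:8081/health

# quick smoke test against the instruct server
curl http://localhost:8080/completion -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 32}'
```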

### Configure VS Code

Install VS Code:
https://code.visualstudio.com/

Install the VS Code extension "Continue":
https://github.com/continuedev/continue
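
The extension can also be installed from the command line (assuming `Continue.continue` is its Marketplace ID):

```bash
code --install-extension Continue.continue
```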

Configure Continue:
Open the Continue configuration (config.json) using the VS Code Command Palette.
Reference: https://docs.continue.dev/reference/Model%20Providers/llamacpp

```json
{
  "models": [
    {
      "title": "Mixtral 8x7B",
      "provider": "llama.cpp",
      "model": "mixtral-8x7b",
      "apiBase": "http://localhost:8080",
      "systemMessage": "You are an expert software developer. You give helpful and concise responses. If asked to write something like a function, comment, or docblock, wrap it in code ticks for easy copy-paste."
    }
  ],
  "customCommands": [
    {
      "name": "test",
      "prompt": "{{{ input }}}\n\nWrite a comprehensive set of unit tests for the selected code. It should setup, run tests that check for correctness including important edge cases, and teardown. Ensure that the tests are complete and sophisticated. Give the tests just as chat output, don't edit any file.",
      "description": "Write unit tests for highlighted code"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Dolphin Starcoder2",
    "provider": "llama.cpp",
    "model": "starcoder2:7b",
    "apiBase": "http://localhost:8081",
    "useCopyBuffer": false,
    "maxPromptTokens": 4000,
    "prefixPercentage": 0.5,
    "multilineCompletions": "always",
    "debounceDelay": 150
  },
  "allowAnonymousTelemetry": false
}
```