
Gibberish or blank completions in both Swagger interface and VS Code #1277

Closed · rbollampally opened this issue Jan 23, 2024 · 2 comments

rbollampally commented Jan 23, 2024

Describe the bug
I'm trying to run Tabby with the following command:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/CodeLlama-13B --device cuda
I have a machine with 4 × RTX 3090. Even if I limit Tabby to one GPU (e.g. as sketched below), I get the following:
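One way to pin the container to a single GPU, as a sketch assuming the NVIDIA container toolkit's device-selection syntax:

# Restrict the container to GPU 0 only (quoting the device spec is required
# when listing devices). Alternative: keep --gpus all and set
# CUDA_VISIBLE_DEVICES=0 in the container environment.
docker run -it --gpus '"device=0"' -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model TabbyML/CodeLlama-13B --device cuda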

Request:

curl -X 'POST' \
  'http://192.168.68.66:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Response (200):

{
  "id": "cmpl-f79069d0-fa5f-41b4-aa06-50eb7015409f",
  "choices": [
    {
      "index": 0,
      "text": "fte▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅"
    }
  ]
}

Events log:

{"ts":1706002279828,"event":{"completion":{"completion_id":"cmpl-f79069d0-fa5f-41b4-aa06-50eb7015409f","language":"python","prompt":"<PRE> def fib(n):\n <SUF>\n return fib(n - 1) + fib(n - 2) <MID>","segments":{"prefix":"def fib(n):\n ","suffix":"\n return fib(n - 1) + fib(n - 2)","clipboard":null},"choices":[{"index":0,"text":"fte▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅"}],"user":null}}}

Information about your version
Please provide output of tabby --version

tabby 0.7.0

Information about your GPU
Please provide output of nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
|  0%   25C    P8              23W / 350W |   7608MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  | 00000000:81:00.0 Off |                  N/A |
|  0%   28C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  | 00000000:82:00.0 Off |                  N/A |
|  0%   28C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  | 00000000:C1:00.0 Off |                  N/A |
|  0%   27C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    1   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    2   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    3   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
+---------------------------------------------------------------------------------------+

I have also tried indexing a GitHub repository:

# Index a repository's source code as additional context for code completion.

[[repositories]]
name = "Autogen"
git_url = "https://github.com/microsoft/autogen.git"
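After adding the repository, the index has to be built before it contributes completion context; with the dockerized setup that is roughly the following (a sketch assuming the scheduler subcommand shipped in this release):

# Run the indexing jobs immediately instead of waiting for the schedule.
docker run -it --gpus all -v $HOME/.tabby:/data tabbyml/tabby scheduler --now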


Here is the completion event log after indexing:
{"ts":1706005076127,"event":{"completion":{"completion_id":"cmpl-6b807cf1-422d-4428-a2e1-aae93ea89c75","language":"python","prompt":"<PRE> # Path: samples/apps/autogen-studio/autogenstudio/chatmanager.py\n# print(\"Modified files: \", len(modified_files))\n#\n# Path: samples/apps/autogen-studio/autogenstudio/chatmanager.py\n# Message(\n# user_id=message.user_id,\n# root_msg_id=message.root_msg_id,\n# role=\"assistant\",\n# content=output,\n# metadata=json.dumps(metadata),\n# session_id=message.session_id,\n# )\n#\n# Path: samples/apps/autogen-studio/autogenstudio/utils/dbutils.py\n# sqlite3.connect(self.path, check_same_thread=False, **kwargs)\n#\n# Path: samples/apps/autogen-studio/autogenstudio/utils/dbutils.py\n# def reset_db(self):\n# \"\"\"\n# Reset the database by deleting the database file and creating a new one.\n# \"\"\"\n# print(\"resetting db\")\n# if os.path.exists(self.path):\n# os.remove(self.path)\n# self.init_db(path=self.path)\n teachability.add_to_agent(teachable_agent)\n\n return teachable_agent\n\n\ndef interact_freely_with_user():\n \"\"\"Starts a free-form chat between the user and a teachable agent.\"\"\"\n\n # Create the agents.\n print(colored(\"\\nLoading previous memory (if any) from disk.\", \"light_cyan\"))\n teachable_agent = create_teachable_agent(reset_db=False)\n user = UserProxyAgent(\"user\", human_input_mode=\"ALWAYS\")\n\n # Start the chat.\n teachable_agent.initiate_chat(user, message=\"Greetings, I'm a teachable user assistant! What's on your mind today?\")\n\n\nif __name__ == \"__main__\":\n \"\"\"Lets the user test a teachable agent interactively.\"\"\"\n <SUF>\n <MID>","segments":{"prefix":" teachability.add_to_agent(teachable_agent)\n\n return teachable_agent\n\n\ndef interact_freely_with_user():\n \"\"\"Starts a free-form chat between the user and a teachable agent.\"\"\"\n\n # Create the agents.\n print(colored(\"\\nLoading previous memory (if any) from disk.\", \"light_cyan\"))\n teachable_agent = create_teachable_agent(reset_db=False)\n user = UserProxyAgent(\"user\", human_input_mode=\"ALWAYS\")\n\n # Start the chat.\n teachable_agent.initiate_chat(user, message=\"Greetings, I'm a teachable user assistant! What's on your mind today?\")\n\n\nif __name__ == \"__main__\":\n \"\"\"Lets the user test a teachable agent interactively.\"\"\"\n ","suffix":"","clipboard":null},"choices":[{"index":0,"text":"#ogormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormscore agrprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprime"}],"user":null}}}

icycodes (Member) commented:

I tried looking into this problem, but I can't reproduce it in my environment. I get the response properly with the TabbyML/CodeLlama-13B model; both 0.7.0 and 0.6.0 were tested.

Completion request:

curl -X 'POST' \
  'http://localhost:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Completion response:

{
  "id": "cmpl-9196ff68-1555-4bd8-84cd-b391d7167885",
  "choices": [
    {
      "index": 0,
      "text": "if n <= 1:\n        return n\n    else:"
    }
  ]
}

Health check response:

{
  "model": "TabbyML/CodeLlama-13B",
  "device": "cuda",
  "arch": "x86_64",
  "cpu_info": "13th Gen Intel(R) Core(TM) i7-13700KF",
  "cpu_count": 24,
  "cuda_devices": [
    "NVIDIA GeForce RTX 4090"
  ],
  "version": {
    "build_date": "2023-12-15",
    "build_timestamp": "2023-12-15T05:54:46.222708135Z",
    "git_sha": "c3db6d829f3125db8c49552c0425dde174bc6649",
    "git_describe": "v0.7.0"
  }
}
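For anyone comparing setups, the above can be fetched from the health endpoint:

# Query the running server's health/metadata endpoint.
curl http://localhost:8080/v1/health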

Output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   51C    P2              80W / 450W |  20902MiB / 24564MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2940      G   /usr/lib/xorg/Xorg                         1317MiB |
|    0   N/A  N/A      3275      G   /usr/bin/gnome-shell                        584MiB |
|    0   N/A  N/A      3972      G   /proc/self/exe                              220MiB |
|    0   N/A  N/A      6046      G   ...sion,SpareRendererForSitePerProcess      895MiB |
|    0   N/A  N/A     10915      C   /opt/tabby/bin/tabby                      17788MiB |
+---------------------------------------------------------------------------------------+

rbollampally (Author) commented:

I compiled it from source today and it is working well. Thanks. And please merge #1286 ASAP. A single line of code killed my whole day :D
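For reference, a rough sketch of the from-source route, assuming a standard Rust workspace build; the exact feature flags for CUDA support may differ, so check the repository's build docs:

# Clone with submodules and build the release binary.
git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby
cargo build --release
# Serve with the same model and device as the docker setup above.
./target/release/tabby serve --model TabbyML/CodeLlama-13B --device cuda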
