
Gibberish or blank completions in both Swagger interface and VS Code #1277

Closed · rbollampally opened this issue Jan 23, 2024 · 2 comments

rbollampally commented Jan 23, 2024

Describe the bug
I'm trying to run Tabby with the following command:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/CodeLlama-13B --device cuda
I have a machine with 4 × RTX 3090. Even if I limit Tabby to one GPU (e.g. as sketched below), I get the following:
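One way to pin the container to a single GPU, as a sketch assuming the NVIDIA container toolkit's device-selection syntax:

# Restrict the container to GPU 0 only (quoting the device spec is required
# when listing devices). Alternative: keep --gpus all and set
# CUDA_VISIBLE_DEVICES=0 in the container environment.
docker run -it --gpus '"device=0"' -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve --model TabbyML/CodeLlama-13B --device cuda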

Request:

curl -X 'POST' \
  'http://192.168.68.66:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Response (200):

{
  "id": "cmpl-f79069d0-fa5f-41b4-aa06-50eb7015409f",
  "choices": [
    {
      "index": 0,
      "text": "fte▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅"
    }
  ]
}

Events log:

{"ts":1706002279828,"event":{"completion":{"completion_id":"cmpl-f79069d0-fa5f-41b4-aa06-50eb7015409f","language":"python","prompt":"<PRE> def fib(n):\n <SUF>\n return fib(n - 1) + fib(n - 2) <MID>","segments":{"prefix":"def fib(n):\n ","suffix":"\n return fib(n - 1) + fib(n - 2)","clipboard":null},"choices":[{"index":0,"text":"fte▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅"}],"user":null}}}

Information about your version
Please provide output of tabby --version

tabby 0.7.0

Information about your GPU
Please provide output of nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        On  | 00000000:01:00.0 Off |                  N/A |
|  0%   25C    P8              23W / 350W |   7608MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        On  | 00000000:81:00.0 Off |                  N/A |
|  0%   28C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce RTX 3090        On  | 00000000:82:00.0 Off |                  N/A |
|  0%   28C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce RTX 3090        On  | 00000000:C1:00.0 Off |                  N/A |
|  0%   27C    P8              23W / 350W |   4046MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    1   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    2   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
|    3   N/A  N/A         1      C   /opt/tabby/bin/tabby                          0MiB |
+---------------------------------------------------------------------------------------+

I have also tried indexing a GitHub repository:

# Index a repository's source code as additional context for code completion.

[[repositories]]
name = "Autogen"
git_url = "https://github.com/microsoft/autogen.git"
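After adding the repository, the index has to be built before it contributes completion context; with the dockerized setup that is roughly the following (a sketch assuming the scheduler subcommand shipped in this release):

# Run the indexing jobs immediately instead of waiting for the schedule.
docker run -it --gpus all -v $HOME/.tabby:/data tabbyml/tabby scheduler --now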


Here is the completion event log after indexing:
{"ts":1706005076127,"event":{"completion":{"completion_id":"cmpl-6b807cf1-422d-4428-a2e1-aae93ea89c75","language":"python","prompt":"<PRE> # Path: samples/apps/autogen-studio/autogenstudio/chatmanager.py\n# print(\"Modified files: \", len(modified_files))\n#\n# Path: samples/apps/autogen-studio/autogenstudio/chatmanager.py\n# Message(\n# user_id=message.user_id,\n# root_msg_id=message.root_msg_id,\n# role=\"assistant\",\n# content=output,\n# metadata=json.dumps(metadata),\n# session_id=message.session_id,\n# )\n#\n# Path: samples/apps/autogen-studio/autogenstudio/utils/dbutils.py\n# sqlite3.connect(self.path, check_same_thread=False, **kwargs)\n#\n# Path: samples/apps/autogen-studio/autogenstudio/utils/dbutils.py\n# def reset_db(self):\n# \"\"\"\n# Reset the database by deleting the database file and creating a new one.\n# \"\"\"\n# print(\"resetting db\")\n# if os.path.exists(self.path):\n# os.remove(self.path)\n# self.init_db(path=self.path)\n teachability.add_to_agent(teachable_agent)\n\n return teachable_agent\n\n\ndef interact_freely_with_user():\n \"\"\"Starts a free-form chat between the user and a teachable agent.\"\"\"\n\n # Create the agents.\n print(colored(\"\\nLoading previous memory (if any) from disk.\", \"light_cyan\"))\n teachable_agent = create_teachable_agent(reset_db=False)\n user = UserProxyAgent(\"user\", human_input_mode=\"ALWAYS\")\n\n # Start the chat.\n teachable_agent.initiate_chat(user, message=\"Greetings, I'm a teachable user assistant! What's on your mind today?\")\n\n\nif __name__ == \"__main__\":\n \"\"\"Lets the user test a teachable agent interactively.\"\"\"\n <SUF>\n <MID>","segments":{"prefix":" teachability.add_to_agent(teachable_agent)\n\n return teachable_agent\n\n\ndef interact_freely_with_user():\n \"\"\"Starts a free-form chat between the user and a teachable agent.\"\"\"\n\n # Create the agents.\n print(colored(\"\\nLoading previous memory (if any) from disk.\", \"light_cyan\"))\n teachable_agent = create_teachable_agent(reset_db=False)\n user = UserProxyAgent(\"user\", human_input_mode=\"ALWAYS\")\n\n # Start the chat.\n teachable_agent.initiate_chat(user, message=\"Greetings, I'm a teachable user assistant! What's on your mind today?\")\n\n\nif __name__ == \"__main__\":\n \"\"\"Lets the user test a teachable agent interactively.\"\"\"\n ","suffix":"","clipboard":null},"choices":[{"index":0,"text":"#ogormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormormscore agrprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprimeprime"}],"user":null}}}

icycodes (Member) commented:

I tried looking into this problem, but I can't reproduce it in my environment. I get the response properly with the TabbyML/CodeLlama-13B model; both 0.7.0 and 0.6.0 were tested.

Completion request:

curl -X 'POST' \
  'http://localhost:8080/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "language": "python",
  "segments": {
    "prefix": "def fib(n):\n    ",
    "suffix": "\n        return fib(n - 1) + fib(n - 2)"
  }
}'

Completion response:

{
  "id": "cmpl-9196ff68-1555-4bd8-84cd-b391d7167885",
  "choices": [
    {
      "index": 0,
      "text": "if n <= 1:\n        return n\n    else:"
    }
  ]
}

Health check response:

{
  "model": "TabbyML/CodeLlama-13B",
  "device": "cuda",
  "arch": "x86_64",
  "cpu_info": "13th Gen Intel(R) Core(TM) i7-13700KF",
  "cpu_count": 24,
  "cuda_devices": [
    "NVIDIA GeForce RTX 4090"
  ],
  "version": {
    "build_date": "2023-12-15",
    "build_timestamp": "2023-12-15T05:54:46.222708135Z",
    "git_sha": "c3db6d829f3125db8c49552c0425dde174bc6649",
    "git_describe": "v0.7.0"
  }
}
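For anyone comparing setups, the above can be fetched from the health endpoint:

# Query the running server's health/metadata endpoint.
curl http://localhost:8080/v1/health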

Output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0  On |                  Off |
|  0%   51C    P2              80W / 450W |  20902MiB / 24564MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2940      G   /usr/lib/xorg/Xorg                         1317MiB |
|    0   N/A  N/A      3275      G   /usr/bin/gnome-shell                        584MiB |
|    0   N/A  N/A      3972      G   /proc/self/exe                              220MiB |
|    0   N/A  N/A      6046      G   ...sion,SpareRendererForSitePerProcess      895MiB |
|    0   N/A  N/A     10915      C   /opt/tabby/bin/tabby                      17788MiB |
+---------------------------------------------------------------------------------------+

rbollampally (Author) commented:

I compiled it from source today and it is working well. Thanks. And please merge #1286 ASAP. A single line of code killed my whole day :D
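For reference, a rough sketch of the from-source route, assuming a standard Rust workspace build; the exact feature flags for CUDA support may differ, so check the repository's build docs:

# Clone with submodules and build the release binary.
git clone --recurse-submodules https://github.com/TabbyML/tabby.git
cd tabby
cargo build --release
# Serve with the same model and device as the docker setup above.
./target/release/tabby serve --model TabbyML/CodeLlama-13B --device cuda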
