docs(api): update model descriptions for previous vendors (#3747)
* docs(api): update model descriptions and configuration examples for various AI providers

* docs(api): clarify API endpoint usage and model specifications in documentation
Sma1lboy authored Jan 23, 2025
1 parent 56c25c1 commit 5b01a28
Showing 9 changed files with 145 additions and 97 deletions.
website/docs/references/models-http-api/deepseek.md (16 additions, 8 deletions)

# DeepSeek

[DeepSeek](https://www.deepseek.com/) is an AI company that develops large language models specialized in coding and general tasks. Their models include [DeepSeek V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) for general tasks and [DeepSeek Coder](https://huggingface.co/collections/deepseek-ai/deepseekcoder-v2-666bf4b274a5f556827ceeca) specifically optimized for programming tasks.

## Chat model

DeepSeek provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "your-api-key"
```

## Completion model

DeepSeek offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "deepseek/completion"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/beta"
api_key = "your-api-key"
```

## Embeddings model

DeepSeek currently does not provide embedding model APIs.
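Since DeepSeek exposes no embedding endpoint, embeddings can instead come from a model Tabby runs locally. Below is a minimal sketch of such a mixed setup, assuming Tabby's `[model.embedding.local]` table; `Nomic-Embed-Text` is used only as an example `model_id`:

```toml title="~/.tabby/config.toml"
# DeepSeek over HTTP for chat, a locally served model for embeddings
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "your-api-key"

[model.embedding.local]
model_id = "Nomic-Embed-Text" # example id; substitute any locally supported embedding model
```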
website/docs/references/models-http-api/jan.ai.md (12 additions, 9 deletions)

# Jan AI

[Jan](https://jan.ai/) is an open-source alternative to ChatGPT that runs entirely offline on your computer. It provides an OpenAI-compatible server interface that can be enabled through the Jan App's `Local API Server` UI.

## Chat model

Jan provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:1337/v1"
api_key = ""
```

## Completion model

Jan currently does not provide completion API support.

## Embeddings model

Jan currently does not provide embedding API support.
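Because Jan only exposes a chat endpoint, completion and embedding must come from another backend. A sketch of one possible mixed configuration, assuming a llama.cpp server listening on port 8888 as described on the llama.cpp page:

```toml title="~/.tabby/config.toml"
# Jan serves chat; a separate llama.cpp server (assumed on port 8888) covers the rest
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:1337/v1"
api_key = ""

[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```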
website/docs/references/models-http-api/llama.cpp.md (19 additions, 8 deletions)

# llama.cpp

[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.

## Chat model

llama.cpp provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8888"
```

## Completion model

llama.cpp offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.
```
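The `prompt_template` must match the fill-in-the-middle tokens of the model actually being served. For a Qwen2.5 Coder model, for instance, the same configuration would instead use that model family's FIM tokens:

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>" # Example prompt template for the Qwen2.5 Coder model series.
```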

## Embeddings model

llama.cpp provides embedding functionality through its HTTP API.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```
website/docs/references/models-http-api/llamafile.md (18 additions, 15 deletions)

# llamafile

[llamafile](https://github.com/Mozilla-Ocho/llamafile) is a Mozilla Builders project that allows you to distribute and run LLMs with a single file. It embeds a llama.cpp server and provides an OpenAI API-compatible chat-completions endpoint, allowing us to use the `openai/chat`, `llama.cpp/completion`, and `llama.cpp/embedding` types.

By default, llamafile uses port `8080`, which conflicts with Tabby's default port. It is recommended to run llamafile with the `--port` option to serve on a different port, such as `8081`. For embeddings functionality, you need to run llamafile with both the `--embedding` and `--port` options.

## Chat model

llamafile provides an OpenAI-compatible chat API interface. Note that the endpoint URL must include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:8081/v1" # the endpoint must include the `v1` suffix
api_key = ""
```

## Completion model

llamafile uses llama.cpp's completion API interface. Note that the endpoint URL should NOT include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
model_name = "your_model"
api_endpoint = "http://localhost:8081" # DO NOT append the `v1` suffix
api_key = "your-api-key"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>" # Example prompt template for the Qwen2.5 Coder model series.
```

## Embeddings model

llamafile provides embedding functionality through llama.cpp's API interface. Note that the endpoint URL should NOT include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8082" # DO NOT append the `v1` suffix
api_key = ""
```
website/docs/references/models-http-api/mistral-ai.md (21 additions, 11 deletions)

# Mistral AI

[Mistral](https://mistral.ai/) is a platform that provides a suite of AI models specialized in various tasks, including code generation and natural language processing. Their models are known for high performance and efficiency in both code completion and chat interactions.

## Chat model

Mistral provides a specialized chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "mistral/chat"
model_name = "codestral-latest"
api_endpoint = "https://api.mistral.ai/v1"
api_key = "your-api-key"
```

## Completion model

Mistral offers a dedicated completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "mistral/completion"
model_name = "codestral-latest"
api_endpoint = "https://api.mistral.ai"
api_key = "your-api-key"
```

## Embeddings model

Mistral currently does not provide embedding model APIs.
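Since Mistral provides no embedding API, the embedding model can be pointed at a different provider alongside Mistral's chat and completion models. A sketch pairing it with Voyage AI embeddings, as configured on the Voyage AI page (a separate Voyage AI API key is assumed):

```toml title="~/.tabby/config.toml"
# Embeddings served by Voyage AI while Mistral handles chat and completion
[model.embedding.http]
kind = "voyage/embedding"
model_name = "voyage-code-2"
api_key = "your-voyage-api-key"
```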
website/docs/references/models-http-api/ollama.md (20 additions, 9 deletions)

# Ollama

[ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion) is a popular model provider that offers a local-first experience. It provides support for various models through HTTP APIs, including completion, chat, and embedding functionalities.

## Chat model

Ollama provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b"
api_endpoint = "http://localhost:11434/v1"
```

## Completion model

Ollama offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:11434"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.
```

## Embeddings model

Ollama provides embedding functionality through its HTTP API.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:11434"
```
website/docs/references/models-http-api/openai.md (8 additions, 12 deletions)

# OpenAI

OpenAI is a leading AI company that has developed an extensive range of language models. Their API specifications have become a de facto standard, also implemented by other vendors such as vLLM, Nvidia NIM, and LocalAI.

## Chat model

OpenAI provides a comprehensive chat API interface. Note: Do not append the `/chat/completions` suffix to the API endpoint.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-4o" # Please make sure to use a chat model, such as gpt-4o
api_endpoint = "https://api.openai.com/v1" # DO NOT append the `/chat/completions` suffix
api_key = "your-api-key"
```

## Completion model
OpenAI doesn't offer models for completions (FIM), and its `/v1/completions` API has been deprecated.

## Embeddings model

OpenAI provides powerful embedding models through their API interface. Note: Do not append the `/embeddings` suffix to the API endpoint.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "openai/embedding"
model_name = "text-embedding-3-small" # Please make sure to use an embedding model, such as text-embedding-3-small
api_endpoint = "https://api.openai.com/v1" # DO NOT append the `/embeddings` suffix
api_key = "your-api-key"
```
website/docs/references/models-http-api/vllm.md (26 additions, 22 deletions)

# vLLM

[vLLM](https://docs.vllm.ai/en/stable/) is a fast and user-friendly library for LLM inference and serving. It provides an OpenAI-compatible server interface, allowing the use of OpenAI kinds for chat and embedding, while offering a specialized interface for completions.

Important requirements for all model types:

- `model_name` must exactly match the one used to run vLLM
- `api_endpoint` should follow the format `http://host:port/v1`
- `api_key` should be identical to the one used to run vLLM

Please note that models differ in their capabilities for completion or chat. Some models can serve both purposes. For detailed information, please refer to the [Model Registry](../../models/index.mdx).

## Chat model

vLLM provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model" # Please make sure to use a chat model
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
```

## Completion model

Due to implementation differences, vLLM uses its own completion API interface that requires a specific prompt template based on the model being used.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "vllm/completion"
model_name = "your_model" # Please make sure to use a completion model
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series
```

## Embeddings model

vLLM provides an OpenAI-compatible embeddings API interface.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "openai/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
```
website/docs/references/models-http-api/voyage-ai.md (5 additions, 3 deletions)

# Voyage AI

[Voyage AI](https://voyage.ai/) is a company that provides a range of embedding models. Tabby supports Voyage AI's models for embedding tasks.

## Embeddings model

Voyage AI provides specialized embedding models through their API interface.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "voyage/embedding"
model_name = "voyage-code-2"
api_key = "your-api-key"
```
