docs(api): update model descriptions for previous vendors (#3747)
* docs(api): update model descriptions and configuration examples for various AI providers

* docs(api): clarify API endpoint usage and model specifications in documentation
Sma1lboy authored Jan 23, 2025
1 parent 56c25c1 commit 5b01a28
Showing 9 changed files with 145 additions and 97 deletions.
website/docs/references/models-http-api/deepseek.md (16 additions, 8 deletions)

# DeepSeek

[DeepSeek](https://www.deepseek.com/) is an AI company that develops large language models specialized in coding and general tasks. Their models include [DeepSeek V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) for general tasks and [DeepSeek Coder](https://huggingface.co/collections/deepseek-ai/deepseekcoder-v2-666bf4b274a5f556827ceeca) specifically optimized for programming tasks.

## Chat model

DeepSeek provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "your-api-key"
```

## Completion model

DeepSeek offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "deepseek/completion"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/beta"
api_key = "your-api-key"
```

## Embeddings model

DeepSeek currently does not provide embedding model APIs.
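Since DeepSeek exposes no embedding endpoint, embeddings can instead come from a model Tabby runs locally. Below is a minimal sketch of such a mixed setup, assuming Tabby's `[model.embedding.local]` table; `Nomic-Embed-Text` is used only as an example `model_id`:

```toml title="~/.tabby/config.toml"
# DeepSeek over HTTP for chat, a locally served model for embeddings
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/v1"
api_key = "your-api-key"

[model.embedding.local]
model_id = "Nomic-Embed-Text" # example id; substitute any locally supported embedding model
```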
website/docs/references/models-http-api/jan.ai.md (12 additions, 9 deletions)

# Jan AI

[Jan](https://jan.ai/) is an open-source alternative to ChatGPT that runs entirely offline on your computer. It provides an OpenAI-compatible server interface that can be enabled through the Jan App's `Local API Server` UI.

## Chat model

Jan provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:1337/v1"
api_key = ""
```

## Completion model

Jan currently does not provide completion API support.

## Embeddings model

Jan currently does not provide embedding API support.
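Because Jan only exposes a chat endpoint, completion and embedding must come from another backend. A sketch of one possible mixed configuration, assuming a llama.cpp server listening on port 8888 as described on the llama.cpp page:

```toml title="~/.tabby/config.toml"
# Jan serves chat; a separate llama.cpp server (assumed on port 8888) covers the rest
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:1337/v1"
api_key = ""

[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.

[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```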
website/docs/references/models-http-api/llama.cpp.md (19 additions, 8 deletions)

# llama.cpp

[llama.cpp](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md#api-endpoints) is a popular C++ library for serving gguf-based models. It provides a server implementation that supports completion, chat, and embedding functionalities through HTTP APIs.

## Chat model

llama.cpp provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
api_endpoint = "http://localhost:8888"
```

## Completion model

llama.cpp offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.
```
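The `prompt_template` must match the fill-in-the-middle tokens of the model actually being served. For a Qwen2.5 Coder model, for instance, the same configuration would instead use that model family's FIM tokens:

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
api_endpoint = "http://localhost:8888"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>" # Example prompt template for the Qwen2.5 Coder model series.
```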

## Embeddings model

llama.cpp provides embedding functionality through its HTTP API.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "llama.cpp/embedding"
api_endpoint = "http://localhost:8888"
```
website/docs/references/models-http-api/llamafile.md (18 additions, 15 deletions)

# llamafile

[llamafile](https://github.com/Mozilla-Ocho/llamafile) is a Mozilla Builders project that allows you to distribute and run LLMs with a single file. It embeds a llama.cpp server and provides an OpenAI API-compatible chat-completions endpoint, allowing us to use the `openai/chat`, `llama.cpp/completion`, and `llama.cpp/embedding` types.

By default, llamafile uses port `8080`, which conflicts with Tabby's default port. It is recommended to run llamafile with the `--port` option to serve on a different port, such as `8081`. For embeddings functionality, you need to run llamafile with both the `--embedding` and `--port` options.

## Chat model

llamafile provides an OpenAI-compatible chat API interface. Note that the endpoint URL must include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "http://localhost:8081/v1" # the endpoint must include the `v1` suffix
api_key = ""
```

## Completion model

llamafile uses llama.cpp's completion API interface. Note that the endpoint URL should NOT include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "llama.cpp/completion"
model_name = "your_model"
api_endpoint = "http://localhost:8081" # DO NOT append the `v1` suffix
api_key = "your-api-key"
prompt_template = "<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>" # Example prompt template for the Qwen2.5 Coder model series.
```

## Embeddings model

llamafile provides embedding functionality through llama.cpp's API interface. Note that the endpoint URL should NOT include the `v1` suffix.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8082" # DO NOT append the `v1` suffix
api_key = ""
```
website/docs/references/models-http-api/mistral-ai.md (21 additions, 11 deletions)

# Mistral AI

[Mistral](https://mistral.ai/) is a platform that provides a suite of AI models specialized in various tasks, including code generation and natural language processing. Their models are known for high performance and efficiency in both code completion and chat interactions.

## Chat model

Mistral provides a specialized chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "mistral/chat"
model_name = "codestral-latest"
api_endpoint = "https://api.mistral.ai/v1"
api_key = "your-api-key"
```

## Completion model

Mistral offers a dedicated completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "mistral/completion"
model_name = "codestral-latest"
api_endpoint = "https://api.mistral.ai"
api_key = "your-api-key"
```

## Embeddings model

Mistral currently does not provide embedding model APIs.
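Since Mistral provides no embedding API, the embedding model can be pointed at a different provider alongside Mistral's chat and completion models. A sketch pairing it with Voyage AI embeddings, as configured on the Voyage AI page (a separate Voyage AI API key is assumed):

```toml title="~/.tabby/config.toml"
# Embeddings served by Voyage AI while Mistral handles chat and completion
[model.embedding.http]
kind = "voyage/embedding"
model_name = "voyage-code-2"
api_key = "your-voyage-api-key"
```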
website/docs/references/models-http-api/ollama.md (20 additions, 9 deletions)

# Ollama

[ollama](https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion) is a popular model provider that offers a local-first experience. It provides support for various models through HTTP APIs, including completion, chat, and embedding functionalities.

## Chat model

Ollama provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "mistral:7b"
api_endpoint = "http://localhost:11434/v1"
```

## Completion model

Ollama offers a specialized completion API interface for code completion tasks.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:7b"
api_endpoint = "http://localhost:11434"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.
```

## Embeddings model

Ollama provides embedding functionality through its HTTP API.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:11434"
```
website/docs/references/models-http-api/openai.md (8 additions, 12 deletions)

# OpenAI

OpenAI is a leading AI company that has developed an extensive range of language models. Their API specifications have become a de facto standard, also implemented by other vendors such as vLLM, Nvidia NIM, and LocalAI.

## Chat model

OpenAI provides a comprehensive chat API interface. Note: Do not append the `/chat/completions` suffix to the API endpoint.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "gpt-4o" # Please make sure to use a chat model, such as gpt-4o
api_endpoint = "https://api.openai.com/v1" # DO NOT append the `/chat/completions` suffix
api_key = "your-api-key"
```

## Completion model
OpenAI doesn't offer models for completions (FIM), and its `/v1/completions` API has been deprecated.

## Embeddings model

OpenAI provides powerful embedding models through their API interface. Note: Do not append the `/embeddings` suffix to the API endpoint.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "openai/embedding"
model_name = "text-embedding-3-small" # Please make sure to use an embedding model, such as text-embedding-3-small
api_endpoint = "https://api.openai.com/v1" # DO NOT append the `/embeddings` suffix
api_key = "your-api-key"
```
website/docs/references/models-http-api/vllm.md (26 additions, 22 deletions)

# vLLM

[vLLM](https://docs.vllm.ai/en/stable/) is a fast and user-friendly library for LLM inference and serving. It provides an OpenAI-compatible server interface, allowing the use of OpenAI kinds for chat and embedding, while offering a specialized interface for completions.

Important requirements for all model types:

- `model_name` must exactly match the one used to run vLLM
- `api_endpoint` should follow the format `http://host:port/v1`
- `api_key` should be identical to the one used to run vLLM

Please note that models differ in their capabilities for completion or chat. Some models can serve both purposes. For detailed information, please refer to the [Model Registry](../../models/index.mdx).

## Chat model

vLLM provides an OpenAI-compatible chat API interface.

```toml title="~/.tabby/config.toml"
[model.chat.http]
kind = "openai/chat"
model_name = "your_model" # Please make sure to use a chat model
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
```

## Completion model

Due to implementation differences, vLLM uses its own completion API interface that requires a specific prompt template based on the model being used.

```toml title="~/.tabby/config.toml"
[model.completion.http]
kind = "vllm/completion"
model_name = "your_model" # Please make sure to use a completion model
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series
```

## Embeddings model

vLLM provides an OpenAI-compatible embeddings API interface.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "openai/embedding"
model_name = "your_model"
api_endpoint = "http://localhost:8000/v1"
api_key = "your-api-key"
```
website/docs/references/models-http-api/voyage-ai.md (5 additions, 3 deletions)

# Voyage AI

[Voyage AI](https://voyage.ai/) is a company that provides a range of embedding models. Tabby supports Voyage AI's models for embedding tasks.

## Embeddings model

Voyage AI provides specialized embedding models through their API interface.

```toml title="~/.tabby/config.toml"
[model.embedding.http]
kind = "voyage/embedding"
model_name = "voyage-code-2"
api_key = "your-api-key"
```
