add more content
jerpint committed Nov 30, 2023
1 parent 6806133 commit 24470f9
Showing 3 changed files with 128 additions and 7 deletions.
12 changes: 7 additions & 5 deletions docs/index.rst
@@ -22,6 +22,13 @@ We scraped the documentation of `huggingface 🤗 Transformers <https://huggingf
   :maxdepth: 2

   usage/installation


.. toctree::
   :caption: Customization
   :maxdepth: 1

   usage/components
   usage/configuration
   usage/custom_docs

@@ -32,11 +39,6 @@ We scraped the documentation of `huggingface 🤗 Transformers <https://huggingf

   usage/components_overview

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   autoapi/index

Useful links
============
19 changes: 19 additions & 0 deletions docs/usage/components.md
@@ -0,0 +1,19 @@
# Buster Components

Buster is built around components that can be customized and extended.

For example, chat completion is handled by a `Completer` component.
We have implemented some completers, such as `ChatGPTCompleter`, and more can be added by inheriting from the `Completer` base class.
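The base-class approach can be sketched as follows. This is a self-contained illustration, not Buster's code: the `Completer` stand-in, its `complete(prompt, user_input)` method, and the `EchoCompleter` subclass are hypothetical, so check `buster.completers` for the real abstract methods before subclassing.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Completer(ABC):
    """Hypothetical stand-in for a Completer base class; Buster's real interface may differ."""

    def __init__(self, completion_kwargs: Optional[dict] = None):
        # Keyword arguments a real completer would forward to the model API.
        self.completion_kwargs = completion_kwargs or {}

    @abstractmethod
    def complete(self, prompt: str, user_input: str) -> str:
        """Return a response to `user_input`, given the system `prompt`."""


class EchoCompleter(Completer):
    """Toy subclass: echoes the question instead of calling a language model."""

    def complete(self, prompt: str, user_input: str) -> str:
        model = self.completion_kwargs.get("model", "echo")
        return f"[{model}] You asked: {user_input}"


completer = EchoCompleter(completion_kwargs={"model": "toy-model"})
print(completer.complete(prompt="You are a helpful assistant.", user_input="What is backpropagation?"))
```

Because `complete` is abstract, instantiating the base class directly raises a `TypeError`, which forces every new completer to provide its own implementation.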

Currently, Buster implements the following components:

* `Completer`: The language model responsible for generating a response
* `Retriever`: Responsible for fetching the documents relevant to a user's input
* `DocumentsFormatter`: Responsible for formatting the retrieved documents. Documents can currently be formatted as JSON-like or HTML-like objects.
* `PromptFormatter`: Responsible for combining the formatted documents with the prompts for the LLM
* `Validator`: Responsible for validating user inputs and/or model outputs, for example by checking the question before the completion and the answer after it.
* `Tokenizer`: Used to measure the length of prompts and completions. The `Tokenizer` is generally assumed to match the model used by the `Completer`.
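To make the division of labour concrete, here is a toy, offline sketch of how these components could hand data to one another for a single question. Every function is a hypothetical stand-in rather than Buster's API, and the order (validate the question, retrieve, format the documents, format the prompt, complete, validate the answer) is inferred from the component descriptions above.

```python
import json
import re

# Toy corpus standing in for a real document store (hypothetical data).
DOCS = [
    {"title": "Backprop", "source": "notes/backprop.md", "content": "Backpropagation computes gradients with the chain rule."},
    {"title": "Dropout", "source": "notes/dropout.md", "content": "Dropout randomly zeroes activations during training."},
]
INVALID_QUESTION_RESPONSE = "This question does not seem relevant to my current knowledge."
NO_DOCUMENTS_MESSAGE = "No documents are available for this question."


def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def validate_question(question: str) -> bool:
    # Validator, before completion: reject empty questions.
    return bool(question.strip())


def retrieve(question: str, top_k: int = 1) -> list[dict]:
    # Retriever: keep documents sharing words with the question, best overlap first.
    q = words(question)
    hits = [d for d in DOCS if q & words(d["content"])]
    return sorted(hits, key=lambda d: len(q & words(d["content"])), reverse=True)[:top_k]


def format_documents(docs: list[dict], columns=("content", "title", "source")) -> str:
    # DocumentsFormatter: keep the selected columns and serialize them as JSON.
    return json.dumps([{col: doc[col] for col in columns} for doc in docs])


def format_prompt(documents: str) -> str:
    # PromptFormatter: wrap the formatted documents with instructions.
    return f"Answer using only these documents:\n{documents}\nNow answer the following question:"


def complete(prompt: str, question: str) -> str:
    # Completer stub: a real completer would send `prompt` and `question` to an LLM.
    return f"(stub answer to: {question})"


def validate_answer(response: str) -> bool:
    # Validator, after completion: flag empty answers.
    return bool(response.strip())


def answer(question: str) -> str:
    if not validate_question(question):
        return INVALID_QUESTION_RESPONSE
    docs = retrieve(question)
    if not docs:
        return NO_DOCUMENTS_MESSAGE
    prompt = format_prompt(format_documents(docs))
    response = complete(prompt, question)
    return response if validate_answer(response) else "I don't know."


print(answer("What is backpropagation?"))
```

The `Tokenizer` is omitted here; in a real pipeline it would check that the prompt built by the formatters stays within the model's context length.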


Additional components are also available for managing documents:

* `DocumentManager`: Generates and stores document embeddings (to be used in conjunction with `Retriever` components)
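The split between storing embeddings and retrieving from them can be sketched with a toy in-memory manager. Both `InMemoryDocumentManager` and `retrieve` are hypothetical, and the bag-of-words `embed` function stands in for a real embedding model. The `top_k` and `thresh` parameter names echo the `retriever_cfg` keys used in Buster's configuration example, but reading `thresh` as a minimum cosine similarity is an assumption of this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical embedding: bag-of-words counts (a real manager would call an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryDocumentManager:
    """Toy stand-in for a DocumentManager: embeds each document once and stores the vector."""

    def __init__(self):
        self.docs: list[str] = []
        self.embeddings: list[Counter] = []

    def add(self, docs: list[str]) -> None:
        for doc in docs:
            self.docs.append(doc)
            self.embeddings.append(embed(doc))


def retrieve(manager: InMemoryDocumentManager, query: str, top_k: int = 3, thresh: float = 0.0) -> list[tuple[float, str]]:
    """Toy retriever: the top_k stored documents whose similarity to the query is at least thresh."""
    q = embed(query)
    scored = [(cosine(q, e), d) for e, d in zip(manager.embeddings, manager.docs)]
    return sorted((s for s in scored if s[0] >= thresh), reverse=True)[:top_k]


manager = InMemoryDocumentManager()
manager.add([
    "Backpropagation computes gradients with the chain rule.",
    "Dropout randomly zeroes activations during training.",
])
for score, doc in retrieve(manager, "How does backpropagation compute gradients?", top_k=3, thresh=0.1):
    print(f"{score:.2f}  {doc}")
```

Embedding documents once at ingestion time means each query only needs a single new embedding plus cheap similarity comparisons.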
104 changes: 102 additions & 2 deletions docs/usage/configuration.md
@@ -1,3 +1,103 @@
# Configuration of Components

Buster uses a config file to set up most of the app.
Buster's internal configuration is controlled via the `BusterConfig` object, which gathers the parameters of all the different components in one place.

Here is a typical setup:

```python
from buster.busterbot import BusterConfig

buster_cfg = BusterConfig(
    retriever_cfg={
        "path": "deeplake_store",
        "top_k": 3,
        "thresh": 0.7,
        "max_tokens": 2000,
        "embedding_model": "text-embedding-ada-002",
    },
    validator_cfg={
        "unknown_response_templates": [
            "I'm sorry, but I am an AI language model trained to assist with questions related to AI. I cannot answer that question as it is not relevant to the library or its usage. Is there anything else I can assist you with?",
        ],
        "unknown_threshold": 0.85,
        "embedding_model": "text-embedding-ada-002",
        "use_reranking": True,
        "invalid_question_response": "This question does not seem relevant to my current knowledge.",
        "check_question_prompt": (
            "You are a chatbot answering questions on artificial intelligence.\n"
            "A user will submit a question. Respond 'true' if it is valid, respond 'false' if it is invalid."
        ),
        "completion_kwargs": {
            "model": "gpt-3.5-turbo",
            "stream": False,
            "temperature": 0,
        },
    },
    documents_answerer_cfg={
        "no_documents_message": "No documents are available for this question.",
    },
    completion_cfg={
        "completion_kwargs": {
            "model": "gpt-3.5-turbo",
            "stream": False,
            "temperature": 0,
        },
    },
    tokenizer_cfg={
        "model_name": "gpt-3.5-turbo",
    },
    documents_formatter_cfg={
        "max_tokens": 3500,
        "columns": ["content", "title", "source"],
    },
    prompt_formatter_cfg={
        "max_tokens": 3500,
        # Adjacent string literals are concatenated, so each sentence needs its own trailing space or newline.
        "text_before_docs": (
            "You are a chatbot assistant answering technical questions about artificial intelligence (AI). "
            "You can only respond to a question if the content necessary to answer the question is contained in the following provided documentation. "
            "If the answer is in the documentation, summarize it in a helpful way to the user. "
        ),
        "text_after_docs": (
            "REMEMBER:\n"
            "You are a chatbot assistant answering technical questions about artificial intelligence (AI).\n"
            "Here are the rules you must follow:\n"
            "1) You must only respond with information contained in the documentation above. Say you do not know if the information is not provided.\n"
            "2) Make sure to format your answers in Markdown format, including code blocks and snippets.\n"
            "Now answer the following question:\n"
        ),
    },
)
```
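`prompt_formatter_cfg` supplies the text placed before and after the documents, plus a `max_tokens` budget. This page does not say how Buster enforces that budget, so the following is only a sketch under two assumptions: tokens are counted by a hypothetical whitespace tokenizer (`count_tokens`; Buster's `GPTTokenizer` counts differently), and documents that do not fit are dropped whole. `build_prompt` is likewise hypothetical.

```python
def count_tokens(text: str) -> int:
    """Hypothetical tokenizer: counts whitespace-separated words (GPT tokenizers count differently)."""
    return len(text.split())


def build_prompt(documents: list[str], text_before_docs: str, text_after_docs: str, max_tokens: int) -> str:
    """Place the documents between the two prompt texts, dropping documents that exceed max_tokens."""
    # Tokens left for documents once both prompt texts are accounted for.
    budget = max_tokens - count_tokens(text_before_docs) - count_tokens(text_after_docs)
    kept = []
    for doc in documents:
        cost = count_tokens(doc)
        if cost > budget:
            break  # stop at the first document that does not fit
        kept.append(doc)
        budget -= cost
    return "\n".join([text_before_docs, *kept, text_after_docs])


prompt = build_prompt(
    documents=["Backpropagation computes gradients.", "Dropout zeroes random activations during training."],
    text_before_docs="Answer using only the documentation below.",
    text_after_docs="Now answer the following question:",
    max_tokens=15,
)
print(prompt)
```

Reserving room for the fixed instructions first guarantees that the rules in `text_after_docs` are never truncated away, at the cost of sometimes including fewer documents.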

This `BusterConfig` can then be passed to initialize Buster and all of its components:

```python
from buster.busterbot import Buster, BusterConfig
from buster.completers import ChatGPTCompleter, DocumentAnswerer
from buster.formatters.documents import DocumentsFormatterJSON
from buster.formatters.prompts import PromptFormatter
from buster.retriever import DeepLakeRetriever, Retriever
from buster.tokenizers import GPTTokenizer
from buster.validators import QuestionAnswerValidator, Validator

def setup_buster(buster_cfg: BusterConfig) -> Buster:
    """Initialize Buster and all of its components from a BusterConfig."""
    retriever: Retriever = DeepLakeRetriever(**buster_cfg.retriever_cfg)
    # The same tokenizer is shared by both formatters to enforce their max_tokens limits.
    tokenizer = GPTTokenizer(**buster_cfg.tokenizer_cfg)
    document_answerer: DocumentAnswerer = DocumentAnswerer(
        completer=ChatGPTCompleter(**buster_cfg.completion_cfg),
        documents_formatter=DocumentsFormatterJSON(tokenizer=tokenizer, **buster_cfg.documents_formatter_cfg),
        prompt_formatter=PromptFormatter(tokenizer=tokenizer, **buster_cfg.prompt_formatter_cfg),
        **buster_cfg.documents_answerer_cfg,
    )
    validator: Validator = QuestionAnswerValidator(**buster_cfg.validator_cfg)
    buster: Buster = Buster(retriever=retriever, document_answerer=document_answerer, validator=validator)
    return buster

buster = setup_buster(buster_cfg)

completion = buster.process_input("What is backpropagation?")
print(completion)
```
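`setup_buster` builds each component by unpacking one dictionary with `**`, so each dictionary's keys must match that component's constructor parameters. The sketch below shows the pattern with a hypothetical `MiniConfig` dataclass and `ToyRetriever` class; neither is part of Buster, and whether `BusterConfig` itself is a dataclass is not stated here. Assuming constructors with explicit parameters like this one, a misspelled key fails as soon as the component is built.

```python
from dataclasses import dataclass, field


class ToyRetriever:
    """Hypothetical component; its constructor defines which config keys are valid."""

    def __init__(self, path: str, top_k: int = 3, thresh: float = 0.7):
        self.path, self.top_k, self.thresh = path, top_k, thresh


@dataclass
class MiniConfig:
    """Hypothetical, simplified analogue of BusterConfig: one dict per component."""

    # default_factory gives each instance its own dict instead of a shared mutable default.
    retriever_cfg: dict = field(default_factory=lambda: {"path": "deeplake_store", "top_k": 3})


cfg = MiniConfig()
retriever = ToyRetriever(**cfg.retriever_cfg)  # dict keys become keyword arguments

# A misspelled key ("topk") is rejected when the component is constructed.
bad = MiniConfig(retriever_cfg={"path": "deeplake_store", "topk": 5})
try:
    ToyRetriever(**bad.retriever_cfg)
except TypeError as err:
    print(f"Invalid config: {err}")
```

Keeping every component's settings in one object means a single config can be swapped to change the whole app, while the constructors remain the source of truth for which settings exist.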

