Give GPT access to project files, docs, and websites.
This is accomplished with the retrieval plugin, which lets models perform semantic search against a vector database (Chroma).
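For example, a minimal semantic search against Chroma looks roughly like this; this is a sketch, with illustrative collection name and documents, and it assumes Chroma's built-in default embedding model:

```python
import chromadb

client = chromadb.Client()                      # in-memory vector database
collection = client.create_collection("docs")   # illustrative collection name

# Chroma embeds the documents with its default embedding model on add().
collection.add(
    documents=["The app exposes a REST API for orders.",
               "Payments are handled by a Stripe webhook."],
    ids=["doc-1", "doc-2"],
)

# The query text is embedded too, and the closest documents come back first.
results = collection.query(query_texts=["How are payments processed?"], n_results=1)
print(results["documents"][0][0])               # expected: the Stripe webhook sentence
```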
The seek.py script scrapes web and local content, stores it in a vector database, and then uses it with OpenAI models to answer queries.
It first reads a list of URLs from a file, scrapes the web content from those URLs, and stores the text. It then builds an index from the scraped data and the other files in the project directory, and uses that index to power a retrieval-based question-answering system.
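A condensed sketch of that flow might look like the following, assuming requests + BeautifulSoup for scraping, Chroma as the vector store, and the pre-1.0 openai SDK for the answer step; the function and variable names are illustrative, not taken from seek.py:

```python
# Sketch of the scrape -> index -> answer flow; names are illustrative,
# and real code would chunk long pages before adding them to the store.
import requests
from bs4 import BeautifulSoup
import chromadb
import openai

openai.api_key = "sk-..."                        # normally read from constants.py

client = chromadb.Client()
collection = client.get_or_create_collection("docs")

# 1. Scrape every URL listed in urls.txt and store the visible text.
for i, url in enumerate(open("urls.txt").read().split()):
    html = requests.get(url, timeout=30).text
    text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)
    collection.add(documents=[text], ids=[f"url-{i}"], metadatas=[{"source": url}])

# 2. Retrieve the most relevant documents for a question and hand them to the model.
question = "Summarise what my web application does in 3 sentences."
hits = collection.query(query_texts=[question], n_results=5)["documents"][0]
prompt = ("Answer the question using this context:\n\n"
          + "\n\n".join(hits)
          + f"\n\nQuestion: {question}")
response = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                        messages=[{"role": "user", "content": prompt}])
print(response["choices"][0]["message"]["content"])
```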
- Web Scraping
- Data Storage
- SQLite Database (for history, with a custom loader)
- Recursive Document Finding and Loading (see the sketch after this list)
- Index Creation
- Retrieval-Based Question Answering
- Interactive Chat Interface
- Command Line Arguments
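Recursive document finding could be as simple as walking the project symlink; this is a sketch, and the directory name and extension filter are assumptions rather than seek.py's actual behaviour:

```python
# Sketch of recursive document finding and loading for plain-text project files.
import os

def find_documents(root="project", extensions=(".py", ".md", ".txt")):
    """Walk the project directory and yield (path, contents) pairs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    yield path, f.read()

for path, text in find_documents():
    print(path, len(text), "characters")
```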
- Python version 3.10.12 or higher
- Install the required Python packages:
pip install -r requirements.txt
- Add your OpenAI API key to constants.py.
- Create a symbolic link to your project directory:
ln -s ~/path/to/your-project project
- Add any URLs for documentation that should be scraped to the urls.txt file (one URL per line).
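An illustrative urls.txt (the URLs shown here are only examples):

```
https://docs.djangoproject.com/en/4.2/topics/db/models/
https://flask.palletsprojects.com/en/2.3.x/quickstart/
https://docs.python-requests.org/en/latest/
```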
This command builds a new vector store each time it's run:
python seek.py -q "Summarise what my web application does in 3 sentences."
This command saves the vector store locally for faster subsequent queries:
python seek.py -q "Summarise what my web application does in 3 sentences." --persist
Defaults to False if not specified.
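One way this kind of persistence can be implemented with Chroma is shown below; this is a sketch assuming a recent chromadb release, and the "persist" directory name is an assumption based on the size check later in this README:

```python
# Sketch of an on-disk vector store; directory name is an assumption.
import chromadb

client = chromadb.PersistentClient(path="persist")    # written to disk, not memory
collection = client.get_or_create_collection("docs")

# On later runs the same directory is reopened, so previously added documents
# are still available and do not need to be re-scraped or re-embedded.
print(collection.count(), "documents already indexed")
```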
python seek.py -q ".." -m gpt-4
or
python seek.py -q ".." -m gpt-3.5-turbo
Defaults to gpt-3.5-turbo if not specified.
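The documented flags could be wired up with argparse roughly as follows; only the short flags and defaults come from this README, while the long option names and help texts are assumptions:

```python
# Sketch of the documented CLI flags; long option names are assumptions.
import argparse

parser = argparse.ArgumentParser(description="Query project files, docs and scraped websites")
parser.add_argument("-q", "--query", required=True, help="question to answer")
parser.add_argument("-m", "--model", default="gpt-3.5-turbo",
                    help="OpenAI chat model, e.g. gpt-4")
parser.add_argument("-k", type=int, default=5,
                    help="number of documents to retrieve per query")
parser.add_argument("--persist", action="store_true",
                    help="save the vector store to disk (default: False)")
parser.add_argument("--no-history", action="store_true",
                    help="do not record queries in the SQLite history")
args = parser.parse_args()

print(args.query, args.model, args.k, args.persist, args.no_history)
```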
python seek.py -q ".." -k 10
Note that high values can lead to long runtimes, and GPT models have context-length limits.
Defaults to 5 if not specified.
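As a rough illustration of the trade-off: every retrieved document is pasted into the prompt, so the prompt grows with k. The character budget below is an assumption for the sketch, not a value used by seek.py:

```python
# Illustration of why large -k values can exceed the model's context window.
import chromadb

collection = chromadb.Client().get_or_create_collection("docs")
collection.add(documents=[f"chunk {i}: ..." for i in range(20)],
               ids=[str(i) for i in range(20)])

k = 10
chunks = collection.query(query_texts=["How does authentication work?"],
                          n_results=k)["documents"][0]

MAX_CONTEXT_CHARS = 12_000        # assumed rough budget for a 4k-token model
context = ""
for chunk in chunks:
    if len(context) + len(chunk) > MAX_CONTEXT_CHARS:
        break                     # stop once the prompt budget is spent
    context += chunk + "\n\n"
print(f"{len(chunks)} chunks retrieved, {len(context)} characters kept")
```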
To disable history, use the --no-history flag:
python seek.py -q ".." --no-history
History is enabled by default and is stored in a local SQLite database.
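A minimal sketch of such a history store; the file name and schema are assumptions, not necessarily what seek.py uses:

```python
import sqlite3

conn = sqlite3.connect("history.db")              # assumed file name
conn.execute("""CREATE TABLE IF NOT EXISTS history (
                    id         INTEGER PRIMARY KEY AUTOINCREMENT,
                    question   TEXT,
                    answer     TEXT,
                    created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def save(question: str, answer: str) -> None:
    conn.execute("INSERT INTO history (question, answer) VALUES (?, ?)",
                 (question, answer))
    conn.commit()

def recent(limit: int = 5):
    return conn.execute("SELECT question, answer FROM history "
                        "ORDER BY id DESC LIMIT ?", (limit,)).fetchall()

save("What does the web application do?", "It is a small web shop.")
print(recent())
```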
To check the size of the persisted vector store:
du -sh persist
The generateMetadata.py script can be used to read project files and generate a summary and tags for them using GPT.
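A sketch of what that could look like, assuming the pre-1.0 openai SDK; the prompt wording, JSON output format, and file path are illustrative, not the script's actual behaviour:

```python
# Sketch of per-file summary/tag generation; prompt and output format are assumptions.
import json
import openai

openai.api_key = "sk-..."          # normally read from constants.py

def generate_metadata(path: str) -> dict:
    """Ask GPT for a short summary and a handful of tags for one project file."""
    source = open(path, encoding="utf-8", errors="ignore").read()[:6000]  # crude truncation
    prompt = ("Summarise this file in two sentences and suggest five tags. "
              'Reply as JSON: {"summary": "...", "tags": ["..."]}\n\n' + source)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # Will raise if the model wraps the JSON in extra text; a sketch, not production code.
    return json.loads(response["choices"][0]["message"]["content"])

print(generate_metadata("project/app.py"))   # hypothetical file path
```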
Alternatively, build the Docker image and run a query in a container:
docker build -t code-seek .
docker run -it --rm code-seek "Summarise what my web application does in 3 sentences."
Be aware of data usage policies when using this tool; use it for fun/hobby projects only. See the OpenAI API data usage policies:

> ... OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose...