VecTrekker

Overview

VecTrekker is a simple utility that walks through a directory of files and syncs them to a vector database such as Pinecone. You can use it, for example, to index your notes for use with an LLM chain.

VecTrekker currently tokenizes with cl100k_base and embeds with OpenAI's text-embedding-ada-002 model.

Quick-start guide

pip install vectrekker
vectrekker --dry-run

VecTrekker creates its configuration file, ~/.vectrekker/config.toml, automatically on first startup. Edit that file to add your Pinecone and OpenAI credentials.
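Since the configuration file holds API credentials, you may want to make it readable by your user only. The helper below is a minimal sketch of that idea: the function name secure_vectrekker_config and the 600 permission policy are assumptions for illustration and are not part of VecTrekker itself. Only the default path comes from this README.

```shell
# Hypothetical helper (not shipped with VecTrekker): check that the config
# file exists and restrict its permissions to the owning user.
secure_vectrekker_config() {
    # Default to the path named in this README; an explicit path may be passed.
    config="${1:-$HOME/.vectrekker/config.toml}"
    if [ ! -f "$config" ]; then
        echo "missing: $config (run vectrekker once to create it)" >&2
        return 1
    fi
    # The file contains credentials, so allow read/write for the owner only.
    chmod 600 "$config"
    echo "secured: $config"
}
```

Calling secure_vectrekker_config with no arguments after the first startup would apply this to the default location.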

Scheduling VecTrekker

It's suggested that you set up a cron job so that VecTrekker periodically rescans your directories and updates any files that are out of date. An example setup, with a crontab entry that scans every two hours, is

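# Install VecTrekker into a dedicated virtual environment
mkdir -p ~/.vectrekker
python3.10 -m venv ~/.vectrekker/.venv
~/.vectrekker/.venv/bin/pip install vectrekker
# Crontab entry (add it with crontab -e): run at minute 0 of every second hour
0 */2 * * * date >> ~/.vectrekker/vectrekker.log && ~/.vectrekker/.venv/bin/vectrekker >> ~/.vectrekker/vectrekker.log 2>&1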
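If the crontab line gets hard to read, the logging logic can be moved into a small wrapper. The sketch below is one way to do that under stated assumptions: run_logged is a hypothetical function name, not something VecTrekker provides. It writes a timestamp, then appends both stdout and stderr of the given command to the log. Note that the file redirection comes before 2>&1, which is what sends stderr to the log as well.

```shell
# Hypothetical wrapper (not part of VecTrekker): timestamp a run and append
# all of its output to a log file.
run_logged() {
    log="$1"
    shift
    # Make sure the log directory exists before appending to it.
    mkdir -p "$(dirname "$log")"
    # Redirect stdout to the log first, then point stderr at the same place.
    # The group's exit status is that of the wrapped command.
    { date; "$@"; } >> "$log" 2>&1
}
```

A script that defines this function could then call run_logged ~/.vectrekker/vectrekker.log ~/.vectrekker/.venv/bin/vectrekker, and the crontab entry would only need to invoke that script.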

Vector database support

These are the currently supported vector databases.

Database    Support
Pinecone    Yes