PineconeUtils is a Python module for preparing and indexing text data with Pinecone, using Cohere or OpenAI models for embeddings. It loads, chunks, embeds, and upserts data into a Pinecone index, which suits text embedding and retrieval applications such as retrieval-augmented generation (RAG).
- Load text data from .txt, .docx, and .pdf files.
- Chunk text data for processing.
- Prepare embeddings using either Cohere or OpenAI models.
- Upsert prepared data into a Pinecone index.
To install PineconeUtils, you can use pip:
pip install pineconeutils
Here's a quick example of how to use PineconeUtils:
First, ensure you have the necessary API keys and setup information:
pinecone_api_key = "your_pinecone_api_key"
cohere_api_key = "your_cohere_api_key"
openai_api_key = "your_openai_api_key"
index_name = "your_index_name"
namespace_id = "your_namespace_id"
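Hardcoding keys in source files makes them easy to leak, so a common alternative is to read them from environment variables. The following is a minimal sketch of that approach; the variable names (`PINECONE_API_KEY`, `COHERE_API_KEY`, `OPENAI_API_KEY`) and the `load_settings` helper are assumptions for illustration and are not part of PineconeUtils.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
REQUIRED_VARS = ("PINECONE_API_KEY", "COHERE_API_KEY", "OPENAI_API_KEY")


def load_settings(env=os.environ):
    """Return the required API keys, raising KeyError if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Lowercase the names so they line up with the variable names used above.
    return {name.lower(): env[name] for name in REQUIRED_VARS}
```

The returned dictionary uses the keys `pinecone_api_key`, `cohere_api_key`, and `openai_api_key`, so its values can be assigned to the variables shown above.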
Load data from a supported file format:
from pineconeutils import PineconeUtils
# Create instance of PineconeUtils
pinecone = PineconeUtils(
    pinecone_api_key=pinecone_api_key,
    openai_api_key=openai_api_key,
    cohere_api_key=cohere_api_key,
    index_name=index_name,
    namespace_id=namespace_id,
)
path = "path_to_your_file.docx"
data = pinecone.load_data(path)
print("Loaded Data:", data)
chunks = pinecone.chunk_data(data, chunk_size=100, chunk_overlap=10)
print("Data Chunks:", chunks)
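To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of overlapping chunking. It is not the PineconeUtils implementation: it assumes sizes are measured in characters, whereas the library may count words or tokens, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size=100, chunk_overlap=10):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share chunk_overlap characters.
    Assumption: sizes are in characters, which may differ from PineconeUtils.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

With `chunk_size=10` and `chunk_overlap=2`, each chunk starts 8 characters after the previous one, so the last 2 characters of one chunk reappear at the start of the next. The overlap helps keep sentences that straddle a boundary retrievable from at least one chunk.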
# Prepare embeddings with OpenAI
prepared_data = pinecone.prepare_data(chunks, model="text-embedding-ada-002", service="openai")
# Alternatively, prepare embeddings from the same chunks with Cohere
prepared_data = pinecone.prepare_data(chunks, model="embed-english-v3.0", service="cohere", input_type="search_document")
For more about Cohere embeddings, see the Cohere Embeddings documentation.
successful = pinecone.upsert_data(prepared_data)
print("Data upsertion was", "successful" if successful else "unsuccessful")
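The structure of `prepared_data` and the internals of `upsert_data` are not documented here. As a rough illustration of what an upsert step typically involves, the sketch below pairs chunks with embedding vectors as records with `id`, `values`, and `metadata` fields (the record shape Pinecone accepts for upserts) and splits them into batches. The helper names `build_records` and `batched`, the ID scheme, and the batch size are assumptions.

```python
def build_records(chunks, embeddings, id_prefix="chunk"):
    """Pair each text chunk with its embedding as an id/values/metadata record.

    Assumption: IDs are generated from a prefix and the chunk position.
    """
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    return [
        {"id": f"{id_prefix}-{i}", "values": list(vector), "metadata": {"text": chunk}}
        for i, (chunk, vector) in enumerate(zip(chunks, embeddings))
    ]


def batched(records, batch_size=100):
    """Yield successive slices of records so each request stays small."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```

Storing the chunk text in `metadata` lets a query return the original passage alongside the matching vector, and batching avoids sending one very large request.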
To contribute to the development of PineconeUtils, you can clone the repository and submit pull requests.
If you encounter any issues or have questions, please file an issue on the GitHub repository.
This project is licensed under the MIT License - see the LICENSE file for details.