⚡ Implementing a state-of-the-art advanced RAG technique: Self-Reflective RAG 💪
This project implements a self-reflective RAG system that integrates multiple knowledge sources (website, SQL, PDFs) and is designed around business requirements.
- What is self-reflective RAG: Self-reflective RAG is an adaptive, self-improving system that combines information retrieval and language generation to provide more accurate and context-specific responses. It adds a feedback loop in which the model evaluates and reflects on its own outputs, which helps it identify and correct errors or improve responses.
- Project use case: I always wanted to create a RAG system that involves multiple knowledge sources, specifically for a business. The project is built for businesses, with integration of data sources including but not limited to unstructured data (PDFs and documents), structured data (SQL, NoSQL, graph databases, CSV and more), and semi-structured data (websites, APIs, and other platforms), along with web search capabilities.
- Tech-stack:
Component | Technology | Description |
---|---|---|
RAG | LangGraph | Framework used for building the RAG model |
Output Tracing | LangSmith | Tool used for tracing and evaluating model outputs |
Indexing | Pinecone | Service used for indexing and managing the knowledge base |
Web Searching | Tavily | Tool used for retrieving information from the web |
LLM | OpenAI | Provides the language model for text generation |
Chat Interface | Gradio | Interface for interacting with the RAG model |
SQL Database | SQLite | Database used for querying business data and for storing RAG's memory |
- First, the user asks a question.
- The router analyzes the query and directs the RAG to the relevant knowledge source. Available routes: i. Vector store (PDF, website), ii. SQL, iii. Web search, iv. Fallback conversational LLM.
- For the vector store route, the `Retriever Node` fetches relevant documents, which are passed to the `Grader Node` to evaluate whether they are relevant and useful. If the documents are not relevant, the `Query Translation Node` rewrites the question and the `Retriever Node` is called again to get better documents.
- If the documents are relevant and useful, the `Generate Node` is called to generate a response to the question.
- The generated response is then checked for hallucination by the `Hallucination Grader`, which checks whether the response is grounded. If it is not, the `Generate Node` is called again; otherwise the flow moves to the next step.
- Finally, the `Answer Grader Node` checks whether the generated answer addresses the question. If it does not, the response generation loop runs again; otherwise the response is returned to the user.
- For the other routes, separate tools, agents, and chains are developed and called based on the route. Please refer to the image below for a better understanding.
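The vector store flow described above can be sketched in plain Python. This is a minimal illustration, not the notebook's LangGraph implementation: the `ReflectiveRAG` class, its field names, the `max_retries` bound, and the fallback messages are all assumptions made for the example, and the toy lambdas stand in for the LLM-backed nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ReflectiveRAG:
    """Hypothetical stand-in for the vector store route of the graph."""

    retrieve: Callable[[str], List[str]]                # Retriever Node
    grade_documents: Callable[[str, List[str]], bool]   # Grader Node
    rewrite_query: Callable[[str], str]                 # Query Translation Node
    generate: Callable[[str, List[str]], str]           # Generate Node
    is_grounded: Callable[[List[str], str], bool]       # Hallucination Grader
    answers_question: Callable[[str, str], bool]        # Answer Grader Node
    max_retries: int = 3  # assumption: the source does not state a retry bound
    trace: List[str] = field(default_factory=list)

    def run(self, question: str) -> str:
        query = question
        docs: List[str] = []
        # Retrieval loop: rewrite the query until the documents pass grading.
        for _ in range(self.max_retries):
            self.trace.append("retrieve")
            docs = self.retrieve(query)
            if self.grade_documents(question, docs):
                break
            self.trace.append("rewrite")
            query = self.rewrite_query(query)
        else:
            return "Sorry, I could not find relevant documents."
        # Generation loop: regenerate until the answer is grounded and useful.
        for _ in range(self.max_retries):
            self.trace.append("generate")
            answer = self.generate(question, docs)
            if not self.is_grounded(docs, answer):
                continue
            if self.answers_question(question, answer):
                return answer
        return "Sorry, I could not produce a grounded answer."


# Toy components so the sketch runs without any API keys.
corpus = {"refund policy": ["Refunds are issued within 14 days."]}
rag = ReflectiveRAG(
    retrieve=lambda q: corpus.get(q, []),
    grade_documents=lambda q, docs: len(docs) > 0,
    rewrite_query=lambda q: "refund policy",
    generate=lambda q, docs: docs[0],
    is_grounded=lambda docs, answer: answer in docs,
    answers_question=lambda q, answer: "Refunds" in answer,
)
print(rag.run("how do I get my money back?"))
print(rag.trace)
```

In this toy run the first retrieval finds nothing, the query is rewritten once, and the second retrieval succeeds, so the trace shows one rewrite before generation. Bounding both loops is a design choice for the sketch so that a grader that never passes cannot loop forever.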
- Fork the repo (or clone it directly if you don't want to update it with your own code).
- Set up a folder and clone using: `git clone https://github.com/Taha0229/self-reflective-RAG.git .` or, if you have forked, `git clone <your/link/to/repo> .`
- Create a virtual environment using: `conda create --name self-reflective-rag python=3.10 -y`. I have used conda, but you can use your preferred tool to create a virtual environment. If this is your first time with virtual environments, install and set up conda first.
- Set up environment variables: since we are going to use multiple APIs (OpenAI, LangSmith (optional but recommended), Tavily, and Pinecone), it is recommended to use environment variables; otherwise you can hard-code your API keys. Create a file named `.env`, then generate and paste your API keys as follows:
OPENAI_API_KEY = "<your-openai-api-key>"
TAVILY_API_KEY = "<your-tavily-api-key>"
PINECONE_API_KEY = "<your-pinecone-api-key>"
LANGCHAIN_API_KEY = "<your-langchain-api-key>"
LANGCHAIN_TRACING_V2 = "true"
LANGCHAIN_ENDPOINT = "https://api.smith.langchain.com"
LANGCHAIN_PROJECT = "<your-project-name>"
- Select a kernel for the Jupyter notebook, then run all the cells. Alternatively, go through each cell and customize it as per your needs. I have provided markdown and comments for every cell, and docstrings are present for all classes, functions, and methods.
- Cells' structure:
  - Setup Environment: 4 cells
  - Setup Pinecone Index: 7 cells
  - Setup Chains: 18 cells
  - Setup Graph: 15 cells
  - Setup Chatting Interface: 6 cells
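Before running the notebook, it can help to confirm that the API keys from the `.env` step are actually visible to Python. The sketch below is an assumption-laden helper, not part of the repo: the `missing_keys` function and the required/optional split are made up for illustration, and it assumes the `.env` values have already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import List, Mapping, Sequence

# Key names are taken from the .env step; treating LangSmith as optional
# follows the "optional but recommended" remark in the setup instructions.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY", "PINECONE_API_KEY")
OPTIONAL_KEYS = (
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_PROJECT",
)


def missing_keys(env: Mapping[str, str], keys: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the keys that are absent or blank in the given mapping."""
    return [key for key in keys if not env.get(key, "").strip()]


# Demonstration with a plain dict instead of the real environment.
example_env = {"OPENAI_API_KEY": "sk-example", "TAVILY_API_KEY": ""}
print(missing_keys(example_env))  # ['TAVILY_API_KEY', 'PINECONE_API_KEY']

# Check the real environment; the result depends on your machine.
for key in missing_keys(os.environ):
    print(f"Missing required environment variable: {key}")
for key in missing_keys(os.environ, OPTIONAL_KEYS):
    print(f"Optional LangSmith variable not set: {key}")
```

Running this in the first notebook cell surfaces a missing key early instead of as an authentication error deep inside a chain.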
- Self Reflective RAG is an advanced, state-of-the-art strategy that unites (1) query analysis with (2) active / self-corrective RAG.
- The implementation is inspired by this paper.
- The architecture involves the following data sources / routes:
  - URL and PDF for the vector store
  - SQL database
  - Web search using Tavily
  - Fallback conversational LLM
- The self-reflection loop includes:
  - Grading retrieved documents -> re-retrieve or change the data source if the documents are not relevant
  - Hallucination checker -> regenerates the response if a hallucination is found
  - Answer checker -> checks whether the generated answer addresses the user query; if not, generates again
- Previously explored Advanced RAG techniques and research papers:
i. Query Translation:
ii. Indexing:
iii. Other RAG Architectures:
- CRAG
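The routing across the four data sources listed in the architecture notes above can be illustrated with a small dispatch table. This is only a sketch under stated assumptions: the `Route` enum, `keyword_router`, `dispatch`, and the keyword lists are hypothetical, and the keyword matching is a toy stand-in for the query analysis that the project delegates to an LLM.

```python
from enum import Enum
from typing import Callable, Dict


class Route(str, Enum):
    VECTOR_STORE = "vector_store"
    SQL = "sql"
    WEB_SEARCH = "web_search"
    FALLBACK = "fallback"


def keyword_router(question: str) -> Route:
    """Toy stand-in for the LLM-based router; the keywords are invented."""
    q = question.lower()
    if any(word in q for word in ("order", "invoice", "sales")):
        return Route.SQL
    if any(word in q for word in ("policy", "manual", "document")):
        return Route.VECTOR_STORE
    if any(word in q for word in ("latest", "news", "today")):
        return Route.WEB_SEARCH
    return Route.FALLBACK


def dispatch(
    question: str,
    router: Callable[[str], Route],
    handlers: Dict[Route, Callable[[str], str]],
) -> str:
    """Pick a route, then call its handler; unknown routes use the fallback."""
    route = router(question)
    handler = handlers.get(route, handlers[Route.FALLBACK])
    return handler(question)


# Placeholder handlers; in the project these would be chains, tools, or agents.
handlers: Dict[Route, Callable[[str], str]] = {
    Route.VECTOR_STORE: lambda q: f"[vector store] {q}",
    Route.SQL: lambda q: f"[sql] {q}",
    Route.WEB_SEARCH: lambda q: f"[web search] {q}",
    Route.FALLBACK: lambda q: f"[fallback llm] {q}",
}

print(dispatch("What is the refund policy?", keyword_router, handlers))
print(dispatch("How many orders shipped last week?", keyword_router, handlers))
print(dispatch("Hello there", keyword_router, handlers))
```

Keeping the router as a plain callable means the keyword version can later be swapped for an LLM-backed classifier without touching the handlers.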
- Feat – feature
- Fix – bug fixes
- Docs – changes to the documentation, like the README
- Style – style or formatting change
- Perf – improves code performance
- Test – test a feature

Example: `git commit -m "Docs: add readme"` or `git commit -m "Feat: add chatting interface"`