- Unstructured-IO/unstructured
- mindee/doctr - docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
- microsoft/markitdown - Python tool for converting files and office documents to Markdown.
- DS4SD/docling - Get your documents ready for gen AI
- adithya-s-k/omniparse - Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
- CosmosShadow/gptpdf - Using GPT to parse PDF
- QuivrHQ/quivr - Your GenAI Second Brain 🧠 A personal productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users! Local & Private alternative to OpenAI GPTs & ChatGPT powered by retrieval-augmented generation.
- Cinnamon/kotaemon - An open-source RAG-based tool for chatting with your documents.
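- Pix2Text V1.1 Released, with Support for PDF-to-Markdown Conversion | Breezedeus.com
- PDF Retrieval with Vision Language Models [translated] | 宝玉的分享 (Baoyu's Sharing)
- Intelligent PDF Parsing: Technical Architecture and Implementation under a RAG Strategy
- LLM RAG in Practice (29) | Exploring PDF Parsing for RAG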
- Using LlamaParse for Knowledge Graph Creation from Documents | by Fanghua (Joshua) Yu | Apr, 2024 | Medium
- Multi-document Agentic RAG using Llama-Index and Mistral | by Plaban Nayak | The AI Forum | May, 2024 | Medium
- Building a Multi-Document ReAct Agent for Financial Analysis using LlamaIndex and Qdrant | by M K Pavan Kumar | Jun, 2024 | Stackademic
- RAG + LlamaParse: Advanced PDF Parsing for Retrieval | by Ryan Siegler | KX Systems | May, 2024 | Medium