A web application that converts Gitbook documentation into markdown optimized for use with Large Language Models (LLMs) like ChatGPT, Claude, and LLaMA. Check out docingest for a hosted version that also supports other documentation providers such as readthedocs, mintlify, and docusaurus.
- Download technical documentation for training custom LLMs
- Create knowledge bases for ChatGPT, Claude, and other AI assistants
- Feed documentation into context windows of AI chatbots
- Generate markdown files optimized for LLM processing
- Scrape Gitbook documentation sites
- Convert HTML content to LLM-friendly markdown format
- View converted content in browser
- Download documentation as a single markdown file
- Handle internal links and navigation
- Preserve document structure
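
To illustrate what converting HTML into LLM-friendly markdown involves, here is a minimal sketch. It uses Python's standard-library `html.parser` as a stand-in so that it runs without extra packages; the application itself lists BeautifulSoup4 for parsing, and its actual conversion rules are not shown in this README. The class name `MarkdownSketch`, the function `html_to_markdown`, and the small set of supported tags are assumptions made for the example.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Converts a small subset of HTML (h1-h6, p, ul/ol, li, a, code) to markdown."""

    def __init__(self):
        super().__init__()
        self.parts = []   # markdown fragments collected so far
        self.href = None  # target of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            # <h2> becomes "## ", <h3> becomes "### ", and so on.
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")
        elif tag == "code":
            self.parts.append("`")

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None
        elif tag == "code":
            self.parts.append("`")

    def handle_data(self, data):
        # Collapse whitespace runs so HTML source indentation does not leak through.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html):
    """Return a markdown rendering of the supported tags in `html`."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines).strip()


sample = (
    '<h1>Guide</h1><p>See <a href="/setup">setup</a> and run '
    "<code>pip</code>.</p><ul><li>One</li><li>Two</li></ul>"
)
print(html_to_markdown(sample))
```

Tags outside this subset are ignored and only their text is kept, so a real converter would need more cases (tables, code blocks, images, nested lists).
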
- Clone this repository
- Install dependencies:
pip install -r requirements.txt
- Start the web server:
python app.py
- Open your browser and navigate to http://localhost:5000
- Enter the URL of a Gitbook documentation site
- Choose to either:
  - View the converted content in your browser
  - Download the content as a markdown file
- Use the downloaded markdown with:
  - ChatGPT (paste into conversation)
  - Claude (upload as a file)
  - Custom LLaMA models (include in training data)
  - Any other LLM that accepts markdown input
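
A downloaded file can be larger than what fits into a single chat message or context window. The sketch below shows one possible way to split the markdown at heading boundaries before pasting it into a conversation. It is not part of the application; the function name `split_markdown` and the `max_chars` budget are assumptions chosen for illustration, and a character count is only a rough proxy for a model's token limit.

```python
def split_markdown(markdown, max_chars=8000):
    """Split markdown into chunks, breaking only before heading lines.

    Chunks stay within max_chars where possible. A single section longer
    than max_chars is kept whole rather than cut mid-section.
    """
    # Step 1: group lines into sections, each starting at a heading.
    sections, current = [], []
    for line in markdown.splitlines(keepends=True):
        if line.startswith("#") and current:
            sections.append("".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("".join(current))

    # Step 2: pack consecutive sections into chunks under the budget.
    chunks, buffer = [], ""
    for section in sections:
        if buffer and len(buffer) + len(section) > max_chars:
            chunks.append(buffer)
            buffer = ""
        buffer += section
    if buffer:
        chunks.append(buffer)
    return chunks


doc = "# A\naaa\n# B\nbbb\n# C\nccc\n"
for number, chunk in enumerate(split_markdown(doc, max_chars=16), start=1):
    print(f"--- chunk {number} ({len(chunk)} chars) ---")
    print(chunk, end="")
```

One limitation of this simple approach: a line that starts with `#` inside a code block (for example a shell comment) is also treated as a heading.
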
The application uses:
- Flask for the web interface
- BeautifulSoup4 for HTML parsing
- Requests for fetching web content
- Python-slugify for URL/filename handling
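
As a rough illustration of the link and filename handling mentioned above, the following sketch resolves relative links against a page URL, keeps only links on the same host, and derives a safe filename from a URL. It relies only on the standard library: the regex-based `filename_for` is a simplified stand-in for python-slugify, and both function names, along with the example URLs, are hypothetical. How the application itself performs these steps is not described in this README.

```python
import re
from urllib.parse import urldefrag, urljoin, urlparse


def internal_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep same-host pages, deduplicated in order."""
    host = urlparse(base_url).netloc
    seen, result = set(), []
    for href in hrefs:
        # Make the link absolute, then drop any "#fragment" so that
        # anchors within one page do not count as separate pages.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).netloc == host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


def filename_for(url):
    """Build a filesystem-safe markdown filename from a URL."""
    parsed = urlparse(url)
    raw = (parsed.netloc + parsed.path).lower()
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    return (slug or "docs") + ".md"


base = "https://docs.example.com/guide/"
hrefs = ["intro", "/api#auth", "https://other.example.org/x", "intro#top", "mailto:a@b.c"]
print(internal_links(base, hrefs))
print(filename_for("https://docs.example.com/guide/intro"))
```
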
This tool is designed specifically for Gitbook-based documentation sites and optimized for LLM consumption. It may not work correctly with other documentation platforms.