A Python-based tool to download audio from YouTube videos, transcribe the audio using Faster Whisper, and generate concise summaries with a locally hosted LLaMA language model.
- Features
- Prerequisites
- Installation
- Setting Up llama.cpp Server
- Downloading the Language Model
- Usage
- Example
- Troubleshooting
- Contributing
- License
- Download Audio: Extracts audio from YouTube videos in MP3 format.
- Transcription: Utilizes Faster Whisper with GPU acceleration for efficient and accurate transcription.
- Summarization: Generates concise summaries using a locally hosted LLaMA language model.
- Token Counting: Provides the number of tokens in the transcription for API usage management.
- User-Friendly Output: Displays summaries in Markdown format using the rich library for enhanced readability.
Before using this tool, ensure you have the following installed on your system:
-
Python 3.8+: Ensure you have Python installed. You can download it from python.org.
-
FFmpeg: Required for audio processing.
- Ubuntu/Debian:
sudo apt update sudo apt install ffmpeg
- macOS (using Homebrew):
brew install ffmpeg
- Windows: Download the latest FFmpeg build from FFmpeg Downloads. Follow the installation instructions for your system.
-
CUDA: If you have an NVIDIA GPU and wish to utilize GPU acceleration for Faster Whisper, ensure CUDA is installed and properly configured. Refer to the CUDA Installation Guide for details.
-
Git: To clone repositories.
- Ubuntu/Debian:
sudo apt install git
- macOS (using Homebrew):
brew install git
- Windows: Download and install Git from git-scm.com.
git clone https://github.com/nmandic78/yt_summary.git
cd yt_summary
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Ensure you have pip updated:
pip install --upgrade pip
Install the required packages:
pip install -r requirements.txt
or:
pip install yt-dlp faster-whisper openai tiktoken rich
To generate summaries, the tool relies on a locally hosted LLaMA language model using llama.cpp. Follow the steps below to set up the server.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Ensure you have the necessary build tools installed (e.g., make, gcc).
make
This will compile the llama-server executable.
- Visit the Hugging Face Models page.
- Search for a compatible LLaMA model, such as gemma-2-9b-it-Q8_0.gguf.
- Download the model and place it in a directory of your choice, e.g., /mnt/disk2/LLM_MODELS/models/gemma-2-9b-it-Q8_0.gguf.
Note: Ensure you have the rights and necessary permissions to use the model.
Execute the server with your chosen model:
./llama-server -m /mnt/disk2/LLM_MODELS/models/gemma-2-9b-it-Q8_0.gguf -ngl 99 -c 8192
Parameters Explained:
-m
: Path to the model file.-ngl
: Number of GPU layers (adjust based on your GPU capabilities).-c
: Context size in tokens (adjust as needed).
The server will start and listen on http://localhost:8080/v1.
If you haven't downloaded a LLaMA model yet, follow the steps in the Setting Up llama.cpp Server section to obtain a compatible model from Hugging Face.
Once you have set up the llama.cpp server and installed all dependencies, you can use the transcription and summarization tool.
-v, --video_url
: (Required) YouTube video URL to download and transcribe.-m, --mp3_dir
: (Optional) Directory to save the downloaded MP3 file. Default: /home/yourusername/Music/YT_AUDIOS/-t, --transcript_dir
: (Optional) Directory to save the transcription text file. Default: /home/yourusername/Music/YT_AUDIOS/
python yt_summary.py -v <YouTube_Video_URL> [options]
python yt_summary.py -v https://www.youtube.com/watch?v=dQw4w9WgXcQ
This command will:
- Download the audio from the provided YouTube video URL and save it as an MP3 file in the default directory.
- Transcribe the audio using Faster Whisper.
- Generate a summary using the locally hosted LLaMA model.
- Display the summary in the console and save the transcription to a text file.
You can specify custom directories for saving MP3 files and transcriptions:
python yt_summary.py -v <YouTube_Video_URL> -m /path/to/mp3_dir -t /path/to/transcript_dir
- FFmpeg Not Found: Ensure FFmpeg is installed and added to your system's PATH.
- CUDA Issues: Verify that CUDA is correctly installed and that your GPU supports the required operations.
- llama.cpp Server Not Running: Ensure the server is running before executing the transcription script. Verify the server URL and port.
- Missing Dependencies: Ensure all Python packages are installed. Re-run
pip install -r requirements.txt
if necessary. - Insufficient Permissions: Check directory permissions for saving MP3 and transcription files.
Contributions are welcome! Please follow these steps:
git checkout -b feature/YourFeature
git commit -m "Add YourFeature"
git push origin feature/YourFeature
Please ensure your code follows the project's coding standards and includes appropriate documentation.
This project is licensed under the MIT License.
Developed by Nenad Mandic