Skip to content

YouTube Audio Transcription and Summarization Tool

License

Notifications You must be signed in to change notification settings

nmandic78/yt_summary

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

YouTube Audio Transcription and Summarization Tool

A Python-based tool to download audio from YouTube videos, transcribe the audio using Faster Whisper, and generate concise summaries with a locally hosted LLaMA language model.

Table of Contents

Features

  • Download Audio: Extracts audio from YouTube videos in MP3 format.
  • Transcription: Utilizes Faster Whisper with GPU acceleration for efficient and accurate transcription.
  • Summarization: Generates concise summaries using a locally hosted LLaMA language model.
  • Token Counting: Provides the number of tokens in the transcription for API usage management.
  • User-Friendly Output: Displays summaries in Markdown format using the rich library for enhanced readability.

Prerequisites

Before using this tool, ensure you have the following installed on your system:

  • Python 3.8+: Ensure you have Python installed. You can download it from python.org.

  • FFmpeg: Required for audio processing.

    • Ubuntu/Debian:
    sudo apt update
    sudo apt install ffmpeg
    • macOS (using Homebrew):
    brew install ffmpeg
    • Windows: Download the latest FFmpeg build from FFmpeg Downloads. Follow the installation instructions for your system.
  • CUDA: If you have an NVIDIA GPU and wish to utilize GPU acceleration for Faster Whisper, ensure CUDA is installed and properly configured. Refer to the CUDA Installation Guide for details.

  • Git: To clone repositories.

    • Ubuntu/Debian:
    sudo apt install git
    • macOS (using Homebrew):
    brew install git

Installation

Clone the Repository

git clone https://github.com/nmandic78/yt_summary.git
cd yt_summary

Create a Virtual Environment (Optional but Recommended)

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Python Dependencies

Ensure you have pip updated:

pip install --upgrade pip

Install the required packages:

pip install -r requirements.txt

or:

pip install yt-dlp faster-whisper openai tiktoken rich

Setting Up llama.cpp Server

To generate summaries, the tool relies on a locally hosted LLaMA language model using llama.cpp. Follow the steps below to set up the server.

Clone the llama.cpp Repository

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Build the Server

Ensure you have the necessary build tools installed (e.g., make, gcc).

make

This will compile the llama-server executable.

Download a LLaMA Model from Hugging Face

  • Visit the Hugging Face Models page.
  • Search for a compatible LLaMA model, such as gemma-2-9b-it-Q8_0.gguf.
  • Download the model and place it in a directory of your choice, e.g., /mnt/disk2/LLM_MODELS/models/gemma-2-9b-it-Q8_0.gguf.

Note: Ensure you have the rights and necessary permissions to use the model.

Run the llama.cpp Server

Execute the server with your chosen model:

./llama-server -m /mnt/disk2/LLM_MODELS/models/gemma-2-9b-it-Q8_0.gguf -ngl 99 -c 8192

Parameters Explained:

  • -m: Path to the model file.
  • -ngl: Number of GPU layers (adjust based on your GPU capabilities).
  • -c: Context size in tokens (adjust as needed).

The server will start and listen on http://localhost:8080/v1.

Downloading the Language Model

If you haven't downloaded a LLaMA model yet, follow the steps in the Setting Up llama.cpp Server section to obtain a compatible model from Hugging Face.

Usage

Once you have set up the llama.cpp server and installed all dependencies, you can use the transcription and summarization tool.

Command-Line Arguments

  • -v, --video_url: (Required) YouTube video URL to download and transcribe.
  • -m, --mp3_dir: (Optional) Directory to save the downloaded MP3 file. Default: /home/yourusername/Music/YT_AUDIOS/
  • -t, --transcript_dir: (Optional) Directory to save the transcription text file. Default: /home/yourusername/Music/YT_AUDIOS/

Running the Script

python yt_summary.py -v <YouTube_Video_URL> [options]

Example:

python yt_summary.py -v https://www.youtube.com/watch?v=dQw4w9WgXcQ

This command will:

  • Download the audio from the provided YouTube video URL and save it as an MP3 file in the default directory.
  • Transcribe the audio using Faster Whisper.
  • Generate a summary using the locally hosted LLaMA model.
  • Display the summary in the console and save the transcription to a text file.

Specifying Custom Directories

You can specify custom directories for saving MP3 files and transcriptions:

python yt_summary.py -v <YouTube_Video_URL> -m /path/to/mp3_dir -t /path/to/transcript_dir

Example summary

image

Troubleshooting

  • FFmpeg Not Found: Ensure FFmpeg is installed and added to your system's PATH.
  • CUDA Issues: Verify that CUDA is correctly installed and that your GPU supports the required operations.
  • llama.cpp Server Not Running: Ensure the server is running before executing the transcription script. Verify the server URL and port.
  • Missing Dependencies: Ensure all Python packages are installed. Re-run pip install -r requirements.txt if necessary.
  • Insufficient Permissions: Check directory permissions for saving MP3 and transcription files.

Contributing

Contributions are welcome! Please follow these steps:

Fork the Repository

Create a Feature Branch

git checkout -b feature/YourFeature

Commit Your Changes

git commit -m "Add YourFeature"

Push to the Branch

git push origin feature/YourFeature

Open a Pull Request

Please ensure your code follows the project's coding standards and includes appropriate documentation.

License

This project is licensed under the MIT License.

Developed by Nenad Mandic

About

YouTube Audio Transcription and Summarization Tool

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages