TalkingDragons is a Python tool that batch-transcribes the audio and video files in a specified folder using OpenAI's Whisper speech recognition model. It processes every ffmpeg-compatible media file, writes a `.txt` transcription for each one, and backs up existing transcription files so they are never overwritten.
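To illustrate the batch step, here is a minimal sketch of how input paths might be expanded into a list of media files. It is not the project's code: the function name `collect_media_files` and the `MEDIA_EXTENSIONS` set are hypothetical, and filtering by file extension is an assumption (the real script may decide what counts as "ffmpeg-compatible" differently, for example by probing each file).

```python
from pathlib import Path

# Hypothetical extension list; the real script's notion of
# "ffmpeg-compatible" may differ (e.g. it could probe files with ffmpeg).
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".mp4", ".mkv", ".mov", ".webm"}


def collect_media_files(inputs):
    """Expand files and directories into a sorted list of media file paths."""
    found = set()
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            # This sketch scans only the top level of each directory.
            candidates = path.iterdir()
        else:
            candidates = [path]
        for candidate in candidates:
            # Case-insensitive extension match, so "clip.MP4" is accepted.
            if candidate.is_file() and candidate.suffix.lower() in MEDIA_EXTENSIONS:
                found.add(candidate)
    return sorted(found)
```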
The name "TalkingDragons" is inspired by the cultural significance of the Dragão do Mar. Dragão do Mar ("Sea Dragon") is the honorary name of Francisco José do Nascimento, an Afro-Brazilian jangadeiro (jangada boatman) and abolitionist who in 1881 courageously led a strike in Fortaleza, refusing to transport enslaved people, and ultimately helped bring about the abolition of slavery in Ceará, Brazil, in 1884.
For more information about Francisco José do Nascimento, check out his Wikipedia article.
- Batch Processing: Automatically transcribes multiple audio and video files in a specified directory.
- Whisper Integration: Utilizes OpenAI's Whisper model for state-of-the-art speech recognition.
- Automatic Backup: Backs up existing transcription files with incremental numeric suffixes starting from `001`.
- Detailed Reporting: Provides a summary report including transcription time, detected language, and model used.
- Language and Model Selection: Supports automatic language detection or user-specified input, with customizable Whisper models.
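A minimal sketch of the incremental backup idea, assuming a `<stem>_backup_NNN<suffix>` naming scheme (so `talk.txt` becomes `talk_backup_001.txt`, then `talk_backup_002.txt`). The function name `backup_existing` and the exact naming pattern are assumptions for illustration, not taken from the project.

```python
import shutil
from pathlib import Path


def backup_existing(transcript):
    """Move an existing transcript aside to the first free backup name.

    Assumed (hypothetical) naming: notes.txt -> notes_backup_001.txt,
    notes_backup_002.txt, ... Returns the backup path, or None when
    there is no existing transcript to back up.
    """
    transcript = Path(transcript)
    if not transcript.exists():
        return None
    counter = 1
    while True:
        # Zero-padded three-digit counter, starting from 001.
        candidate = transcript.with_name(
            f"{transcript.stem}_backup_{counter:03d}{transcript.suffix}"
        )
        if not candidate.exists():
            break
        counter += 1
    # Move rather than copy, so the new transcription writes a fresh file.
    shutil.move(str(transcript), str(candidate))
    return candidate
```

Searching for the first unused number keeps every earlier backup intact, at the cost of a few extra `exists()` checks when many backups accumulate.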
- Clone the repository:
  `git clone https://github.com/alisio/talking-dragons.git`
- Navigate to the script directory:
  `cd talking-dragons`
- Install dependencies. Ensure Python 3.7 or later is installed, then run:
  `pip install -r requirements.txt`
Run the Python script from the command line:
`python whisper_transcription.py <inputs> [--language <language>] [--model <model>]`
- `<inputs>`: One or more files or directories containing media files to transcribe.
- `--language`: (Optional) The language of the audio. If omitted, the script detects it automatically.
- `--model`: (Optional) The Whisper model to use (default: `base`).
Example:
`python whisper_transcription.py /path/to/media/files --language en --model large`
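The command-line interface described above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented arguments and the `base` default; `build_parser` is a hypothetical name, and the actual script may parse its arguments differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented CLI."""
    parser = argparse.ArgumentParser(
        description="Batch transcribe audio/video files with Whisper."
    )
    # One or more files or directories (nargs="+" requires at least one).
    parser.add_argument(
        "inputs", nargs="+",
        help="Files or directories containing media files to transcribe.",
    )
    # None means "let Whisper detect the language".
    parser.add_argument(
        "--language", default=None,
        help="Audio language, e.g. 'en'; omit for automatic detection.",
    )
    parser.add_argument(
        "--model", default="base",
        help="Whisper model size, e.g. tiny, base, large (default: base).",
    )
    return parser
```

For example, `build_parser().parse_args(["/path/to/media/files", "--language", "en"])` yields `inputs=["/path/to/media/files"]`, `language="en"`, and `model="base"`.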
The following parameters can be customized within the script or passed as command-line arguments:
- Language: Detected automatically, or set explicitly with `--language`.
- Model: Use the `--model` flag to choose the Whisper model size (e.g., `tiny`, `base`, `large`).
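The language and model settings map naturally onto the `openai-whisper` package's `whisper.load_model()` and `model.transcribe()` calls. The sketch below assumes that package is installed; `decode_options` and `transcribe_all` are hypothetical helpers, and loading the model once and reusing it for every file is a design choice of this sketch, not necessarily what the script does.

```python
def decode_options(language=None):
    """Build keyword options for model.transcribe().

    No language lets Whisper auto-detect it; a code such as "en"
    forces that language.
    """
    return {} if language is None else {"language": language}


def transcribe_all(paths, model_name="base", language=None):
    """Yield (path, text, language) for each path (hypothetical helper).

    Requires the openai-whisper package and ffmpeg when iterated.
    """
    import whisper  # deferred so decode_options() works without whisper

    # Load the model once, then reuse it for the whole batch.
    model = whisper.load_model(model_name)
    for path in paths:
        result = model.transcribe(str(path), **decode_options(language))
        # result is a dict: "text" holds the transcript and "language"
        # the language Whisper used (detected or forced).
        yield path, result["text"].strip(), result["language"]
```

Loading the model once avoids repeating that step for every file in a batch.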
- Transcriptions are saved as `.txt` files in the same directory as the input files.
- Existing transcription files are backed up with incremental suffixes (e.g., `_backup_001`).
- A detailed report is generated at the end, showing:
  - Transcription time per file
  - Total processing time
  - Model used
  - Language detected
  - Output file paths
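One way to collect the report fields listed above is a small pair of dataclasses plus a timer based on `time.perf_counter()`. The class names, field names, and output format are illustrative assumptions; the project's actual report layout may differ.

```python
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileResult:
    """Per-file entry for the summary report (illustrative fields)."""
    source: str
    output: str
    language: str
    seconds: float


@dataclass
class Report:
    """Collects per-file results and renders a plain-text summary."""
    model: str
    results: List[FileResult] = field(default_factory=list)

    def add(self, result: FileResult) -> None:
        self.results.append(result)

    def total_seconds(self) -> float:
        # Sum of per-file times; excludes setup such as model loading.
        return sum(r.seconds for r in self.results)

    def render(self) -> str:
        lines = [f"Model used: {self.model}"]
        for r in self.results:
            lines.append(f"{r.source} -> {r.output} [{r.language}] {r.seconds:.2f}s")
        lines.append(f"Total processing time: {self.total_seconds():.2f}s")
        return "\n".join(lines)


def timed(func, *args, **kwargs):
    """Call func and return (its return value, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    value = func(*args, **kwargs)
    return value, time.perf_counter() - start
```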
- Python 3.7 or later
- openai-whisper (OpenAI Whisper)
- tqdm
- ffmpeg (installed on your system)
Install Python dependencies via:
`pip install openai-whisper tqdm`
Ensure `ffmpeg` is installed and available in your system's PATH. For installation instructions, refer to the FFmpeg documentation.
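A script can fail early, before any transcription starts, when `ffmpeg` is missing from PATH. This sketch uses the standard library's `shutil.which`; the helper names `ffmpeg_path` and `require_ffmpeg` are hypothetical.

```python
import shutil


def ffmpeg_path():
    """Return the full path to ffmpeg if it is on PATH, else None."""
    return shutil.which("ffmpeg")


def require_ffmpeg():
    """Raise a readable error up front instead of failing mid-batch."""
    path = ffmpeg_path()
    if path is None:
        raise RuntimeError("ffmpeg was not found on PATH; install it before transcribing.")
    return path
```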
This project draws inspiration from the "Dragão do Mar" (Sea Dragon), honoring Francisco José do Nascimento, an Afro-Brazilian jangadeiro and abolitionist who played a pivotal role in the abolition of slavery in Ceará, Brazil, in 1884.
This project is licensed under the MIT License. For more details, see the LICENSE.md file in the repository.
Contributions are welcome! Feel free to open an issue or submit a pull request to improve the project.