This project aims to provide real-time translation services using a combination of speech recognition, machine translation, and text-to-speech technologies. It integrates several models and tools to enable seamless spoken communication between Chinese and English.
melo/
openvoice/
seamless_communication/
finetune_seamless_m4t_medium.ipynb
seamless_translate.py
tri-model_translation.py
melo/: Originally from MyShell.ai's MeloTTS project. Customized for this task.
Contains utilities and APIs for text normalization and text-to-speech (TTS) services.
openvoice/: Originally from MyShell.ai's OpenVoice project. Customized for this task.
Customization Features:
- Removed watermark generation for faster inference
- Removed Japanese, Spanish, French, and Korean support to reduce initialization time (this project only covers Chinese to English)
Includes components for voice processing and manipulation, such as speaker extraction and tone color conversion.
seamless_communication/: Originally from Facebook's Seamless Communication project. Customized for this task.
Customization Features:
- Customized training data
- Customized dataset classes for training and validation data
Focuses on integrating different modules for seamless communication, including managing audio input/output and coordinating the translation pipeline.
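As an illustration of what a customized training/validation dataset class can look like, here is a minimal sketch. It is not the class used in this repository: the name `SpeechPairDataset`, the JSON-lines manifest format, the field names, and the `split_train_val` helper are all hypothetical. It only uses the standard library and implements `__len__` and `__getitem__`, the map-style interface that a PyTorch `DataLoader` expects.

```python
import json
from pathlib import Path


class SpeechPairDataset:
    """Map-style dataset over a JSON-lines manifest (hypothetical format).

    Each line is assumed to look like:
    {"audio": "clips/0001.wav", "source_text": "你好", "target_text": "Hello"}
    """

    def __init__(self, manifest_path):
        self.samples = []
        with Path(manifest_path).open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    self.samples.append(json.loads(line))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        return self.samples[index]


def split_train_val(dataset, val_fraction=0.1):
    """Return (train_indices, val_indices); the tail of the manifest is validation."""
    n_val = max(1, int(len(dataset) * val_fraction))
    indices = list(range(len(dataset)))
    return indices[:-n_val], indices[-n_val:]
```

Keeping the split deterministic (last fraction of the manifest) makes validation scores comparable across fine-tuning runs; shuffle the manifest once beforehand if its order is not random.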
finetune_seamless_m4t_medium.ipynb: A Jupyter notebook for fine-tuning the SeamlessM4T model, providing an environment for customizing the model to improve performance on specific datasets.
seamless_translate.py: Main script for translation tasks. It initializes and manages the translation pipeline, which includes speech recognition, translation, and text-to-speech conversion.
tri-model_translation.py: Script that integrates multiple pre-trained models for enhanced translation accuracy. It covers real-time speech recognition, translation, and TTS.
- Python 3.8+
- PyTorch
- Transformers library by Hugging Face
- Additional dependencies listed in requirements.txt
- Clone the repository:
    git clone https://github.com/ivanhe123/real_time_translation.git
    cd real_time_translation
- Install the dependencies:
    pip install -r requirements.txt
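After installing, a quick sanity check can confirm that the interpreter and core packages are in place. The sketch below is not part of the repository. The module names `torch` and `transformers` are the usual import names for the PyTorch and Hugging Face requirements listed above, and the helper names are made up for illustration.

```python
import importlib.util
import sys

# Import names assumed from the requirements list; extend as needed.
REQUIRED_MODULES = ["torch", "transformers"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.8+ is required, found", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Core dependencies found.")
```

Using `importlib.util.find_spec` avoids actually importing PyTorch, so the check stays fast even on machines where loading the library is slow.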
- Fine-tuning the Model: Open finetune_seamless_m4t_medium.ipynb in Jupyter Notebook and follow the instructions to fine-tune the model on your dataset.
- Real-Time Translation: Run seamless_translate.py to start the translation pipeline:
    python seamless_translate.py
- Multi-Model Translation: Run tri-model_translation.py to use the integrated multi-model approach:
    python tri-model_translation.py
- Speech Recognition: Uses the transformers pipeline with a pre-trained Whisper model to convert speech to text.
- Translation: Employs a translation model to convert the recognized text from the source language to the target language.
- Text-to-Speech: Uses a TTS model to convert the translated text back into speech, facilitating real-time communication.
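The three stages above can be sketched as a chain of callables. This is an illustrative outline under stated assumptions, not the code in seamless_translate.py or tri-model_translation.py: `SpeechTranslator` and `make_whisper_recognizer` are hypothetical names, and the `openai/whisper-small` checkpoint is an assumed choice. The only real library call is the Hugging Face `pipeline("automatic-speech-recognition", ...)` factory, whose result returns a dict with a `"text"` key.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SpeechTranslator:
    """Chains the three stages; each stage is any callable of the right shape."""
    recognize: Callable[[Any], str]    # audio -> source-language text
    translate: Callable[[str], str]    # source text -> target text
    synthesize: Callable[[str], Any]   # target text -> audio

    def run(self, audio):
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return source_text, target_text, self.synthesize(target_text)


def make_whisper_recognizer(model_name="openai/whisper-small"):
    """Build a recognizer from the transformers ASR pipeline.

    The checkpoint name is an assumption. The import is deferred so the
    rest of this sketch runs without transformers installed.
    """
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda audio: asr(audio)["text"]


# Stub stages stand in for the real models so the flow can be tried offline.
demo = SpeechTranslator(
    recognize=lambda audio: "你好",
    translate=lambda text: {"你好": "Hello"}.get(text, text),
    synthesize=lambda text: f"<audio:{text}>",
)
print(demo.run(b"raw-audio-bytes"))  # ('你好', 'Hello', '<audio:Hello>')
```

Passing each stage in as a callable keeps the stages swappable: replacing the stub with `make_whisper_recognizer()` changes only the recognizer, while translation and TTS stay untouched.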
Contributions are welcome! Please create a pull request or open an issue to discuss any changes or improvements.