Parrot-TTS

Parrot-TTS is a text-to-speech (TTS) system that uses a Transformer-based sequence-to-sequence model to map character tokens to quantized HuBERT units, together with a modified HiFi-GAN vocoder that synthesizes speech from those units. This repository is the official implementation of our EACL 2024 paper, available at https://aclanthology.org/2024.findings-eacl.6/, and provides instructions for installation, running the demo, and training the TTS model on your own data. A few samples generated with our model (trained without transliteration of non-English characters) are available at https://drive.google.com/file/d/1b4uoeRv106J-4NvzVnotfBiAuFz049_q/view?usp=sharing

Libraries Installation

  1. Create and activate a new Conda environment:

    conda create --name parrottts python=3.8.19
    conda activate parrottts
  2. Install the required libraries:

    pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu125

Running a Demo

Run a demo using the provided Jupyter notebook, demo.ipynb. The provided checkpoints were trained on the training data available at https://sites.google.com/view/limmits25/home?authuser=0

  • The notebook automatically downloads the following files from Google Drive and stores them at these locations:
    • runs/aligner/symbol.pkl: A dictionary to map characters to tokens.
    • runs/TTE/ckpt: Model to convert character text tokens to HuBERT units.
    • runs/vocoder/checkpoints: Model to predict speech from HuBERT units.
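The character-to-token mapping stored in runs/aligner/symbol.pkl can be applied roughly as in the sketch below. It assumes the pickle holds a plain Python dict from single characters to integer ids, and that the input text has already been cleaned (and, if needed, transliterated) the same way as the training transcripts. `load_symbols` and `text_to_tokens` are hypothetical helper names, not functions from this repository.

```python
import pickle
from pathlib import Path
from typing import Dict, List


def load_symbols(path: str = "runs/aligner/symbol.pkl") -> Dict[str, int]:
    """Load the character-to-token dictionary (assumed to be a pickled dict)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def text_to_tokens(text: str, symbol_to_id: Dict[str, int]) -> List[int]:
    """Map each character of `text` to its token id, failing loudly on unknown characters."""
    unknown = sorted({ch for ch in text if ch not in symbol_to_id})
    if unknown:
        raise ValueError(f"characters missing from symbol table: {unknown}")
    return [symbol_to_id[ch] for ch in text]


if __name__ == "__main__":
    # Toy table for illustration only; the real one would come from load_symbols().
    toy = {"h": 0, "i": 1, " ": 2}
    print(text_to_tokens("hi hi", toy))  # [0, 1, 2, 0, 1]
```

Raising on unknown characters, rather than silently dropping them, makes it obvious when inference text contains symbols that were never seen during training.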

Training Parrot-TTS on Your Data

To train Parrot-TTS on your dataset, follow these steps (1-10):

Step 1: Compute Unique Symbols/Characters

  • Set dataset_dir in utils/aligner/aligner_preprocessor_config.yaml to your dataset folder. dataset_dir should contain one subfolder per speaker, each holding that speaker's wav and txt files. The script cleans the text files of each speaker, stores them separately, and computes the set of unique characters across all speakers. For non-English speakers, make sure the do_transliteration flag in utils/aligner/aligner_preprocessor_config.yaml is set appropriately.
    python utils/aligner/preprocessor.py utils/aligner/aligner_preprocessor_config.yaml
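As an illustration of what Step 1 computes, the following sketch collects the raw character inventory from a dataset_dir laid out as dataset_dir/<speaker>/*.txt. It is not the repository's preprocessor: it skips the text cleaning and transliteration that utils/aligner/preprocessor.py performs, so its result may contain more symbols. `unique_characters` is a hypothetical name.

```python
from pathlib import Path
from typing import Set


def unique_characters(dataset_dir: str) -> Set[str]:
    """Collect every character that appears in the transcripts of all speakers.

    Assumes the layout dataset_dir/<speaker>/*.txt with one UTF-8 transcript per file.
    """
    chars: Set[str] = set()
    for speaker_dir in sorted(Path(dataset_dir).iterdir()):
        if not speaker_dir.is_dir():
            continue  # ignore stray files at the top level
        for txt in sorted(speaker_dir.glob("*.txt")):
            # strip() removes leading/trailing whitespace such as the final newline
            chars.update(txt.read_text(encoding="utf-8").strip())
    return chars
```

For example, `sorted(unique_characters("/path/to/dataset_dir"))` lists the inventory, which can help spot stray punctuation or non-English characters before deciding on do_transliteration.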

Step 2: Train Aligner for Each Speaker

  • Set base_dataset_dir in utils/aligner/train.sh to the same folder used as dataset_dir in Step 1.
    bash utils/aligner/train.sh
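Because an aligner needs a transcript for every recording, it can help to check each speaker folder for unpaired files before running train.sh. This sketch assumes that a recording and its transcript share a file stem (for example utt001.wav and utt001.txt); that naming convention and the `unpaired_files` helper are assumptions, not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def unpaired_files(dataset_dir: str) -> Dict[str, List[str]]:
    """Report, per speaker, file stems that have a .wav without a .txt or vice versa."""
    report: Dict[str, List[str]] = {}
    for speaker_dir in sorted(p for p in Path(dataset_dir).iterdir() if p.is_dir()):
        wavs = {p.stem for p in speaker_dir.glob("*.wav")}
        txts = {p.stem for p in speaker_dir.glob("*.txt")}
        mismatched = sorted(wavs ^ txts)  # symmetric difference: stems present in only one set
        if mismatched:
            report[speaker_dir.name] = mismatched
    return report
```

An empty dict means every speaker folder is fully paired.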

Step 3: Extract HuBERT Units

  • Download the HuBERT checkpoint and quantizer from this link and store them in utils/hubert_extraction. Note: You may need to clone and install fairseq to run this step.
  • Run the following command to extract HuBERT units:
    python utils/hubert_extraction/extractor.py utils/hubert_extraction/hubert_config.yaml
  • Note: HuBERT units have already been extracted for the corpus and are available at this Google Drive link. Download them and save them to runs/hubert_extraction.
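For intuition about what "quantized units" means in Step 3: HuBERT produces one feature vector per audio frame, and the quantizer maps each vector to the index of its nearest codebook entry. The sketch below assumes a k-means codebook, the usual setup for HuBERT units, and uses toy 2-D vectors; the real features, codebook size, and quantizer file format come from the downloaded checkpoint and extractor.py.

```python
import numpy as np


def quantize(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each frame in features (T, D) to its nearest centroid in centroids (K, D).

    Returns a (T,) array of unit ids in the range [0, K).
    """
    # Squared Euclidean distance between every frame and every centroid: shape (T, K)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    # Toy codebook with K=3 centroids in D=2 dimensions
    centroids = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
    frames = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.2]])
    print(quantize(frames, centroids))  # [0 2 1]
```

The resulting sequence of unit ids per utterance is what the TTE module learns to predict from character tokens.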

Step 4: Create Files for TTE Training

  • Prepare the necessary files for training the TTE module:
    python utils/TTE/preprocessor.py utils/TTE/TTE_config.yaml

Step 5: Train the TTE Module

  • Train the TTE module using the following command:
    python train.py --config utils/TTE/TTE_config.yaml --num_gpus 1

Step 6: Infer HuBERT Units

  • Run inference with the trained TTE module to predict HuBERT units:
    python inference.py --config utils/TTE/TTE_config.yaml --checkpoint_pth runs/TTE/ckpt/parrot_model-step=50000-val_total_loss_step=0.00.ckpt --device cuda:2
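When several checkpoints accumulate in runs/TTE/ckpt, a small helper can pick the one with the highest training step to pass as --checkpoint_pth. This sketch assumes every checkpoint follows the naming pattern of the example command (parrot_model-step=<N>-val_total_loss_step=<loss>.ckpt); `latest_checkpoint` is a hypothetical helper, and selecting by lowest validation loss instead would be an equally reasonable choice.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the "-step=<N>-" field; the leading "-" avoids matching "loss_step=".
STEP_RE = re.compile(r"-step=(\d+)-")


def latest_checkpoint(ckpt_dir: str = "runs/TTE/ckpt") -> Optional[Path]:
    """Return the .ckpt file with the highest training step, or None if none match."""
    best, best_step = None, -1
    for path in Path(ckpt_dir).glob("*.ckpt"):
        match = STEP_RE.search(path.name)
        if match and int(match.group(1)) > best_step:
            best, best_step = path, int(match.group(1))
    return best
```

Files that do not follow the pattern (for example a last.ckpt) are ignored rather than guessed at.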

Step 7: Create Training and Validation Files for Vocoder

  • Generate training and validation files for the vocoder:
    python utils/vocoder/preprocessor.py --input_file runs/hubert_extraction/hubert.txt --root_path runs/vocoder
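The exact logic of utils/vocoder/preprocessor.py is not described here; as a hypothetical illustration, a reproducible train/validation split of a unit file could look like the sketch below. It assumes one utterance per line in hubert.txt, and the train.txt name, the `val_count` default, and the seed are all illustrative choices (only val.txt is referenced in Step 9).

```python
import random
from pathlib import Path
from typing import List, Tuple


def split_lines(input_file: str, val_count: int = 100, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle the non-empty lines of input_file reproducibly; hold out val_count for validation."""
    lines = [line for line in Path(input_file).read_text(encoding="utf-8").splitlines() if line.strip()]
    rng = random.Random(seed)  # fixed seed -> identical split on every run
    rng.shuffle(lines)
    return lines[val_count:], lines[:val_count]


def write_split(input_file: str, root_path: str, val_count: int = 100, seed: int = 0) -> None:
    """Write the split to <root_path>/train.txt and <root_path>/val.txt (file names assumed)."""
    train, val = split_lines(input_file, val_count, seed)
    root = Path(root_path)
    root.mkdir(parents=True, exist_ok=True)
    for name, subset in (("train.txt", train), ("val.txt", val)):
        (root / name).write_text("".join(line + "\n" for line in subset), encoding="utf-8")
```

A fixed seed keeps the validation set stable across reruns, so vocoder checkpoints from different runs are evaluated on the same utterances.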

Step 8: Train HiFi-GAN Vocoder

  • Set nproc_per_node to the number of GPUs listed in CUDA_VISIBLE_DEVICES and run the following command:
    CUDA_VISIBLE_DEVICES=1,2,3 python -m torch.distributed.run --nproc_per_node=3 utils/vocoder/train.py --checkpoint_path runs/vocoder/checkpoints --config utils/vocoder/config.json
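The value of --nproc_per_node should match the number of devices listed in CUDA_VISIBLE_DEVICES, since the usual setup with torch.distributed.run is one process per GPU. The sketch below derives it from the device list instead of hard-coding it; `vocoder_train_command` is a hypothetical helper that only builds and prints the command from this step, it does not launch training.

```python
import os
import shlex
from typing import List


def vocoder_train_command(visible_devices: str) -> List[str]:
    """Build the Step 8 launch command with nproc_per_node equal to the number of listed GPUs."""
    gpus = [d for d in visible_devices.split(",") if d.strip()]
    if not gpus:
        raise ValueError("CUDA_VISIBLE_DEVICES lists no GPUs")
    return [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={len(gpus)}",
        "utils/vocoder/train.py",
        "--checkpoint_path", "runs/vocoder/checkpoints",
        "--config", "utils/vocoder/config.json",
    ]


if __name__ == "__main__":
    devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
    # Print a copy-pasteable shell command (shlex.join requires Python 3.8+)
    print(f"CUDA_VISIBLE_DEVICES={devices} " + shlex.join(vocoder_train_command(devices)))
```

With CUDA_VISIBLE_DEVICES=1,2,3 this reproduces the command shown above, including --nproc_per_node=3.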

Step 9: Infer Vocoder on Validation File

  • Infer the vocoder on the validation file:
    python utils/vocoder/inference.py --checkpoint_file runs/vocoder/checkpoints -n 100 --vc --input_code_file runs/vocoder/val.txt --output_dir runs/vocoder/generations_vocoder

Step 10: Infer Vocoder on Actual Predictions

  • Infer the vocoder on predictions from the TTE module:
    python utils/vocoder/inference.py --checkpoint_file runs/vocoder/checkpoints -n 100 --vc --input_code_file runs/TTE/predictions.txt --output_dir runs/vocoder/generations_tte
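After Steps 9 and 10, a quick sanity check is to list the duration of every generated file in runs/vocoder/generations_vocoder or runs/vocoder/generations_tte; empty or very short clips can point to a problem upstream. This sketch uses Python's standard wave module and therefore assumes the vocoder writes integer PCM WAV files (wave cannot read floating-point WAV). `wav_durations` is a hypothetical helper, and the 0.5 s threshold is an arbitrary illustrative value.

```python
import wave
from pathlib import Path
from typing import Dict


def wav_durations(output_dir: str) -> Dict[str, float]:
    """Return the duration in seconds of every .wav file in output_dir."""
    durations: Dict[str, float] = {}
    for path in sorted(Path(output_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as w:
            durations[path.name] = w.getnframes() / w.getframerate()
    return durations


if __name__ == "__main__":
    for name, secs in wav_durations("runs/vocoder/generations_tte").items():
        flag = "  <- suspiciously short" if secs < 0.5 else ""
        print(f"{name}: {secs:.2f} s{flag}")
```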

Acknowledgements

This repository is developed using insights from: