Replies: 5 comments 3 replies
-
hey - that link takes me to an XML page like this, do you know why that might be happening?
-
Thanks for your work! May I ask how you dealt with the sampling rate of the Libri-Light datasets? In the original VALL-E paper, a 24 kHz EnCodec model is used, and the sample audios on the demo website are 24 kHz.
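For context on the sampling-rate question above: Libri-Light audio is distributed at 16 kHz, while the EnCodec model referenced in the paper expects 24 kHz input, so one option is to resample before encoding. A minimal sketch using numpy linear interpolation (the two rates are the only values taken from this thread; for real preprocessing a polyphase resampler such as `torchaudio.transforms.Resample` would be the usual choice):

```python
import numpy as np

SRC_RATE = 16_000   # Libri-Light distribution rate
DST_RATE = 24_000   # rate expected by the 24 kHz EnCodec model

def resample_linear(signal: np.ndarray, src: int = SRC_RATE, dst: int = DST_RATE) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    Illustration only: linear interpolation aliases high frequencies;
    a proper low-pass/polyphase filter is preferable in practice.
    """
    duration = len(signal) / src
    n_out = int(round(duration * dst))
    t_src = np.arange(len(signal)) / src   # source sample times (s)
    t_dst = np.arange(n_out) / dst         # target sample times (s)
    return np.interp(t_dst, t_src, signal)

# one second of a 440 Hz tone as stand-in audio
tone = np.sin(2 * np.pi * 440 * np.arange(SRC_RATE) / SRC_RATE)
out = resample_linear(tone)
print(len(out))  # 24000 samples: one second at the target rate
```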
-
@kevmo314 Do you have code or a repo for how you did this? I'm looking to do this for the full 60k dataset. Happy to share when done.
-
@kevmo314 Thanks for sharing the Libri-Light data. It seems to only have one speaker's voice (speaker ID 100); I'm wondering whether the full ~6000 hours is there.
-
hi there, I spent some time transcribing and encoding the libri-light small and medium datasets. since it was a nontrivial amount of work, I'd like to share it more publicly in case it's useful to anyone. for each audio file in the small and medium datasets, I've:
In total, this amounts to ~6000 hours of audio data. you can find all the data here: https://storage.googleapis.com/speech-synthesis-datasets.
I hope this is useful in helping to reproduce the original VALL-E paper, which does the same with the full 60k hours. I would like to encode and transcribe the large dataset as well, but I don't have the GPU resources to do so.
I will be using the above dataset to try to get closer to the VALL-E paper with this repo, thanks for your work so far!
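A note on accessing the bucket above: opening the bare URL in a browser returns the bucket's object listing as XML (a `ListBucketResult` document), which is likely what the first reply in this thread is seeing. Individual file URLs can be built from the `<Key>` entries in that listing. A sketch of parsing one (the sample keys below are hypothetical, and a real listing is paginated, so repeated requests with a marker parameter may be needed):

```python
import xml.etree.ElementTree as ET

# A trimmed stand-in for what the bucket URL returns; the keys here
# are made up for illustration, not actual files in the bucket.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<ListBucketResult xmlns='http://doc.s3.amazonaws.com/2006-03-01'>
  <Name>speech-synthesis-datasets</Name>
  <Contents><Key>small/100/example.flac</Key></Contents>
  <Contents><Key>small/100/example.json</Key></Contents>
</ListBucketResult>"""

def list_keys(xml_text: str) -> list[str]:
    """Extract object keys from a bucket-listing XML document."""
    root = ET.fromstring(xml_text)
    # match on the local tag name so the namespace prefix doesn't matter
    return [el.text for el in root.iter() if el.tag.endswith("Key")]

BASE = "https://storage.googleapis.com/speech-synthesis-datasets"
for key in list_keys(SAMPLE):
    print(f"{BASE}/{key}")
```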