train loss of custom data #133
Hi,
I tried to train on my dataset, but the loss curves look abnormal. Do you have any suggestions?
Thanks.
The loss of AR:
https://drive.google.com/file/d/1-gZJX-mwYZ-2vkKTl0dTwBcp1A8MHrmV/view?usp=drive_link
The loss of NAR:
https://drive.google.com/file/d/1-9L_AQZyyAgDRqKPpx06w6M99ZPSUIhe/view?usp=drive_link
Comments
Hi @Wangzhen-kris, what kind of data does your dataset consist of? Does it happen to contain very diverse speakers or even multiple languages? And is it organized into separate cut sets that were combined for training? While trying to train on CommonVoice I ran into similar graphs. I found that using the Lhotse dynamic samplers leads to a static CutSet order, which means language C always gets trained after B, which is trained after A. I worked around this by randomizing the CutSet contents before training. It is quite memory-intensive on a large dataset (~60 GB needed for almost all of CommonVoice 13) and also quite slow, since it is a single-threaded process; it takes about 10 minutes on my AI server. main...RuntimeRacer:vall-e:cuts_randomizer
I also attached a screenshot showing how this stabilized my training; the arrows point to where this was applied after 2 epochs without this pre-processing.
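For reference, a minimal sketch of that kind of one-off shuffle using Lhotse's CutSet API. This is not the exact code from the cuts_randomizer branch linked above; the file paths and seed are placeholders, and it assumes the combined manifest fits in memory:

```python
# Minimal sketch: load the combined cuts eagerly, shuffle them once, and write
# the shuffled manifest back to disk so training no longer sees languages in a
# fixed order. Paths and the RNG seed below are illustrative, not from the repo.
import random
from lhotse import CutSet

cuts = CutSet.from_file("data/combined_cuts.jsonl.gz")  # eager load; memory-heavy on large corpora
shuffled = cuts.shuffle(rng=random.Random(42))          # one-off global shuffle of cut order
shuffled.to_file("data/combined_cuts_shuffled.jsonl.gz")
```

Pointing the training sampler at the shuffled manifest then interleaves the languages/speakers instead of presenting them in blocks.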