tldr: the offline model has several problems that the online demo (with a paid option) does not have:
-generation is cut at 30 seconds
-random long silence or dropping of words or whole sentences
-random word slurring or noises
in my own setup, every third generation is unusable.
all these problems make the model unusable for serious applications.
the problems are not documented anywhere but people found out after installation and usage.
this forum is full of people complaining about it and spending a lot of time trying to fix it by various means with the few options available.
yet the online demo does not have such problems:
all generations are quite flawless and there is no time limit.
yet it seems we don't have access to the parameters or code used in the online demo.
this leads to the question:
are these problems on purpose? is this model just a "shareware" demo of the paid service? back in the day, shareware demos were small demos meant as advertisement, which contained just a part of the product or had hindrances built in so that the product was unusable apart from small "demo" use.
i haven't seen a dev comment on the problems. it's mostly users saying that it's version 0.1 and later versions will improve, which in general is what you would expect.
maybe i am terribly mistaken and the online demo also has flaws that i havent seen.
could the original authors give a statement?
@mytait I am also exploring the model and am a bit underwhelmed by what it can do out of the box.
Can you share a bit what you have tried?
Are you using the transformer model or the hybrid one?
Which language are you trying to generate sounds for? The docs say that the dataset is predominantly English with 'substantial' data in Chinese, Japanese, French, Spanish, and German. I am trying to have it speak German and wonder how much data they really used.
Regarding the 30s limitation: This seems hard coded at the moment and the idea is to do repeated generations which you string together.
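A minimal sketch of that "generate in pieces and string them together" idea, assuming a hypothetical `synthesize(chunk)` callable that wraps whatever inference call you use and returns a 1-D float array. The sample rate, the 200-character chunk size, and the 0.25 s pause are assumptions for illustration, not values taken from the project:

```python
import re
import numpy as np

SAMPLE_RATE = 44_100  # assumed; use the rate your model actually outputs


def split_into_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars still becomes its own chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def generate_long(text, synthesize, pause_s=0.25, max_chars=200):
    """Run the hypothetical synthesize(chunk) once per chunk and join the
    results with a short stretch of silence between chunks."""
    pause = np.zeros(int(SAMPLE_RATE * pause_s), dtype=np.float32)
    pieces = []
    for chunk in split_into_chunks(text, max_chars):
        audio = np.asarray(synthesize(chunk), dtype=np.float32)
        pieces.extend([audio, pause])
    if not pieces:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(pieces[:-1])  # drop the trailing pause
```

Keeping each chunk well under the 30s limit matters more than the exact character count, so `max_chars` would need tuning against the speaking rate you actually get.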
the best quality out of the box comes from the gradio app. i am using the transformer model.
the inference code lacks a lot of the functionality in the gradio app, notably that the gradio demo uses an audio prefix of 100ms silence. this dramatically improves the quality, but the usability is still bad. also, you should set the seed manually to a seed that you know works; this is trial and error. set the emotions to unconditional and don't use them. this leaves most of the work to finding a good seed. in general, use the settings from the gradio app and don't change anything. if your audio gets even close to 30 seconds, errors appear and sentences get dropped, so keep your texts short.
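A rough sketch of the two tricks described here: building a 100ms silence prefix and searching for a usable seed by trial and error. `synthesize(text, seed=...)` is a hypothetical wrapper around your own inference call, and how the prefix gets passed to the model is not shown because it depends on the inference code. The long-silence check is only an assumed heuristic for rejecting bad generations, not something the gradio app is known to do:

```python
import numpy as np


def silence_prefix(sample_rate, duration_ms=100):
    """Return duration_ms of digital silence as a float32 array."""
    return np.zeros(int(sample_rate * duration_ms / 1000), dtype=np.float32)


def longest_silence_s(audio, sample_rate, threshold=1e-3):
    """Length in seconds of the longest run of samples whose absolute
    value is below threshold (a crude 'long silence' detector)."""
    quiet = np.abs(np.asarray(audio)) < threshold
    longest = run = 0
    for is_quiet in quiet:
        run = run + 1 if is_quiet else 0
        longest = max(longest, run)
    return longest / sample_rate


def find_working_seed(text, synthesize, seeds, sample_rate, max_silence_s=1.0):
    """Try each seed in order and return (seed, audio) for the first result
    without an overly long silent stretch, or (None, None) if none pass."""
    for seed in seeds:
        audio = synthesize(text, seed=seed)
        if longest_silence_s(audio, sample_rate) <= max_silence_s:
            return seed, audio
    return None, None
```

This only catches long silences; slurred words or dropped sentences would still need listening or some other check, so a seed that passes here is a candidate, not a guarantee.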