-
Hey there - that's a tough one. It sounds a lot like something deep in the guts of torch or the driver might have gotten stuck here. Python is pretty hard to screw up in the way this error sounds, so I really don't think the problem is with tortoise. I have seen this error when I have run too many CUDA programs at once. You don't happen to be running something else on your GPU, do you?
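If it helps to rule that out, here is a minimal sketch for checking how much of the card is already in use before kicking off a run. It assumes a CUDA build of torch (torch.cuda.mem_get_info needs a reasonably recent release) and that nvidia-smi is on the PATH; it is not part of tortoise itself.

```python
# Check free GPU memory and list whatever else is currently holding the device.
import subprocess

import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes (free, total) on the current device
    print(f"GPU memory free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")

# Every compute process with memory on the GPU, straight from the driver.
print(subprocess.run(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory", "--format=csv"],
    capture_output=True, text=True,
).stdout)
```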
-
I've successfully generated a few blocks of text, including two generations of the same long block. I also succeeded with shorter text on the pat2 voice. However, this long attempt at pat2 with the long text failed. Here is the full output:
/home/jigen/anaconda3/lib/python3.9/site-packages/librosa-0.9.2-py3.9.egg/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
  return f(*args, **kwargs)
/home/jigen/anaconda3/lib/python3.9/site-packages/librosa-0.9.2-py3.9.egg/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
  return f(*args, **kwargs)
/home/jigen/anaconda3/lib/python3.9/site-packages/librosa-0.9.2-py3.9.egg/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
  return f(*args, **kwargs)
/home/jigen/anaconda3/lib/python3.9/site-packages/librosa-0.9.2-py3.9.egg/librosa/util/decorators.py:88: UserWarning: PySoundFile failed. Trying audioread instead.
  return f(*args, **kwargs)
Generating autoregressive samples..
100%|█████████████████████████████████████████████████████████████████████████████████| 64/64 [1:07:22<00:00, 63.17s/it]
Computing best candidates using CLVP
 14%|███████████▊ | 9/64 [00:03<00:20, 2.71it/s]
No stop tokens found in one of the generated voice clips. This typically means the spoken audio is too long. In some cases, the output will still be good, though. Listen to it and if it is missing words, try breaking up your input text.
 47%|██████████████████████████████████████▉ | 30/64 [00:10<00:12, 2.70it/s]
No stop tokens found in one of the generated voice clips. This typically means the spoken audio is too long. In some cases, the output will still be good, though. Listen to it and if it is missing words, try breaking up your input text.
 69%|█████████████████████████████████████████████████████████ | 44/64 [00:15<00:07, 2.71it/s]
No stop tokens found in one of the generated voice clips. This typically means the spoken audio is too long. In some cases, the output will still be good, though. Listen to it and if it is missing words, try breaking up your input text.
 88%|████████████████████████████████████████████████████████████████████████▋ | 56/64 [00:20<00:02, 2.72it/s]
No stop tokens found in one of the generated voice clips. This typically means the spoken audio is too long. In some cases, the output will still be good, though. Listen to it and if it is missing words, try breaking up your input text.
 92%|████████████████████████████████████████████████████████████████████████████▌ | 59/64 [00:21<00:01, 2.72it/s]
No stop tokens found in one of the generated voice clips. This typically means the spoken audio is too long. In some cases, the output will still be good, though. Listen to it and if it is missing words, try breaking up your input text.
100%|███████████████████████████████████████████████████████████████████████████████████| 64/64 [00:23<00:00, 2.76it/s]
Transforming autoregressive outputs into audio..
 60%|████████████████████████████████████████████████▌ | 120/200 [00:39<00:26, 3.04it/s]
Traceback (most recent call last):
  File "/mnt/d/ai/tortoise-tts/tortoise/read.py", line 62, in <module>
    gen = tts.tts_with_preset(text, voice_samples=voice_samples, conditioning_latents=conditioning_latents,
  File "/mnt/d/ai/tortoise-tts/tortoise/api.py", line 328, in tts_with_preset
    return self.tts(text, **settings)
  File "/mnt/d/ai/tortoise-tts/tortoise/api.py", line 491, in tts
    mel = do_spectrogram_diffusion(self.diffusion, diffuser, latents, diffusion_conditioning,
  File "/mnt/d/ai/tortoise-tts/tortoise/api.py", line 158, in do_spectrogram_diffusion
    mel = diffuser.p_sample_loop(diffusion_model, output_shape, noise=noise,
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 565, in p_sample_loop
    for sample in self.p_sample_loop_progressive(
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 611, in p_sample_loop_progressive
    out = self.p_sample(
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 514, in p_sample
    out = self.p_mean_variance(
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 1121, in p_mean_variance
    return super().p_mean_variance(self._wrap_model(model), *args, **kwargs)
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 353, in p_mean_variance
    min_log = _extract_into_tensor(
  File "/home/jigen/anaconda3/lib/python3.9/site-packages/TorToiSe-2.4.2-py3.9.egg/tortoise/utils/diffusion.py", line 1247, in _extract_into_tensor
    res = th.from_numpy(arr).to(device=timesteps.device)[timesteps].float()
RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
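As the error text itself suggests, re-running with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback points at the call that actually failed rather than at a later API call. A minimal sketch of doing that from inside the script (the variable has to be set before the CUDA context is created; prefixing the shell command with CUDA_LAUNCH_BLOCKING=1 works just as well):

```python
# Force synchronous CUDA launches so errors surface at the real call site.
# This must run before torch touches the GPU, so put it at the very top of read.py.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # slower, but tracebacks become accurate

import torch  # imported only after the environment variable is set

# ... the rest of the script (TextToSpeech setup, tts_with_preset, ...) is unchanged ...
```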
Is this likely a one-off problem that I should just retry, a bug, or something I can rectify on my end?
Thanks!
(Later it worked with no changes, so I guess it was just a random failure? I would still be curious why this occurred.)
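For what it is worth, the repeated "No stop tokens found" warnings above also suggest the individual chunks are running long. A rough sketch of feeding the model shorter pieces, assuming the split_and_recombine_text helper in tortoise/utils/text.py (the same splitter read.py relies on); the file name, voice name, chunk lengths, and preset here are illustrative, not project defaults:

```python
# Split a long script into shorter chunks before synthesis, then save each piece.
import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice
from tortoise.utils.text import split_and_recombine_text

tts = TextToSpeech()
voice_samples, conditioning_latents = load_voice("pat2")  # the voice mentioned above

with open("longtext.txt") as f:  # hypothetical input file
    text = f.read()

# Shorter-than-default chunk lengths (in characters); tune to taste.
chunks = split_and_recombine_text(text, desired_length=150, max_length=230)

for i, chunk in enumerate(chunks):
    gen = tts.tts_with_preset(
        chunk,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset="standard",
    )
    # gen is a (1, 1, N) tensor at tortoise's 24 kHz output rate.
    torchaudio.save(f"part_{i:03d}.wav", gen.squeeze(0).cpu(), 24000)
```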