When I tried using SSRL for training on a real robot, I encountered NaN values during model training. I suspect it's because the real setup is quite noisy. To verify, I added the same level of noise to the observations in simulation and observed a similar issue. Has this issue ever come up for you?
Model epoch 0: train total loss 16197.876953125, train mean loss 26707.083984375, test mean loss [2.1955036e+14]
Model epoch 1: train total loss 2.241743061862318e+17, train mean loss 3.6961121031788954e+17, test mean loss [3.817644e+12]
Model epoch 2: train total loss nan, train mean loss nan, test mean loss [nan]
Model epoch 3: train total loss nan, train mean loss nan, test mean loss [nan]
Model epoch 4: train total loss nan, train mean loss nan, test mean loss [nan]
Model epoch 5: train total loss nan, train mean loss nan, test mean loss [nan]
Hmm... I'm thinking you're getting NaNs from exploding gradients, either because the data is very noisy or because the Lagrangian dynamics are very stiff. For the former, you could try a lower learning rate or preprocess the data with a zero-phase filter (rough sketch below). For the latter, double-check your robot XML definition -- if, for example, the leg link masses are very small, small changes in contact forces will produce very large changes in acceleration (and state).
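Not sure what your data pipeline looks like, but here's the kind of zero-phase filtering I have in mind, assuming the observations sit in a `(timesteps, obs_dim)` NumPy array; the sampling rate, cutoff, and filter order below are placeholders you'd need to tune:

```python
# Minimal sketch: zero-phase low-pass filtering of noisy observations before
# model training. Sampling rate, cutoff, and order are illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt

def zero_phase_filter(obs: np.ndarray, fs: float = 100.0,
                      cutoff_hz: float = 10.0, order: int = 4) -> np.ndarray:
    """Low-pass filter each observation dimension without introducing phase lag."""
    b, a = butter(order, cutoff_hz / (0.5 * fs), btype="low")
    # filtfilt runs the filter forward and then backward, cancelling the phase shift
    return filtfilt(b, a, obs, axis=0)

# Example usage on a noisy synthetic signal
t = np.linspace(0.0, 1.0, 100)
noisy = np.sin(2 * np.pi * 2 * t)[:, None] + 0.1 * np.random.randn(100, 1)
smoothed = zero_phase_filter(noisy)
```

Because the filter is zero-phase, the smoothed observations stay time-aligned with the actions, which matters when you're fitting a dynamics model.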
You can also check out the debug-NaNs feature of JAX (https://jax.readthedocs.io/en/latest/debugging/flags.html) to see where the NaN first appears. You might have to disable JIT for it to report the exact location where the NaN occurs.
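For reference, a minimal snippet of how to turn those flags on (`train_step` is just a stand-in for wherever your model update happens):

```python
# Sketch: enable JAX's NaN debugging; `train_step`, `params`, and `batch` are
# placeholder names, not part of the SSRL API.
import jax

jax.config.update("jax_debug_nans", True)      # raise as soon as a NaN is produced
# jax.config.update("jax_disable_jit", True)   # optionally disable JIT globally so the
                                               # traceback points at the exact op

# Or disable JIT only around the suspect call:
# with jax.disable_jit():
#     train_step(params, batch)
```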