Hi all,

I want to discuss an issue regarding training a DNN/CNN-CTC model for speech recognition (Wall Street Journal corpus). I modeled the output units as characters.
I observed that the CTC objective function increased and eventually converged during training.
But I also observed that the final NN outputs show a clear tendency, p(blank symbol) >> p(non-blank symbol), at every speech frame, as in the following figure.
In Alex Graves' paper, by contrast, the trained RNN produces spikes of high p(non-blank) at certain frames, as in the following figure.
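For concreteness, here is a rough sketch of how I check this behavior (PyTorch-style illustration, not my actual training code; `model`, the blank index, and the output shape are assumptions):

```python
# Rough sketch (model, blank index, and shapes are assumptions):
# measure how often the blank posterior beats every non-blank posterior.
import torch

BLANK = 0  # assuming blank is output index 0

def blank_dominance(model, features):
    """Return the fraction of frames where p(blank) > max p(non-blank)."""
    model.eval()
    with torch.no_grad():
        log_probs = model(features)                 # (T, C) log-softmax outputs
        probs = log_probs.exp()
        blank_p = probs[:, BLANK]
        nonblank_p = probs[:, 1:].max(dim=1).values
        return (blank_p > nonblank_p).float().mean().item()
```

In my case this fraction is essentially 1.0, i.e. blank wins at every frame.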
Have you run into the same situation when training NN-CTC on sequence labeling problems? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
Any ideas about this result?
Thank you for reading my question.
CTC training makes NNs first learn to predict only blanks. It can take some time for relevant predictions to appear; adaptive learning-rate methods like RMSProp work very well to circumvent this issue.
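Roughly something like this (a minimal PyTorch sketch; `model` and `loader` are placeholders, and I'm assuming the network already emits per-frame log-probabilities):

```python
# Minimal sketch (model and loader are placeholders): CTC training with
# RMSProp. Assumes model(features) returns (T, N, C) log-probabilities.
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

for features, targets, input_lens, target_lens in loader:
    log_probs = model(features)                    # (T, N, C)
    loss = ctc_loss(log_probs, targets, input_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```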
Maybe training for one epoch with HMM Viterbi alignments before switching to CTC would help; starting from scratch, it might be hard to learn to align and transcribe at the same time.
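For example (a rough sketch; `alignment_loader`, `ctc_loader`, and `model` are hypothetical, and batch size 1 is assumed in the CTC phase):

```python
# Rough sketch (alignment_loader, ctc_loader, model are hypothetical):
# one epoch of framewise cross-entropy on HMM Viterbi alignments,
# then continue training the same network with CTC.
import torch
import torch.nn as nn

framewise_loss = nn.CrossEntropyLoss()
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

# Phase 1: per-frame labels from an HMM Viterbi alignment.
for features, frame_labels in alignment_loader:
    logits = model(features)                       # (T, C) raw scores
    loss = framewise_loss(logits, frame_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2: switch to the CTC objective on the same parameters
# (batch size 1 assumed, hence the unsqueeze to (T, 1, C)).
for features, targets, input_lens, target_lens in ctc_loader:
    log_probs = model(features).log_softmax(-1).unsqueeze(1)
    loss = ctc_loss(log_probs, targets, input_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The idea is that the framewise phase gives the network a rough alignment for free, so CTC only has to refine it rather than discover it from nothing.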