Hi all,

I want to discuss an issue regarding training a DNN/CNN-CTC model for speech recognition (Wall Street Journal corpus). I modeled the output units as characters.
I observed that the CTC objective function increased and eventually converged during training.
But I also observed that the final NN outputs show a clear tendency, p(blank symbol) >> p(non-blank symbol), at every speech frame, as in the following figure.
In Alex Graves' paper, by contrast, the trained RNN produces spikes of high p(non-blank) at certain frames, as in the following figure.
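For concreteness, here is a rough sketch of how I check this behavior (PyTorch-style illustration, not my actual training code; `model`, the blank index, and the output shape are assumptions):

```python
# Rough sketch (model, blank index, and shapes are assumptions):
# measure how often the blank posterior beats every non-blank posterior.
import torch

BLANK = 0  # assuming blank is output index 0

def blank_dominance(model, features):
    """Return the fraction of frames where p(blank) > max p(non-blank)."""
    model.eval()
    with torch.no_grad():
        log_probs = model(features)                 # (T, C) log-softmax outputs
        probs = log_probs.exp()
        blank_p = probs[:, BLANK]
        nonblank_p = probs[:, 1:].max(dim=1).values
        return (blank_p > nonblank_p).float().mean().item()
```

In my case this fraction is essentially 1.0, i.e. blank wins at every frame.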
Have you run into the same situation when training NN-CTC on sequence labeling problems? I suspect the reason is that I use an MLP/CNN instead of an RNN, but I can't clearly explain why that would cause this.
Any ideas about this result?
Thank you for reading my question.
CTC training makes NNs first learn to predict only blanks. It can take some time for relevant predictions to appear; adaptive learning-rate methods like RMSProp work very well to circumvent this issue.
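Roughly something like this (a minimal PyTorch sketch; `model` and `loader` are placeholders, and I'm assuming the network already emits per-frame log-probabilities):

```python
# Minimal sketch (model and loader are placeholders): CTC training with
# RMSProp. Assumes model(features) returns (T, N, C) log-probabilities.
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

for features, targets, input_lens, target_lens in loader:
    log_probs = model(features)                    # (T, N, C)
    loss = ctc_loss(log_probs, targets, input_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```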
Maybe training for one epoch with HMM Viterbi alignments before switching to CTC would help; starting from scratch, it might be hard to learn to align and transcribe at the same time.
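For example (a rough sketch; `alignment_loader`, `ctc_loader`, and `model` are hypothetical, and batch size 1 is assumed in the CTC phase):

```python
# Rough sketch (alignment_loader, ctc_loader, model are hypothetical):
# one epoch of framewise cross-entropy on HMM Viterbi alignments,
# then continue training the same network with CTC.
import torch
import torch.nn as nn

framewise_loss = nn.CrossEntropyLoss()
ctc_loss = nn.CTCLoss(blank=0)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)

# Phase 1: per-frame labels from an HMM Viterbi alignment.
for features, frame_labels in alignment_loader:
    logits = model(features)                       # (T, C) raw scores
    loss = framewise_loss(logits, frame_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Phase 2: switch to the CTC objective on the same parameters
# (batch size 1 assumed, hence the unsqueeze to (T, 1, C)).
for features, targets, input_lens, target_lens in ctc_loader:
    log_probs = model(features).log_softmax(-1).unsqueeze(1)
    loss = ctc_loss(log_probs, targets, input_lens, target_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The idea is that the framewise phase gives the network a rough alignment for free, so CTC only has to refine it rather than discover it from nothing.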