Implementation in Blocks (Theano). Can be trained with a CTC cost or a framewise cost.
#Requirements:
- Theano: http://deeplearning.net/software/theano/install.html
- Blocks: http://blocks.readthedocs.org/en/latest/setup.html
- Blocks extras: https://github.com/mila-udem/blocks-extras
- Fuel: http://fuel.readthedocs.org/en/latest/setup.html
- PySoundFile, to read TIMIT's deprecated .wav-like format; see http://pysoundfile.readthedocs.org/en/0.7.0/ and https://github.com/bastibe/PySoundFile
- python_speech_features for preprocessing (FFT-based filterbank); see http://python-speech-features.readthedocs.org/en/latest/ and https://github.com/jameslyons/python_speech_features
#Notes:
- Decoding: simple argmax over the network outputs, no expensive beam search.
- The mapping from the original 61 phonemes to the reduced set of 39 can be done before training or during decoding.
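
The two notes above can be illustrated with a minimal sketch. It assumes a CTC output of shape (frames, classes) and the commonly used 61-to-39 TIMIT folding; the function names, the `blank` parameter, and the folding table are illustrative and may differ from what this repository actually uses.

```python
import numpy as np

# Commonly used 61-to-39 TIMIT folding (assumption: the repository's own
# table may differ). Phonemes not listed map to themselves; "q" is dropped.
FOLD_61_TO_39 = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil", "dcl": "sil",
    "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
    "q": None,
}


def greedy_ctc_decode(probs, blank):
    """Argmax (best path) decoding of a (frames, classes) output matrix.

    `blank` is the index of the CTC blank class (hypothetical parameter;
    check which index the cost function in use reserves for it).
    """
    best = np.argmax(probs, axis=1)
    decoded = []
    prev = None
    for idx in best:
        idx = int(idx)
        # Keep a label only when it starts a new run and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


def fold_phonemes(phonemes):
    """Map a sequence of 61-set phoneme strings to the reduced 39-set."""
    folded = []
    for p in phonemes:
        target = FOLD_61_TO_39.get(p, p)
        if target is not None:
            folded.append(target)
    return folded
```

When the mapping is applied before training, the network has 39 output classes (plus a blank for CTC) and only the decoding step is needed at test time.
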
Results for a 3-layer BiRNN with [300, 250, 200] hidden units, batch size 40, AdaDelta, and mapping to 39 classes before training:
- GRU on MFCC features: 19.5% PER
- GRU on Log-FB features: 20.5% PER
- LSTM on MFCC features: 19.5% PER
- LSTM on Log-FB features: ?
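
For reference, PER (phoneme error rate) is usually computed as the edit distance between decoded and reference phoneme sequences, divided by the total reference length. The sketch below shows that standard definition; the function names are illustrative, and the scoring code used for the numbers above may differ in detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference symbol missing in hyp
            insertion = curr[j - 1] + 1  # extra symbol in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance over all utterances divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / float(total)
```

Usage: `phoneme_error_rate(reference_sequences, decoded_sequences)` takes two equally long lists of phoneme sequences, both already mapped to the 39-phoneme set.
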
#Credits
CTC implementation: ctc_cost.py is copied from Philemon Brakel's repository: https://github.com/pbrakel/CTC-Connectionist-Temporal-Classification