BiRNN in Blocks trained with CTC on TIMIT

A bidirectional RNN (BiRNN) for phoneme recognition on TIMIT, implemented in Blocks (Theano). It can be trained either with CTC or framewise.

Requirements:

- Theano: http://deeplearning.net/software/theano/install.html
- Blocks: http://blocks.readthedocs.org/en/latest/setup.html
- Blocks extras: https://github.com/mila-udem/blocks-extras
- Fuel: http://fuel.readthedocs.org/en/latest/setup.html
- PySoundFile, to read TIMIT's deprecated .wav-like format; see http://pysoundfile.readthedocs.org/en/0.7.0/ and https://github.com/bastibe/PySoundFile
- python_speech_features, for preprocessing (FFT-based filterbank); see http://python-speech-features.readthedocs.org/en/latest/ and https://github.com/jameslyons/python_speech_features

#Notes:

- Decoding: simple per-frame argmax (greedy decoding), no expensive beam search.
- The mapping from the original 61 phonemes to the reduced set of 39 can be done before training or during decoding.
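
The two notes above can be illustrated with a minimal sketch in plain Python. It is not the code from this repository. It assumes that the CTC blank label has index 0 and that the network output is a list of per-frame score lists; the names `greedy_ctc_decode`, `fold_phonemes` and `FOLD_EXCERPT` are hypothetical. The folding table is only a small excerpt of the commonly used 61-to-39 TIMIT mapping, not the full table.

```python
from itertools import groupby

# Assumption: the CTC blank label is index 0 (the repository may differ).
BLANK = 0

# Illustrative excerpt of the commonly used 61-to-39 TIMIT folding.
# Phonemes that are not listed here are left unchanged by fold_phonemes.
FOLD_EXCERPT = {
    "ao": "aa", "ax": "ah", "ix": "ih", "el": "l", "zh": "sh",
    "pcl": "sil", "tcl": "sil", "h#": "sil", "pau": "sil",
}


def greedy_ctc_decode(frame_scores, blank=BLANK):
    """Argmax decoding: best label per frame, merge repeats, drop blanks."""
    # Index of the highest score in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__)
            for frame in frame_scores]
    # Collapse runs of identical labels into a single label.
    collapsed = [label for label, _ in groupby(best)]
    # Remove the blank label.
    return [label for label in collapsed if label != blank]


def fold_phonemes(phonemes, table=FOLD_EXCERPT):
    """Map phoneme symbols to the reduced set; unknown symbols pass through."""
    return [table.get(p, p) for p in phonemes]


if __name__ == "__main__":
    scores = [
        [0.1, 0.8, 0.1],    # label 1
        [0.1, 0.7, 0.2],    # label 1 again (merged with the previous frame)
        [0.9, 0.05, 0.05],  # blank
        [0.2, 0.7, 0.1],    # label 1 (kept, a blank separates it)
        [0.1, 0.2, 0.7],    # label 2
    ]
    print(greedy_ctc_decode(scores))
    print(fold_phonemes(["h#", "ix", "zh", "s"]))
```

When the mapping is applied during decoding, `fold_phonemes` would run on the decoded symbol sequence; when it is applied before training, the same table would be applied to the target transcriptions instead.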

3-layer BiRNN with [300, 250, 200] hidden units, batch size 40, AdaDelta, phonemes mapped to 39 classes before training:

- GRU on MFCC features: 19.5% PER
- GRU on Log-FB features: 20.5% PER
- LSTM on MFCC features: 19.5% PER
- LSTM on Log-FB features: ?
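
PER is the phoneme error rate. As a rough sketch of how such a number is usually obtained, the following assumes the standard definition: the total edit distance (substitutions, insertions, deletions) between decoded and reference phoneme sequences, divided by the total number of reference phonemes. The function names are hypothetical, and the scoring actually used for the results above may differ in detail.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """PER in percent over a list of (reference, hypothesis) sequence pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return 100.0 * errors / total


if __name__ == "__main__":
    refs = [["sil", "hh", "ah", "l", "ow", "sil"]]
    hyps = [["sil", "hh", "ah", "ow", "sil"]]  # one phoneme deleted
    print("PER: %.1f%%" % phoneme_error_rate(refs, hyps))
```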

#Credits

CTC implementation: ctc_cost.py is copied from Philemon Brakel's repository: https://github.com/pbrakel/CTC-Connectionist-Temporal-Classification
