example not running #1

Open
skaae opened this issue May 24, 2015 · 31 comments

@skaae

skaae commented May 24, 2015

I'm trying to run your CTC example but I get the following error:

Building model ...
/Users/sorensonderby/Documents/phd/RNN/Theano/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Bulding DataStream ...
Bulding training process...
INFO:blocks.algorithms:Taking the cost gradient
INFO:blocks.algorithms:The cost gradient computation graph is built
Starting training ...
INFO:blocks.main_loop:Entered the main loop
INFO:blocks.algorithms:Initializing the training algorithm
INFO:blocks.algorithms:The training algorithm is initialized
ERROR:blocks.main_loop:Error occured during training.

Blocks will attempt to run `on_error` extensions, potentially saving data, before exiting and reraising the error. Note that the usual `after_training` extensions will *not* be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.

-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     training_started: True
Log records from the iteration 0:

Traceback (most recent call last):
  File "/Users/sorensonderby/Documents/phd/RNN/CTC-Connectionist-Temporal-Classification/test_ctc.py", line 122, in <module>
    main_loop.run()
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 192, in run
    reraise_as(e)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 178, in run
    while self._run_epoch():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 227, in _run_epoch
    while self._run_iteration():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 247, in _run_iteration
    self.algorithm.process_batch(batch)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py", line 234, in process_batch
    self._function(*ordered_batch)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 517, in __call__
    allow_downcast=s.allow_downcast)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/tensor/type.py", line 130, in filter
    raise TypeError(err_msg, data)
TypeError: ('Bad input argument to theano function with name "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py:224"  at index 0(0-based), TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function"., [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 0.  0.  0.  0.  1.  1.  1.  1.  1.  1.]]\n\nOriginal exception:\n\tTypeError: Bad input argument to theano function with name "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py:224"  at index 0(0-based), TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function"., [[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]\n [ 0.  0.  0.  0.  1.  1.  1.  1.  1.  1.]]', 'TensorType(float32, matrix) cannot store a value of dtype float64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to float32, or 2) set "allow_input_downcast=True" when calling "function".', array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.]]))

Which I think I can work around by setting `allow_input_downcast=True` at line 224 in blocks/algorithms/__init__.py.

But then i get another error:

Building model ...
/Users/sorensonderby/Documents/phd/RNN/Theano/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
Bulding DataStream ...
Bulding training process...
INFO:blocks.algorithms:Taking the cost gradient
INFO:blocks.algorithms:The cost gradient computation graph is built
INFO:blocks.main_loop:Entered the main loop
INFO:blocks.algorithms:Initializing the training algorithm
Starting training ...

INFO:blocks.algorithms:The training algorithm is initialized
-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
     batch_interrupt_received: False
     epoch_interrupt_received: False
     epoch_started: True
     epochs_done: 0
     iterations_done: 0
     received_first_batch: False
     training_started: True
Log records from the iteration 0:

ERROR:blocks.main_loop:Error occured during training.

Blocks will attempt to run `on_error` extensions, potentially saving data, before exiting and reraising the error. Note that the usual `after_training` extensions will *not* be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.
Traceback (most recent call last):
  File "/Users/sorensonderby/Documents/phd/RNN/CTC-Connectionist-Temporal-Classification/test_ctc.py", line 122, in <module>
    main_loop.run()
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 192, in run
    reraise_as(e)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 178, in run
    while self._run_epoch():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 227, in _run_epoch
    while self._run_iteration():
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/main_loop.py", line 247, in _run_iteration
    self.algorithm.process_batch(batch)
  File "/Users/sorensonderby/Documents/phd/RNN/blocks/blocks/algorithms/__init__.py", line 234, in process_batch
    self._function(*ordered_batch)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 610, in __call__
    storage_map=self.fn.storage_map)
  File "/Users/sorensonderby/Documents/phd/RNN/Theano/theano/compile/function_module.py", line 599, in __call__
    outputs = self.fn()
TypeError: expected type_num 7 (NPY_INT64) got 12
Apply node that caused the error: Elemwise{Add}[(0, 1)](Viterbi, shared_Viterbi)
Inputs types: [TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(0,), (7,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([], dtype=float64), 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Original exception:
    TypeError: expected type_num 7 (NPY_INT64) got 12
Apply node that caused the error: Elemwise{Add}[(0, 1)](Viterbi, shared_Viterbi)
Inputs types: [TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(0,), (7,)]
Inputs strides: [(8,), (8,)]
Inputs values: [array([], dtype=float64), 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Can you add a few notes explaining what S, T, B, D, L, C and F are?

Maybe you could also explain the input format for apply(cls, y, y_hat, y_mask, y_hat_mask, scale='log_scale')?

Is it correct that:

  • y : one-hot encoded labels, LABEL_LENGTH x BATCH_SIZE
  • y_hat : predictions, INPUT_SEQUENCE_LENGTH x BATCH_SIZE x {num_classes + blank}
  • y_mask : a binary mask? What shape? I assume it marks which label positions are included?
  • y_hat_mask : used to mask inputs that are shorter than INPUT_SEQUENCE_LENGTH?

Where INPUT_SEQUENCE_LENGTH is the length of the input sequences (30 for the example data) and LABEL_LENGTH is the length of the label sequence for each target. Is LABEL_LENGTH padded if the true label lengths vary?

-Søren

@mpezeshki
Copy link
Owner

Hi Soren,

ctc_test_data.pkl is a toy dataset containing S batches. Each batch
contains B examples, and each example has length T and F features.
The important thing about ctc_test_data.pkl, which shows the functionality of
CTC, is that the lengths of the input and output sequences are different. So,
following the notation above, the output has S batches of B examples of
length L (different from T).

Apparently you have a casting problem. Try running it with this flag:
THEANO_FLAGS='floatX=float64' python something.py
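
If it helps, a quick way to check the S, B, T, F and L dimensions is to inspect the pickle directly. The sketch below makes no assumptions about how ctc_test_data.pkl is laid out; it just walks the object and prints the shapes of whatever it finds:

from __future__ import print_function
import pickle

with open('ctc_test_data.pkl', 'rb') as f:
    data = pickle.load(f)

def describe(obj, prefix=''):
    # recursively print shapes / lengths of whatever the pickle contains
    if hasattr(obj, 'shape'):
        print(prefix, type(obj).__name__, obj.shape, obj.dtype)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            describe(value, prefix + ' %r' % (key,))
    elif isinstance(obj, (list, tuple)):
        print(prefix, type(obj).__name__, 'of length', len(obj))
        for i, item in enumerate(obj[:2]):  # the first two entries are enough
            describe(item, prefix + ' [%d]' % i)
    else:
        print(prefix, type(obj).__name__)

describe(data)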


@mpezeshki
Owner

y_mask: LABEL_LENGTH x BATCH_SIZE
That's because the lengths of the sequences in a batch may vary; in that case, the sequences are padded with zeros.
y_hat_mask: INPUT_SEQUENCE_LENGTH x BATCH_SIZE
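
To make the padding concrete, here is a small toy sketch (my own example, not code from the repo) of building the zero-padded labels and the corresponding y_mask for a batch of variable-length label sequences, in the LABEL_LENGTH x BATCH_SIZE layout described above:

from __future__ import print_function
import numpy as np

# three hypothetical label sequences with lengths 2, 4 and 3
labels = [[1, 3], [2, 0, 2, 1], [4, 4, 1]]

label_length = max(len(seq) for seq in labels)   # LABEL_LENGTH
batch_size = len(labels)                         # BATCH_SIZE

y = np.zeros((label_length, batch_size), dtype='int64')         # zero-padded labels
y_mask = np.zeros((label_length, batch_size), dtype='float32')  # 1 = real label, 0 = padding

for b, seq in enumerate(labels):
    y[:len(seq), b] = seq
    y_mask[:len(seq), b] = 1.0

print(y)
print(y_mask)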

@skaae
Author

skaae commented May 26, 2015

Thanks. Your code seems to run fine without Blocks. I do have floatX=float32, but isn't that necessary when you use the GPU?

What is the license on the code? I'm planning to include a CTC example, using your code, in the Theano-based Lasagne library.

Best regards, Søren


@mpezeshki
Owner

Happy to hear that you want to use it in Lasagne.
The initial code that I started from was written by Rakesh Var and Shawn Tan.
Although my code is quite different from theirs, you should send them an email and ask them as well. Since Rakesh's code has an Apache license, I also added it to my repo.
Something important to note is that the current version of the code outputs NaN for very long sequences (the code may have other bugs too!). We have solved this problem in another, private repo, but it's not clean yet. Eventually I'll update this repo as well.

Good luck,

@skaae
Author

skaae commented May 26, 2015

Good to hear. I'll probably see if I can reproduce Alex Graves' handwritten digit recognition results.

I have to admit that I haven't looked closely at the implementation yet, but I'll do that in the coming days.

Out of curiosity, have you tested your private-repo code on some "real" datasets? Would you be willing to put up the unclean code? I could then clean it up in a PR to this repo.

I'll of course attribute you, Rakesh Var, Shawn Tan and other people who contributed.

Your help is appreciated :)

@mpezeshki
Owner

@skaae, the new changes to the code were made by Phil Brakel.
I'll put his version in another branch of the current repo tonight (Montreal time), so please attribute him as well.

@skaae
Author

skaae commented May 27, 2015

Great! Thanks

@skaae
Author

skaae commented May 29, 2015

Hi, thanks for sharing. I started working my way through the CTC code and came across some differences between the formulation in

http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf (your reference)

and in Alex Graves' book: http://www.cs.toronto.edu/~graves/preprint.pdf

The differences are in the initial states of the backward pass. In the paper, eq. 9, they are specified as
the probability of the blank and of the correct label.

But in the book, eq. 7.13 specifies them as 1. From the definition of the beta values I believe that 1 is the correct value?

I haven't fully understood how you define the recursion with a matrix, but given that you calculate the backward pass as the reverse of the forward pass, I don't believe the initial states are handled differently?
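
For reference, this is my reading of the two initialisations (a paraphrase, so please double-check it against eq. 9 and eq. 7.13):

% Paper (Graves et al., 2006), eq. 9: the backward variables include the
% network outputs at the final time step T:
\beta_T(|l'|) = y^T_b, \qquad \beta_T(|l'|-1) = y^T_{l_{|l|}}, \qquad
\beta_T(s) = 0 \quad \forall s < |l'| - 1

% Book (Graves, 2012), eq. 7.13: the backward variables exclude the output at
% time t, so the initial values are simply:
\beta_T(|l'|) = \beta_T(|l'|-1) = 1, \qquad \beta_T(s) = 0 \quad \forall s < |l'| - 1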

@skaae
Author

skaae commented May 29, 2015

Additionally, equation 10 in the paper uses y_t while eq. 7.15 in the book uses y_{t+1}?

@mpezeshki
Owner

@skaae , my pleasure 👍
Maybe I'm wrong, but I think the forward pass is enough. So my implementation is a bit different.

@skaae
Author

skaae commented May 29, 2015

pseudo_cost uses both log_alpha and log_beta when calculating the marginals. That is done in the get_targets function?

Do you train with the pseudo cost or with the cost function?

Also, from eq. 15 in http://www.cs.toronto.edu/~graves/icml_2006.pdf it seems that you need both alpha and beta?

@mpezeshki
Owner

I see. There are new changes in the recent version that I don't know well.
You may ask @pbrakel.

@skaae
Author

skaae commented May 29, 2015

I'm working on some tests for the forward and backward matrices, if you are interested.

I just need to figure out the initial states for beta, which I'm fairly sure should be 1 and not the y probabilities.


@pbrakel
Collaborator

pbrakel commented May 29, 2015

Hey @skaae,

More tests are always nice, and if you find bugs please let us know!

First of all, not all the functions in my version of the code may be correct anymore, because I only focused on the higher-level log-domain ones. The tests are messy as well.

We train using the pseudo cost function because, for some reason, the gradient of the normal cost function is unstable. The pseudo cost simply computes the CTC gradient directly, without using automatic differentiation. To turn this gradient into a cost that can be used for automatic differentiation through the rest of your model, I either use the cross-entropy between the output of your model and the CTC targets (i.e., the label probabilities after summing over all the paths that are compatible with the target sequence), or the sum of the element-wise product of the gradient with respect to the softmax inputs and the pre-softmax activation of your model. The latter variant is more stable because it skips the softmax gradient and prevents the computation of targets/predictions that can lead to divisions by zero. For the standard cost you only need the forward pass, but for the manual computation of the gradient you need the backward pass as well.
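
To spell out the second option (this is my restatement of the paragraph above, not code from the repo): let a be the pre-softmax activations of the model and g = dL_CTC/da the explicitly computed CTC gradient with respect to the softmax inputs, treated as a constant. The pseudo cost is then

J(\theta) = \sum_{t,b,k} g_{t,b,k} \, a_{t,b,k}(\theta)
\quad\Longrightarrow\quad
\frac{\partial J}{\partial \theta}
  = \sum_{t,b,k} g_{t,b,k} \, \frac{\partial a_{t,b,k}}{\partial \theta}
  = \frac{\partial L_{\mathrm{CTC}}}{\partial \theta}

so J has the same gradient as the true CTC cost even though its value is different (and can be negative).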

For ease of implementation, I simply computed beta in exactly the same way as alpha (except for some mask-related issues). This is not the same as in some formulations of the algorithm where beta(t) doesn't include a multiplication with the local softmax output y_hat(t). This is why in the thesis the likelihood is defined as sum_u(alpha(t, u)beta(t, u)), while in the paper it's sum_u(alpha(t, u)beta(t, u)/y_hat(t, u)). Hopefully this clarifies things a bit.
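
In other words (again my paraphrase), writing beta_paper for the backward variable that includes the local output and beta_book for the one that excludes it:

\beta^{\mathrm{paper}}_t(u) = \beta^{\mathrm{book}}_t(u) \, \hat{y}_t(u)
\quad\Longrightarrow\quad
\sum_u \alpha_t(u) \, \beta^{\mathrm{book}}_t(u)
  = \sum_u \frac{\alpha_t(u) \, \beta^{\mathrm{paper}}_t(u)}{\hat{y}_t(u)}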

Cheers

@skaae
Author

skaae commented May 31, 2015

Thanks for the reply. I have a few more questions.

"This is not the same as in some formulations of the algorithm where beta(t) doesn't include a multiplication with the local softmax output y_hat(t)."

Do you then refer to the different initial states in the book and in the paper? I see that equation 7.26 in the book and equation 14 in the paper differ only by the division by y^t_{l_s}?

I don't follow your description of how to use the pseudo_cost for training.

From what you write, I should use skip_softmax=True and then have a linear output from my model?

In the docs for pseudo_cost you write that

    y_hat : tensor3 (T, B, C)
        class probabily distribution sequences, potentially in log domain

Does that mean that y_hat could be in the log domain, or that it should be in the log domain?

Secondly, I have no clue what you mean by this line :)

"...or the sum of the element wise product of the gradient with respect to the softmax inputs and the pre-softmax activation of your model."

Could you give an example?

Say I have the following:

model_pre_act = ...  # model output including blanks (linear, pre-softmax)
model_softmax = softmax(model_pre_act)

How would I then get the gradients for the parameters of the model?

@skaae
Author

skaae commented Jun 1, 2015

I tried to write an example using Lasagne. It's mostly copied from the ctc_test file.

I try to do what you described here:

...the sum of the element wise product of the gradient with respect to the softmax inputs and the pre-softmax activation of your model.

I'm not sure I correctly understood how to combine the CTC gradients and the gradients from the rest of the network.

import lasagne
from lasagne.layers import RecurrentLayer, InputLayer, DenseLayer,\
    NonlinearityLayer, ReshapeLayer, EmbeddingLayer
import theano
import theano.tensor as T
import numpy as np

import ctc_cost  # the ctc_cost module from this repository

floatX = theano.config.floatX
num_batch, input_seq_len = 10, 45
num_classes = 10
target_seq_len = 5


Y_hat = np.asarray(np.random.normal(
    0, 1, (input_seq_len, num_batch, num_classes + 1)), dtype=floatX)
Y = np.zeros((target_seq_len, num_batch), dtype='int64')
Y[25:, :] = 1
Y_hat_mask = np.ones((input_seq_len, num_batch), dtype=floatX)
Y_hat_mask[-5:] = 0
# default blank symbol is the highest class index (num_classes, i.e. 10, in this case)
Y_mask = np.asarray(np.ones_like(Y), dtype=floatX)
X = np.random.random(
    (num_batch, input_seq_len)).astype('int32')

input_mask = T.matrix('features_mask')
y_hat_mask = input_mask
y = T.lmatrix('phonemes')
y_mask = T.matrix('phonemes_mask')
x = T.imatrix()   # batchsize, input_seq_len

# setup Lasagne Recurrent network
# The output from the network is:
#  a) output_lin_ctc is the activation before softmax  (input_seq_len, batch_size, num_classes + 1)
#  b) ouput_softmax is the output after softmax  (batch_size, input_seq_len, num_classes + 1)
l_inp = InputLayer((num_batch, input_seq_len))
l_emb = EmbeddingLayer(l_inp, input_size=num_classes, output_size=15)
l_rnn = RecurrentLayer(l_emb, num_units=10)
l_rnn_shp = ReshapeLayer(l_rnn, (num_batch*input_seq_len, 10))
l_out = DenseLayer(l_rnn_shp, num_units=num_classes+1,
                   nonlinearity=lasagne.nonlinearities.identity)  # + blank

l_out_shp = ReshapeLayer(l_out, (num_batch, input_seq_len, num_classes+1))

# dimshuffle to shape format (input_seq_len, batch_size, num_classes + 1)
l_out_shp_ctc = lasagne.layers.DimshuffleLayer(l_out_shp, (1, 0, 2))

l_out_softmax = NonlinearityLayer(
    l_out, nonlinearity=lasagne.nonlinearities.softmax)
l_out_softmax_shp = ReshapeLayer(
    l_out_softmax, (num_batch, input_seq_len, num_classes+1))

output_lin_ctc = lasagne.layers.get_output(l_out_shp_ctc, x)
output_softmax = lasagne.layers.get_output(l_out_softmax_shp, x)
all_params = lasagne.layers.get_all_params(l_out_shp)


###############
#  GRADIENTS  #
###############

# the CTC cross entropy between y and linear output network
pseudo_cost = ctc_cost.pseudo_cost(
    y, output_lin_ctc, y_mask, y_hat_mask,
    skip_softmax=True)

# calculate the gradients of the CTC wrt. linar output of network
pseudo_cost_sum = pseudo_cost.sum()
pseudo_cost_grad = T.grad(pseudo_cost_sum, output_lin_ctc)

# multiply CTC gradients with RNN output activation before softmax
output_to_grad = T.sum(pseudo_cost_grad * output_lin_ctc)

# calculate the gradients
all_grads = T.grad(output_to_grad, all_params)

updates = lasagne.updates.rmsprop(all_grads, all_params, learning_rate=0.0001)

train = theano.function([x, y, y_hat_mask, y_mask],
                        [output_lin_ctc, output_softmax, pseudo_cost_sum],
                        updates=updates)

test_val = train(X, Y, Y_hat_mask, Y_mask)
print test_val[0].shape
print test_val[1].shape

# Create test dataset
num_samples = 1000
np.random.seed(1234)

# create simple dataset of format
# input [5,5,5,5,5,2,2,2,2,2,3,3,3,3,3,....,1,1,1,1]
# targets [5,2,3,...,1]
# etc...
input_lst, output_lst = [], []
for i in range(num_samples):
    this_input = []
    this_output = []
    prev_class = -1
    for j in range(target_seq_len):
        this_class = np.random.randint(num_classes)
        while prev_class == this_class:
            this_class = np.random.randint(num_classes)

        prev_class = this_class
        this_len = np.random.randint(1, 10)

        this_input += [this_class]*this_len
        this_output += [this_class]

    this_input += (input_seq_len - len(this_input))*[this_input[-1]]

    input_lst.append(this_input)
    output_lst.append(this_output)

input_arr = np.concatenate([input_lst]).astype('int32')
y_arr = np.concatenate([output_lst]).astype('int64')

y_mask_arr = np.ones((target_seq_len, num_batch), dtype='float32')
input_mask_arr = np.ones((input_seq_len, num_batch), dtype='float32')

for nn in range(200):
    for i in range(num_samples//num_batch):
        idx = range(i*num_batch, (i+1)*num_batch)
        _, _, cost = train(
            input_arr[idx],
            np.transpose(y_arr[idx]),
            input_mask_arr,
            y_mask_arr)
        print cost

@pbrakel
Collaborator

pbrakel commented Jun 1, 2015

Hey Søren,

While the pseudo cost is not the same as the CTC cost, it should have the same gradient and it already does the multiplication with the outputs internally, so you don't have to compute the gradient with respect to the outputs separately and can just treat it as you would any other cost. You can use the actual CTC cost function for performance monitoring. When you use the skip_softmax option, the function expects the linear activations; I see you implemented this correctly. Internally it still computes the softmax, but it makes sure Theano doesn't try to compute its gradient. The skip_softmax variant should be far more reliable because it can deal with very large input values, and I'm guessing it might be a bit faster too, but I didn't test that.

I'll try to answer your earlier questions when I find more time.

Best,
Philemon

@skaae
Author

skaae commented Jun 1, 2015

Thanks. I think I'm getting there.

I changed these lines and printed the cost instead of the pseudo cost:

    pseudo_cost_grad = T.grad(pseudo_cost.mean(), all_params)
    true_cost = ctc_cost.cost(y, output_softmax.dimshuffle(1, 0, 2), y_mask, y_hat_mask)
    cost = T.mean(true_cost)
    updates = lasagne.updates.rmsprop(pseudo_cost_grad, all_params, learning_rate=0.0001)

The cost seems to go down on my test data.

@Richi91

Richi91 commented Jun 27, 2015

Hello Søren,

Did you get the CTC code working with Lasagne (recurrent)? Could you share that code? It would save me a lot of time ;-)
In the code snippet above you use "pseudo_cost"; however, this function no longer exists in the most recent version...
Maybe you could just post the code that finally worked for your example?

Cheers,
Richard

@skaae
Author

skaae commented Jun 27, 2015 via email

@Richi91

Richi91 commented Jun 27, 2015

Cool, thank you!
I am also going to use it for TIMIT. I am trying to reproduce Alex Graves' results.

@skaae
Author

skaae commented Jun 28, 2015

Awesome, I'm very interested in the results. Do you have a script for creating the input features? I have a simple Python script which I think reproduces the features.


@Richi91

Richi91 commented Jun 28, 2015

Well, he does not completely specify his preprocessing in his paper. He uses HTK for his "Fourier-based filter banks", which is explained here: http://www.ee.columbia.edu/ln/LabROSA/doc/HTKBook21/node54.html
But there are several parameters that are not explained in the paper, for example the frequency range, the analysis window length, the step size, and the window size for calculating the deltas.

For a first try, I am using the full frequency range from 200 Hz to 8 kHz. For the other parameters I just use the standard values (25 ms, 10 ms, 2).
I have just pushed my preprocessing script to my fork of 'craffel/nntools'. I use the python_speech_features package for calculating the filterbank energies.
If you would like to discuss the preprocessing, I suggest we open another issue or discuss via email, because this is still the CTC issue ;-)
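
In case it is useful, here is a rough sketch of that pipeline with python_speech_features (the number of filters and the file name are my own placeholders, and the frequency range is just the first-try choice mentioned above):

from __future__ import print_function
import numpy as np
import scipy.io.wavfile as wav
from python_speech_features import logfbank, delta

rate, signal = wav.read('some_utterance.wav')   # hypothetical TIMIT utterance

fbank_feat = logfbank(signal, samplerate=rate,
                      winlen=0.025,             # 25 ms analysis window
                      winstep=0.01,             # 10 ms step size
                      nfilt=40,                 # placeholder number of filters
                      lowfreq=200, highfreq=8000)

d1 = delta(fbank_feat, 2)                       # deltas with window size 2
d2 = delta(d1, 2)                               # delta-deltas

features = np.hstack([fbank_feat, d1, d2])      # (num_frames, 3 * nfilt)
print(features.shape)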

@skaae
Author

skaae commented Jun 29, 2015

I put up the code here: https://github.com/skaae/Lasagne-CTC. I'm very interested in your progress :)

@Richi91

Richi91 commented Aug 17, 2015

Hello @pbrakel ,

I am still/again working with your CTC code, but I cannot get it working correctly. During training I get both positive and negative values for the cost. That shouldn't be possible, should it?

Training my net with a cross-entropy error (at each timestep) worked fine, so the problem must be the CTC cost.
Did you test the CTC code and verify that it is correct? And do you have an example of how to use it?
I have trouble understanding the pseudo_cost code, so I can't tell whether there is something wrong or I just don't get it.

Kind regards

@pbrakel
Collaborator

pbrakel commented Aug 17, 2015

Hey @Richi91,

I just wrote an explanation of what pseudo_cost is supposed to do at skaae/Lasagne-CTC#1 (comment).
What it boils down to is that pseudo_cost should have the same gradient as CTC, but it will not give the same cost value and can be negative. Ideally I should write a Theano op that computes the cost using the cost function and the gradient using the code in pseudo_cost, to move the confusing part of the code to a lower level, but I haven't gotten around to doing so yet.

If you show me an example of your code I can look at it. Perhaps these couple of lines will be helpful as well (y_hat_o is the output activation before it goes into the softmax):

ctc_cost_t = ctc_cost.pseudo_cost(y, y_hat_o, y_mask, y_hat_mask,
                                  skip_softmax=True)
ctc_cost_monitor = ctc_cost.cost(y, y_hat, y_mask, y_hat_mask)
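
As a lightweight alternative to a full custom op, Theano's disconnected_grad could combine the two into a single expression whose value is the true cost but whose gradient is that of pseudo_cost. This is only a sketch (untested with Blocks' monitoring), reusing the variables above plus y_hat for the post-softmax output:

from theano.gradient import disconnected_grad

pseudo = ctc_cost.pseudo_cost(y, y_hat_o, y_mask, y_hat_mask,
                              skip_softmax=True).mean()
true_cost = ctc_cost.cost(y, y_hat, y_mask, y_hat_mask).mean()

# value(combined) == value(true_cost); grad(combined) == grad(pseudo)
combined = pseudo + disconnected_grad(true_cost - pseudo)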

@Richi91

Richi91 commented Aug 18, 2015

Hey @pbrakel ,

Thank you for your answer; it helped me understand the need for pseudo_cost.
I guess I have already used it correctly; however, I still cannot achieve any good results and cannot tell whether this is due to the CTC implementation or to other reasons.
For some reason my network only learns to maximize the probability of the blank symbol (>0.8 at almost every timestep). Tonight I will try to pretrain with cross-entropy and fine-tune with CTC and see whether that helps.

Here is a snippet of my code (using lasagne):

#************************************ input *************************************************
l_in = lasagne.layers.InputLayer(shape=(BATCH_SIZE, MAX_INPUT_SEQ_LEN, INPUT_DIM))
l_mask = lasagne.layers.InputLayer(shape=(BATCH_SIZE, MAX_INPUT_SEQ_LEN), 
                                   input_var=theano.tensor.matrix('input_mask', dtype=theano.config.floatX))
#************************************ deep BLSTM ********************************************
blstm0 = BLSTMConcatLayer(incoming=l_in, mask_input=l_mask, 
    num_units=N_LSTM_HIDDEN_UNITS[0], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)
blstm1 = BLSTMConcatLayer(incoming=blstm0, mask_input=l_mask,
    num_units=N_LSTM_HIDDEN_UNITS[1], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)
blstm2 = BLSTMConcatLayer(incoming=blstm1, mask_input=l_mask, 
    num_units=N_LSTM_HIDDEN_UNITS[2], gradient_steps=GRADIENT_STEPS, grad_clipping=GRAD_CLIP)

#************************************ fully connected ****************************************                          
l_reshape2 = lasagne.layers.ReshapeLayer(
    blstm2, (BATCH_SIZE*MAX_INPUT_SEQ_LEN, N_LSTM_HIDDEN_UNITS[2]*2))
l_out_lin = lasagne.layers.DenseLayer(
    incoming=l_reshape2, num_units=OUTPUT_DIM, nonlinearity=lasagne.nonlinearities.linear)

#************************************ linear output ******************************************
model_lin = lasagne.layers.ReshapeLayer(
    l_out_lin, (BATCH_SIZE, MAX_INPUT_SEQ_LEN, OUTPUT_DIM))

#************************************ Softmax output *****************************************
l_out_softmax = lasagne.layers.NonlinearityLayer(
    l_out_lin, nonlinearity=lasagne.nonlinearities.softmax)
model_soft = lasagne.layers.ReshapeLayer(
    l_out_softmax, (BATCH_SIZE, MAX_INPUT_SEQ_LEN, OUTPUT_DIM))  


output_lin = lasagne.layers.get_output(model_lin) 
output_softmax = lasagne.layers.get_output(model_soft) 


Y = T.matrix('target', dtype=theano.config.floatX) 
Y_mask = T.matrix('target_mask', dtype=theano.config.floatX) 


all_params = lasagne.layers.get_all_params(model_lin, trainable=True) 

# Lasagne = Batch x Time x Feature_Dim --> swap Batch and Time for CTC
ctc_cost_train = ctc_cost.pseudo_cost(y=Y.dimshuffle((1,0)), \
                       y_hat=output_lin.dimshuffle((1,0,2)), \
                       y_mask=Y_mask.dimshuffle((1,0)), \
                       y_hat_mask=(l_mask.input_var).dimshuffle((1,0)), \
                       skip_softmax=True).mean(dtype=theano.config.floatX)

ctc_cost_monitor = ctc_cost.cost(y=Y.dimshuffle((1,0)), \
                            y_hat=output_softmax.dimshuffle((1,0,2)), \
                            y_mask=Y_mask.dimshuffle((1,0)), \
                            y_hat_mask=(l_mask.input_var).dimshuffle((1,0))).mean(dtype=theano.config.floatX)                                

updates = lasagne.updates.momentum(
    ctc_cost_train, all_params, learning_rate=lasagne.utils.floatX(LEARNING_RATE))  

train = theano.function([l_in.input_var, Y, l_mask.input_var, Y_mask],
                        outputs=[output_softmax, ctc_cost_monitor],
                        updates=updates)

@Michlong

@Richi91 hi, I am running into the same problem as you... the CTC loss is negative and the trained model outputs all blanks. Did you figure out these problems?

@Richi91

Richi91 commented Jan 19, 2016

@Michlong hi, sorry for the late reply.
The implementation of CTC does work fine. Use the pseudo cost for training and the cost for display.
I can't remember what went wrong; possibly I had the wrong hyper-parameters.
I would suggest using a relatively high momentum and starting with a high learning rate, or using an adaptive learning rate, but still starting with a high LR. In the first few epochs the net will mostly output blanks. After a few epochs you should see the net produce other outputs as well, although most will still be blanks.

Don't forget gradient clipping, especially with a high LR ;-)
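
For what it's worth, a minimal sketch of that recipe in Lasagne (the learning rate, momentum and max-norm values are illustrative only; ctc_cost_train and all_params are the variables from my snippet above):

import theano.tensor as T
import lasagne

all_grads = T.grad(ctc_cost_train, all_params)
# clip the gradients before applying a high learning rate
all_grads = lasagne.updates.total_norm_constraint(all_grads, max_norm=10.0)
updates = lasagne.updates.momentum(all_grads, all_params,
                                   learning_rate=0.01, momentum=0.9)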

@raindeer

@Richi91 are you willing to share your code?

@Richi91

Richi91 commented Jan 20, 2016

I no longer have an implementation of an RNN with CTC in Lasagne.
But in https://github.com/Richi91/SpeechRecognition/blob/master/blocks/run.py I have done some experiments using blocks.

Actually, all you need to do is use the cost function like this:
cost_train = ctc.pseudo_cost(y, y_hat, y_m, x_m).mean()
cost_monitor = ctc.cost(y, y_hat_softmax, y_m, x_m).mean()

Then write a Theano function for the training loop with cost_train, and a function for validation (without updates) with cost_monitor; see the short sketch after the parameter list below.

y: targets (e.g. words or phonemes. This is not frame-wise)
y_hat, y_hat_softmax: network output before and after softmax
y_m: mask for targets
x_m: mask for inputs
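
A minimal sketch of those two functions (the input variables x, x_m, y, y_m and the updates dictionary are assumed to be defined by your model and optimiser):

import theano

train_fn = theano.function([x, x_m, y, y_m], cost_train, updates=updates)
valid_fn = theano.function([x, x_m, y, y_m], cost_monitor)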

Best regards
