What does the function class_batch_to_labeling_batch(y, y_hat, y_hat_mask=None) mean in ctc_cost.py? #6

Open
star013 opened this issue Oct 15, 2015 · 1 comment

star013 commented Oct 15, 2015

Hello, I am doing some research on TIMIT and need to use CTC in my model. I read ctc_cost.py, but I cannot understand the function class_batch_to_labeling_batch(y, y_hat, y_hat_mask=None).
According to the comments, y_hat is a T x B x (C+1) tensor and y_hat_mask is a T x B matrix. Line 65 reads:
y_hat = y_hat * y_hat_mask.dimshuffle(0, 'x', 1)
This puzzles me, because y_hat_mask.dimshuffle(0, 'x', 1) has shape T x 1 x B and cannot be multiplied elementwise with y_hat, which has shape T x B x (C+1). In addition, when I tried to run this function in an IPython notebook, it reported an error.
Could you please explain why the line is y_hat = y_hat * y_hat_mask.dimshuffle(0, 'x', 1), and what res is in the function?
Thanks.

@mpezeshki
Owner

Hi,
Let's say we have 3 classes (a, b, and c), a batch size of 1, and 2 time steps. Suppose the probabilities are:

time |   a   |   b   |   c   |   blank    |
-------------------------------------------
  0  |  0.2  |  0.4  |  0.3  |    0.1     |
-------------------------------------------
  1  |  0.35 |  0.15 |  0.2  |    0.3     |
-------------------------------------------

Now suppose the output sequence is b, a, a, b, c. (Blanks are actually added as well, but let's ignore them for now.) Then the output must be:

time |   b   |   a   |   a   |   b   |   c   |
----------------------------------------------
  0  |  0.4  |  0.2  |  0.2  |  0.4  |  0.3  |
----------------------------------------------
  1  |  0.15 |  0.35 |  0.35 |  0.15 |  0.2  |
----------------------------------------------

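So basically, for each time step it picks out the probability of every label in the output sequence, replicating a class's column wherever that label repeats, so the result has one column per element of the output sequence.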
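The replication described above can be sketched in plain NumPy. This is only an illustration of the idea for a single batch element, not the Theano code in ctc_cost.py; the class-to-index mapping (a=0, b=1, c=2, blank=3) and the variable names are assumptions made for the example.

```python
import numpy as np

# Probabilities from the first table, for one batch element.
# Shape (T, C+1) = (2, 4); columns are a, b, c, blank (assumed order).
y_hat = np.array([
    [0.20, 0.40, 0.30, 0.10],  # time 0
    [0.35, 0.15, 0.20, 0.30],  # time 1
])

# Output sequence b, a, a, b, c written as class indices (assumed mapping).
y = np.array([1, 0, 0, 1, 2])

# Integer-array indexing along the class axis copies one column per label,
# so the result has shape (T, L) = (2, 5).
labeling_probs = y_hat[:, y]
print(labeling_probs.shape)
print(labeling_probs)
```

Each column of `labeling_probs` is the probability, over time, of the label at that position in the sequence, which matches the second table row for row.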
As for the mask, I don't remember why I did this, but Kyle now has a newer version of the code, which you may find useful:
https://gist.github.com/kastnerkyle/ca851e39229551208c0d#file-minibatch_ocr-py-L175
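On the shape question itself, here is a small NumPy sketch of how a T x B mask broadcasts against a three-axis array. It uses `mask[:, None, :]` as the NumPy analogue of `dimshuffle(0, 'x', 1)`; the sizes and variable names are made up for illustration, and nothing here confirms what layout ctc_cost.py actually uses for y_hat at that line.

```python
import numpy as np

T_steps, B, C = 2, 3, 3  # hypothetical sizes, chosen so that B != C + 1
y_hat = np.ones((T_steps, B, C + 1))       # (T, B, C+1)
y_hat_mask = np.array([[1.0, 1.0, 1.0],
                       [1.0, 0.0, 1.0]])   # (T, B); 0 marks a padded frame

# A mask reshaped to (T, B, 1) broadcasts over the class axis.
masked = y_hat * y_hat_mask[:, :, None]

# A mask reshaped to (T, 1, B) only lines up with an array whose last
# axis is B, for example y_hat transposed to (T, C+1, B).
y_hat_cb = y_hat.transpose(0, 2, 1)
masked_cb = y_hat_cb * y_hat_mask[:, None, :]

# Against the untransposed (T, B, C+1) array the shapes do not broadcast.
try:
    y_hat * y_hat_mask[:, None, :]
    mismatch = False
except ValueError as err:
    mismatch = True
    print("shape mismatch:", err)
```

So a `(T, 1, B)` mask makes sense only if the array being masked has the batch axis last; with the batch axis in the middle, the mask needs the `(T, B, 1)` form instead.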
