
Previous control is not used? #8

Open
zhmd opened this issue Nov 19, 2020 · 2 comments

zhmd commented Nov 19, 2020

Hi,

Thanks for sharing this great repo! It's awesome.

I just have a few questions about the control unit:
(1) From

# TODO: add mask again?!
# question_lengths = torch.cuda.FloatTensor(question_lengths)
# mask = self.mask(question_lengths, logits.device).unsqueeze(-1)
# logits += mask

these lines are commented out -- but I believe they are meant to mask the padded tokens in the sequence. Is that intentional? Am I missing something here?
(2) It seems in this control function,
def forward(self, question, context, question_lengths, step):

The previous control is never passed in as an argument, so the control state from one reasoning step is not carried over to the next. I noticed that the docstring mentions a control argument, but it seems to have been removed from the function signature. Also, question_lengths is unused since the mask call is commented out (see the previous question). Is that intentional as well?

Thanks a lot!

tohinz (Owner) commented Nov 24, 2020

Hi, it's been a while since I last worked with this, so I'm going from memory without guarantee that what I'm saying is correct.

Regarding the mask: if I remember correctly, using the mask didn't improve the final performance, so I removed it to reduce overhead.

Regarding the control unit: the previous control is never used because the control unit uses different parameters for each step (question = self.control_input_u[step](question) uses different parameters depending on the step value). Regarding question_lengths and the mask, see my previous remark.
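
To illustrate, a minimal sketch of this per-step parameterization (only the control_input_u[step] indexing comes from the actual code; the module layout, dimensions, and attention details below are assumptions):

import torch
import torch.nn as nn

class ControlUnit(nn.Module):
    def __init__(self, dim, max_steps):
        super().__init__()
        # one projection per reasoning step -> step-specific parameters
        self.control_input_u = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(max_steps)]
        )
        self.attn = nn.Linear(dim, 1)

    def forward(self, question, context, question_lengths, step):
        # step selects a different set of weights on every call,
        # which is why no previous control is fed back in here
        question = self.control_input_u[step](question)       # [B, d]
        logits = self.attn(context * question.unsqueeze(1))   # [B, L, 1]
        attn = torch.softmax(logits, dim=1)                   # attention over words
        control = (attn * context).sum(dim=1)                 # [B, d]
        return control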

zhmd (Author) commented Nov 24, 2020

Thanks for your feedback! I actually looked into the code more carefully over the past few days.

If I may, I suspect the reason masking didn't help is that 1e-30 is used rather than -1e30 in L48. The point is to make the logits at padded token positions extremely negative so they become effectively 0 after the softmax, but 1e-30 is simply too small to have any impact. The original MAC implementation uses -1e30.

def mask(self, question_lengths, device):
    max_len = question_lengths.max().item()
    mask = torch.arange(max_len, device=device).expand(len(question_lengths), int(max_len)) < question_lengths.unsqueeze(1)
    mask = mask.float()
    ones = torch.ones_like(mask)
    mask = (ones - mask) * (1e-30)
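
For reference, a sketch of the corrected version, assuming the function is meant to return the additive mask (the snippet above is truncated, so the return is my assumption):

def mask(self, question_lengths, device):
    max_len = question_lengths.max().item()
    # True for real tokens, False for padding
    mask = torch.arange(max_len, device=device).expand(len(question_lengths), int(max_len)) < question_lengths.unsqueeze(1)
    mask = mask.float()
    ones = torch.ones_like(mask)
    # -1e30 (not 1e-30), so padded positions are pushed to ~0 after the softmax
    mask = (ones - mask) * (-1e30)
    return mask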

Regarding the control unit, I think the original paper may have a non-recurrent variant, which is similar to what you did here. But the version published in the paper actually combines the previous control with the current step's question representation before computing the new control, so the previous control is in fact used:
https://github.com/stanfordnlp/mac-network/blob/f6b665a56ea829364351616edcdc744979fc3b06/mac_cell.py#L142-L146
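
For comparison, a rough PyTorch sketch of what those lines do (names, shapes, and the projection layer here are my assumptions; see the linked TensorFlow code for the actual implementation):

# combine the previous control with the (step-specific) question representation
control_question = torch.cat([prev_control, question], dim=-1)    # [B, 2d]
control_question = self.control_question_proj(control_question)   # nn.Linear(2d, d)
# attend over the question words with the combined vector
logits = self.attn(context * control_question.unsqueeze(1))       # [B, L, 1]
attn = torch.softmax(logits, dim=1)
next_control = (attn * context).sum(dim=1)                         # [B, d]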

Anyway, thanks a lot for sharing this awesome implementation and for answering my questions! Greatly appreciated.
