
Previous control is not used? #8

Open
zhmd opened this issue Nov 19, 2020 · 2 comments

zhmd commented Nov 19, 2020

Hi,

Thanks for sharing this great repo! It's awesome.

I just have a few questions about the control unit:
(1) From

# TODO: add mask again?!
# question_lengths = torch.cuda.FloatTensor(question_lengths)
# mask = self.mask(question_lengths, logits.device).unsqueeze(-1)
# logits += mask

these lines are commented out -- but I believe they are meant to mask the padded tokens in the sequence. Is that intentional? Am I missing something here?
(2) It seems in this control function,
def forward(self, question, context, question_lengths, step):

The previous control is never passed in as an argument, so the control state from one reasoning step is not carried over to the next. I noticed that the docstring mentions a control argument, but it seems to have been removed from the function signature. Also, question_lengths is unused since the mask call is commented out (see the previous question). Is that intentional as well?

Thanks a lot!

tohinz (Owner) commented Nov 24, 2020

Hi, it's been a while since I last worked with this, so I'm going from memory without guarantee that what I'm saying is correct.

Regarding the mask: if I remember correctly, using the mask didn't improve the final performance, so I removed it to reduce overhead.

Regarding the control unit: the previous control is never used because the control unit uses different parameters for each step (question = self.control_input_u[step](question) uses different parameters depending on the step value). Regarding question_lengths and the mask, see my previous remark.
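
To illustrate, a minimal sketch of this per-step parameterization (only the control_input_u[step] indexing comes from the actual code; the module layout, dimensions, and attention details below are assumptions):

import torch
import torch.nn as nn

class ControlUnit(nn.Module):
    def __init__(self, dim, max_steps):
        super().__init__()
        # one projection per reasoning step -> step-specific parameters
        self.control_input_u = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(max_steps)]
        )
        self.attn = nn.Linear(dim, 1)

    def forward(self, question, context, question_lengths, step):
        # step selects a different set of weights on every call,
        # which is why no previous control is fed back in here
        question = self.control_input_u[step](question)       # [B, d]
        logits = self.attn(context * question.unsqueeze(1))   # [B, L, 1]
        attn = torch.softmax(logits, dim=1)                   # attention over words
        control = (attn * context).sum(dim=1)                 # [B, d]
        return control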

zhmd (Author) commented Nov 24, 2020

Thanks for your feedback! I actually looked into the code more carefully over the past few days.

If I may, I suspect the reason masking didn't help is that 1e-30 is used rather than -1e30 in L48. The point is to make the logits at padded token positions extremely negative so they become effectively 0 after the softmax, but 1e-30 is simply too small to have any impact. The original MAC implementation uses -1e30.

def mask(self, question_lengths, device):
    max_len = question_lengths.max().item()
    mask = torch.arange(max_len, device=device).expand(len(question_lengths), int(max_len)) < question_lengths.unsqueeze(1)
    mask = mask.float()
    ones = torch.ones_like(mask)
    mask = (ones - mask) * (1e-30)
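
For reference, a sketch of the corrected version, assuming the function is meant to return the additive mask (the snippet above is truncated, so the return is my assumption):

def mask(self, question_lengths, device):
    max_len = question_lengths.max().item()
    # True for real tokens, False for padding
    mask = torch.arange(max_len, device=device).expand(len(question_lengths), int(max_len)) < question_lengths.unsqueeze(1)
    mask = mask.float()
    ones = torch.ones_like(mask)
    # -1e30 (not 1e-30), so padded positions are pushed to ~0 after the softmax
    mask = (ones - mask) * (-1e30)
    return mask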

Regarding the control unit, I think the original paper may have a non-recurrent variant, which is similar to what you did here. But the version published in the paper actually combines the previous control with the current step's question representation before computing the new control, so the previous control is in fact used:
https://github.com/stanfordnlp/mac-network/blob/f6b665a56ea829364351616edcdc744979fc3b06/mac_cell.py#L142-L146
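
For comparison, a rough PyTorch sketch of what those lines do (names, shapes, and the projection layer here are my assumptions; see the linked TensorFlow code for the actual implementation):

# combine the previous control with the (step-specific) question representation
control_question = torch.cat([prev_control, question], dim=-1)    # [B, 2d]
control_question = self.control_question_proj(control_question)   # nn.Linear(2d, d)
# attend over the question words with the combined vector
logits = self.attn(context * control_question.unsqueeze(1))       # [B, L, 1]
attn = torch.softmax(logits, dim=1)
next_control = (attn * context).sum(dim=1)                         # [B, d]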

Anyway, thanks a lot for sharing this awesome implementation and for answering my questions! Greatly appreciated.
