In `__get_ppl()` of `PPLInferencer`, line 186 calculates the token count of each text sample in `input_texts` by counting the token IDs that are not equal to `tokenizer.pad_token_id`.

However, when the `loss` is calculated, the tokens that are counted actually start from the second token rather than from the beginning of each `inputs`, as shown in line 173.

Thus, I think line 186 should count tokens the same way, starting from the second token of each input.
The new version differs only slightly from the original version, namely `new_lens = orig_lens - 1`.

For reference:
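The snippets originally attached here are not reproduced. As a rough illustration of the relationship `new_lens = orig_lens - 1`, here is a minimal pure-Python sketch. It is not the code from `PPLInferencer`: the helper names `count_tokens` and `count_loss_tokens`, the pad ID `0`, and the toy batch are all hypothetical, and it assumes right padding with a non-pad first token in every sequence.

```python
PAD_ID = 0  # hypothetical pad token id, chosen only for this example


def count_tokens(batch, pad_id):
    """Count every non-pad id in each sequence (the original counting)."""
    return [sum(tok != pad_id for tok in seq) for seq in batch]


def count_loss_tokens(batch, pad_id):
    """Count non-pad ids among the shifted labels seq[1:].

    With next-token prediction the first token is never a label,
    so only positions 1..n-1 can contribute to the loss.
    """
    return [sum(tok != pad_id for tok in seq[1:]) for seq in batch]


# Toy right-padded batch (assumption: padding sits at the end).
input_ids = [
    [11, 12, 13, 14],
    [21, 22, PAD_ID, PAD_ID],
]

orig_lens = count_tokens(input_ids, PAD_ID)
new_lens = count_loss_tokens(input_ids, PAD_ID)

print(orig_lens)  # [4, 2]
print(new_lens)   # [3, 1]
```

Under left padding the first position would itself be a pad token, so the two counts would coincide; the off-by-one relationship shown here depends on the right-padding assumption.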