Feature request
This feature request proposes a new Trainer subclass, similar to Seq2SeqTrainer, but suited to decoder-only LMs.
Motivation
Seq2SeqTrainer provides a great abstraction for encoder-decoder LMs when we need to run generation during evaluate().
But the current implementations of both Trainer and Seq2SeqTrainer are not well suited to decoder-only LMs, because input_ids and labels differ between teacher-forcing training and generation-based evaluation.
For example, in instruction tuning:

During training (teacher-forcing):
input_ids = 'Translate the following texts: {Text in Chinese...} {Text in English...}'
labels    = 'Translate the following texts: {Text in Chinese...} {Text in English...}'

During evaluation:
input_ids = 'Translate the following texts: {Text in Chinese...}'
labels    = '{Text in English...}'
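To make the difference concrete, here is a minimal sketch (not from the issue; any Hugging Face tokenizer would do, "gpt2" is just a placeholder checkpoint) of how the two phases produce different tensors:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint

prompt = "Translate the following texts: {Text in Chinese...} "
target = "{Text in English...}"

# Training (teacher-forcing): the model sees prompt + target in one
# sequence; labels are the same tokens, with the prompt part commonly
# masked to -100 so it does not contribute to the loss.
prompt_ids = tokenizer(prompt)["input_ids"]
target_ids = tokenizer(target)["input_ids"]
train_input_ids = prompt_ids + target_ids
train_labels = [-100] * len(prompt_ids) + target_ids

# Evaluation (generation): the model sees only the prompt; the target
# text is kept aside as the reference for metrics such as BLEU.
eval_input_ids = prompt_ids
eval_reference = target
```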
So during evaluation we need to prepare two kinds of input_ids, one for computing the loss and one for computing bleu_metrics, which requires extra columns in eval_dataset. However, Trainer._remove_unused_columns() strips any column not accepted by model.forward() from both eval_dataset and train_dataset. During training this behaviour is expected (we only need the teacher-forcing inputs), but it makes evaluation difficult.
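One workaround today (a hedged sketch, not an official recipe) is to set remove_unused_columns=False in TrainingArguments and route the columns yourself in a custom data collator, since the default collator would fail on the extra string columns:

```python
import torch
from transformers import TrainingArguments

# With remove_unused_columns=False, Trainer skips _remove_unused_columns(),
# so the extra evaluation columns survive into the collator.
args = TrainingArguments(output_dir="out", remove_unused_columns=False)

# Keys that model.forward() actually accepts; everything else (prompts,
# reference strings, ...) must be handled outside the forward pass.
MODEL_KEYS = ("input_ids", "attention_mask", "labels")

def collate_fn(features):
    # Assumes features are already padded to equal length.
    return {
        key: torch.tensor([f[key] for f in features])
        for key in MODEL_KEYS
        if key in features[0]
    }
```

The catch is that the generation-time prompts and references still have to reach compute_metrics somehow, which is exactly the plumbing a dedicated trainer could own.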
This feature is nearly identical across all CausalLM models when performing generation during evaluation, making it highly reusable. Given the increasing number of Decoder-only LMs (CausalLMs) in the community, I strongly recommend implementing a dedicated CausalTrainer to simplify deployments.
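As a rough illustration of what such a trainer could look like (the class name, the "prompt_ids" column, and the generation settings are all hypothetical, and real fsdp/deepspeed support would need more work):

```python
import torch
from transformers import Trainer

class CausalTrainer(Trainer):  # hypothetical name from this proposal
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        # 1) Teacher-forcing loss on the full sequence, using only the
        #    keys that model.forward() accepts.
        tf_inputs = {k: v for k, v in inputs.items()
                     if k in ("input_ids", "attention_mask", "labels")}
        loss, _, _ = super().prediction_step(
            model, tf_inputs, prediction_loss_only=True, ignore_keys=ignore_keys
        )
        if prediction_loss_only:
            return loss, None, None

        # 2) Generation from the prompt-only column for text metrics.
        prompt_ids = inputs["prompt_ids"]  # hypothetical extra eval column
        with torch.no_grad():
            generated = model.generate(prompt_ids, max_new_tokens=128)
        # Strip the prompt so only the continuation is scored (e.g. BLEU).
        generated = generated[:, prompt_ids.shape[1]:]
        return loss, generated, inputs.get("labels")
```

This mirrors how Seq2SeqTrainer overrides prediction_step to call generate(), just adapted to the prompt/continuation split of decoder-only models.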
I may have missed something. If there is already a simpler way to customize such a Trainer, please let me know.
Your contribution
I'm willing to help submit a PR, but I'm not familiar with some integrations such as fsdp and deepspeed, so I may need someone to help me finish this feature.
Looks that way to me - we can add a note to #32346 that it fixes this issue! @skpig is there anything missing in #32346 that you need for your use-cases?