
A Trainer subclass for Decoder-Only LM with generation in evaluate() #33396

Open
skpig opened this issue Sep 10, 2024 · 4 comments · May be fixed by #32346
Labels: Feature request, trainer

Comments

@skpig (Contributor) commented Sep 10, 2024:

Feature request

The main feature request is a new Trainer subclass, similar to Seq2SeqTrainer, but suited to decoder-only LMs.

Motivation

Seq2SeqTrainer provides a great abstraction for encoder-decoder LMs when we need to run generation during evaluate().

But the current implementations of both Trainer and Seq2SeqTrainer are unsuitable for decoder-only LMs, because input_ids and labels differ between teacher-forcing training and generation-based evaluation.

For example, in instruction tuning (a preprocessing sketch follows the list):

  • During training (teacher-forcing):
    input_ids = 'Translate the following texts: {Text in Chinese...} {Text in English...}'
    labels    = 'Translate the following texts: {Text in Chinese...} {Text in English...}'
  • During evaluation:
    input_ids = 'Translate the following texts: {Text in Chinese...}'
    labels    = '{Text in English...}'
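
A rough preprocessing sketch of the two encodings (the tokenizer checkpoint and the prompt/answer strings are placeholders, not part of the request):

```python
# Rough sketch of the two encodings; "gpt2" and the strings are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Translate the following texts: {Text in Chinese...} "
answer = "{Text in English...}"

# Training (teacher-forcing): the model sees prompt + answer, and labels are
# a copy of input_ids (the causal shift happens inside the model; prompt
# tokens are often masked to -100 so they don't contribute to the loss).
train_input_ids = tokenizer(prompt + answer)["input_ids"]
train_labels = list(train_input_ids)

# Evaluation: generation starts from the prompt alone, and the reference
# answer is tokenized separately for metrics such as BLEU.
eval_input_ids = tokenizer(prompt)["input_ids"]
eval_labels = tokenizer(answer)["input_ids"]
```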

So during evaluation we need to prepare two kinds of input_ids, one to compute the loss and one to compute BLEU metrics, which means extra columns in eval_dataset. However, Trainer._remove_unused_columns() prunes any column not accepted by model.forward() from both eval_dataset and train_dataset. During training this behaviour is expected (we only need the teacher-forcing inputs), but it makes evaluation difficult.
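
One partial escape hatch today (a workaround sketch, not the requested feature) is to disable column pruning entirely and handle the extra columns in a custom data collator:

```python
# Workaround sketch: remove_unused_columns is an existing TrainingArguments
# flag; "out" is a placeholder output directory.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    remove_unused_columns=False,  # Trainer no longer prunes dataset columns
)
```

The downside is that the flag applies to train_dataset as well, so the collator then has to filter the columns itself for each phase.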

This logic is nearly identical across all CausalLM models that generate during evaluation, making it highly reusable. Given the growing number of decoder-only LMs (CausalLMs) in the community, I strongly recommend adding a dedicated CausalTrainer to make this easier.
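
To make the idea concrete, here is a minimal sketch of what such a subclass could look like; the gen_-prefixed eval columns and the generation settings are assumptions for illustration, not an existing API:

```python
# Minimal sketch of a CausalTrainer; the "gen_*" eval-batch columns
# (prompt-only inputs plus reference ids) are hypothetical names.
import torch
from transformers import Trainer


class CausalTrainer(Trainer):
    def prediction_step(self, model, inputs, prediction_loss_only, ignore_keys=None):
        # Teacher-forcing loss on the full prompt + answer sequence.
        tf_inputs = {k: v for k, v in inputs.items() if not k.startswith("gen_")}
        loss, _, _ = super().prediction_step(
            model, tf_inputs, prediction_loss_only=True, ignore_keys=ignore_keys
        )
        if prediction_loss_only:
            return loss, None, None

        # Free-running generation from the prompt-only inputs.
        with torch.no_grad():
            generated = model.generate(
                input_ids=inputs["gen_input_ids"],
                attention_mask=inputs["gen_attention_mask"],
                max_new_tokens=128,  # assumed setting for the sketch
            )
        # Strip the prompt so compute_metrics only sees the completion.
        generated = generated[:, inputs["gen_input_ids"].shape[1]:]
        return loss, generated, inputs["gen_labels"]
```

Using this would still require remove_unused_columns=False (as above) and a collator that pads the gen_* columns to a common length.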

I may have missed something. If there is already a simpler way to customize such a Trainer, please let me know.

Your contribution

I'm willing to help submit a PR, but I'm not familiar with some of the integrations such as FSDP and DeepSpeed, so I may need help to finish this feature.

@skpig added the Feature request label on Sep 10, 2024
@LysandreJik (Member):

cc @muellerzr regarding the Trainer question, but maybe @Rocketknight1 can shed light on the NLP/training aspect

@zucchini-nlp (Member):

I guess the feature requested here is the same as in #32346.

@Rocketknight1 (Member):

Looks that way to me - we can add a note to #32346 saying that it fixes this issue! @skpig, is there anything missing from #32346 that you need for your use cases?

@skpig (Contributor, Author) commented Sep 11, 2024:

Fantastic, that's exactly what I need! Looking forward to the new feature! 😊

@zucchini-nlp linked pull request #32346 on Sep 11, 2024 that will close this issue