Add run_orpo.py
#143
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Done similarly to the original implementation, in order to better reproduce their results
The docs no longer live here :(

Yes, I believe that's expected! Anyway, thanks for mentioning it — I'll submit a PR to add the documentation for ORPO, since it's missing now 👍🏻
Description

This PR adds the `run_orpo.py` Python script to fine-tune LLMs with the "to be released" `trl.ORPOTrainer`.

Besides that, some changes have been applied to the dataset formatting, to also support DPO/ORPO datasets formatted as `prompt-chosen-rejected`, and to add `orpo` as a task in `apply_chat_template`.

Additionally, this PR adds prompt filtering based on length, if provided among the `model_args`, similarly to what's done in the official ORPO codebase, for consistency when replicating their experiments.

Experiments
A raw version of the script has been run, but more tests are needed. If there's an interesting use case, I'm happy to collaborate on the release of `run_orpo.py`, as recently done for both Zephyr Gemma #129 and StarChat 2 #135 🤗

Mistral-7B-v0.1 fine-tuned with `argilla/distilabel-capybara-dpo-7k-binarized` as in https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k
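For reference, the `prompt-chosen-rejected` record layout and the length-based prompt filtering described in this PR can be sketched as follows. The field names follow the DPO convention, but the character-based length check and the helper name are illustrative stand-ins — the actual script would measure prompt length in tokens via the tokenizer:

```python
# Sketch of a prompt-chosen-rejected dataset and length-based prompt
# filtering, in the spirit of the official ORPO codebase. NOTE: the
# character-based length proxy is an assumption for illustration; real
# filtering would count tokens after tokenization.

def filter_by_prompt_length(examples, max_prompt_length=None):
    """Drop examples whose prompt exceeds max_prompt_length (no-op if None)."""
    if max_prompt_length is None:
        return list(examples)
    return [ex for ex in examples if len(ex["prompt"]) <= max_prompt_length]

dataset = [
    {"prompt": "Hi", "chosen": "Hello there!", "rejected": "Go away."},
    {"prompt": "Write a very long story about dragons and wizards",
     "chosen": "Once upon a time...", "rejected": "No."},
]

filtered = filter_by_prompt_length(dataset, max_prompt_length=10)
```

With a real `datasets.Dataset`, the same predicate would typically be passed to `Dataset.filter` rather than a list comprehension.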