Add run_orpo.py #143

Merged
merged 35 commits into from
Apr 11, 2024

@alvarobartt (Member) commented Mar 26, 2024

Description

This PR adds the run_orpo.py Python script to fine-tune LLMs with the "to be released" trl.ORPOTrainer.

In addition, the dataset formatting has been changed to also support DPO/ORPO datasets formatted as prompt-chosen-rejected, and orpo has been added as a task in apply_chat_template.
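As a rough illustration of the two dataset layouts mentioned above, here is a minimal sketch. It is not the code from this PR: the helper name `split_preference_example` is hypothetical, and it assumes examples are plain dicts whose `prompt`, `chosen` and `rejected` values are lists of `{"role", "content"}` messages.

```python
from typing import Dict, List

Message = Dict[str, str]


def split_preference_example(example: dict) -> Dict[str, List[Message]]:
    """Hypothetical helper: normalize one preference example into
    separate prompt / chosen / rejected message lists."""
    chosen = example["chosen"]
    rejected = example["rejected"]

    if "prompt" in example:
        # Layout 1 (assumed): prompt-chosen-rejected, where the prompt is
        # already stored separately and chosen/rejected hold only completions.
        return {
            "prompt": list(example["prompt"]),
            "chosen": list(chosen),
            "rejected": list(rejected),
        }

    # Layout 2 (assumed): chosen/rejected are full conversations that share
    # the same leading turns; the final turn is the completion.
    return {
        "prompt": chosen[:-1],
        "chosen": chosen[-1:],
        "rejected": rejected[-1:],
    }
```

Both layouts end up in the same shape, so whatever applies the chat template afterwards only has to handle one case.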

Additionally, this PR adds prompt filtering based on length, applied when the length is provided among the model_args, similarly to what's done in the official ORPO codebase, for consistency when replicating their experiments.
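The filtering idea can be sketched as follows. This is only an assumption-laden illustration, not the script's actual code: `filter_by_prompt_length`, `max_prompt_length` and `whitespace_tokenize` are hypothetical names, the whitespace tokenizer stands in for a real model tokenizer, and treating the limit as inclusive is a guess.

```python
from typing import Callable, Dict, List, Optional


def whitespace_tokenize(text: str) -> List[str]:
    # Stand-in for a real tokenizer (assumption): one token per word.
    return text.split()


def filter_by_prompt_length(
    examples: List[Dict[str, str]],
    tokenize: Callable[[str], List[str]],
    max_prompt_length: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Keep only examples whose tokenized prompt fits within the limit.

    When no limit is given, the examples are returned unfiltered.
    """
    if max_prompt_length is None:
        return list(examples)
    return [
        example
        for example in examples
        if len(tokenize(example["prompt"])) <= max_prompt_length
    ]
```

For example, `filter_by_prompt_length(rows, whitespace_tokenize, 3)` would drop any row whose prompt has more than three whitespace-separated words.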

Experiments

A raw version of the script has been run, but more tests are needed. If there's an interesting use case, I'm happy to collaborate on the release of run_orpo.py, as recently done for both Zephyr Gemma #129 and StarChat 2 #135 🤗

Mistral-7B-v0.1 fine-tune with argilla/distilabel-capybara-dpo-7k-binarized as in https://huggingface.co/kaist-ai/mistral-orpo-capybara-7k

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes 4 scripts/run_orpo.py recipes/mistral-capybara/orpo/config_full.yaml

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@alvarobartt alvarobartt marked this pull request as ready for review March 27, 2024 12:48
@alvarobartt alvarobartt changed the title Add run_orpo.py (WIP) Add run_orpo.py Apr 10, 2024
@lewtun lewtun merged commit 70769f9 into huggingface:main Apr 11, 2024
3 checks passed
@nisten commented Apr 12, 2024

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The docs no longer live here :(

@alvarobartt (Member, Author)

The docs no longer live here :(

Yes, I believe that's expected! Anyway, thanks for mentioning it, I'll submit a PR to add the documentation for ORPO, since it's missing now 👍🏻
