
Missing config params on SFT #31

Merged 3 commits into huggingface:main on Nov 21, 2023
Conversation

@tcapelle (Contributor) commented on Nov 15, 2023

Hi,
Small PR to add the missing warmup and the total number of steps so that training happens correctly.
I am also adding info on the GPU requirements (80GB GPUs). <- this is in the main README =P


The link to the experiment

  max_seq_length: 2048
- max_steps: -1
+ max_steps: 272
Member commented:
I think max_steps=-1 because num_train_epochs is used instead
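
For reference, a minimal sketch of the relevant `transformers.TrainingArguments` behaviour (illustrative values, not the repo's actual training code):

```python
from transformers import TrainingArguments

# max_steps=-1 (the default): the Trainer derives the total number of update
# steps from num_train_epochs and the length of the train dataloader, which
# requires a dataset whose size is known up front.
args_by_epochs = TrainingArguments(output_dir="out", num_train_epochs=1, max_steps=-1)

# max_steps > 0: overrides num_train_epochs and gives the LR scheduler a fixed
# horizon, so warmup_ratio can be resolved into a concrete number of warmup steps.
args_by_steps = TrainingArguments(output_dir="out", max_steps=272, warmup_ratio=0.1)
```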

@tcapelle (Contributor, Author) replied:
Yeah, but the ConstantLengthDataset doesn't know how many steps it will run for, so the scheduler can't set up the warmup cycle correctly.
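
To make that concrete, here is a hedged sketch of how `max_steps` could be pre-computed for a packed, constant-length dataset; the helper name and all numbers are illustrative and not taken from this PR:

```python
def estimate_max_steps(total_tokens: int, seq_length: int,
                       per_device_batch_size: int, grad_accum_steps: int,
                       num_gpus: int, num_epochs: float = 1.0) -> int:
    """Rough update-step count for a packed (constant-length) dataset.

    The LR scheduler needs this number up front so that a warmup ratio can be
    converted into an actual number of warmup steps.
    """
    packed_examples = total_tokens // seq_length
    global_batch_size = per_device_batch_size * grad_accum_steps * num_gpus
    return int(num_epochs * packed_examples // global_batch_size)

# Illustrative values only; the real dataset and hyperparameters differ.
print(estimate_max_steps(total_tokens=35_000_000, seq_length=2048,
                         per_device_batch_size=8, grad_accum_steps=1, num_gpus=8))
```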

@tcapelle (Contributor, Author) commented on Nov 15, 2023:
I am going to try to fix this in TRL.

@lewtun (Member) left a comment:
Good catch with the logging steps @tcapelle! There's an open PR to fix this in TRL (huggingface/trl#979), so I suggest we keep the repo's YAML configs unchanged for now.

@@ -9,7 +9,7 @@ As described in the Zephyr [technical report](https://huggingface.co/papers/2310
See below for commands to train these models using either DeepSpeed ZeRO-3 or LoRA.

## Full training examples

You will require 8 GPUs (80GB of VRAM) to train the full model.
Member commented:
Happy to keep this line in the PR if you don't mind reverting the config changes :)

@tcapelle (Contributor, Author) replied:
haha, my bad, this is specified in the main README file =)

Member commented:
thanks for iterating!


@lewtun merged commit f025057 into huggingface:main on Nov 21, 2023. 3 checks passed.