Missing config params on SFT #31
Conversation
  max_seq_length: 2048
- max_steps: -1
+ max_steps: 272
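For context on where a number like 272 comes from: when the data is packed and streamed, a fixed step count is usually computed from the dataset size and the effective batch size. A rough sketch follows, with placeholder values that are not taken from this PR or the repo's configs:

```python
# How a fixed max_steps is typically derived when the Trainer cannot infer the
# dataset length itself. All values below are placeholders, NOT from this PR.
num_packed_examples = 140_000        # placeholder: number of packed sequences
per_device_batch_size = 16           # placeholder
gradient_accumulation_steps = 1      # placeholder
num_gpus = 8                         # matches the README note added in this PR
num_train_epochs = 1                 # placeholder

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
max_steps = (num_packed_examples * num_train_epochs) // effective_batch_size
print(max_steps)  # a number of this kind ends up as max_steps in the YAML
```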
I think max_steps=-1 because num_train_epochs is used instead.
Yeah, but the ConstantLengthDataset doesn't know how many steps it will run, so the scheduler can't set up the warmup cycle correctly.
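To illustrate the point (a minimal sketch, not the handbook's or TRL's actual code): the learning-rate scheduler is built once before training and needs the total step count up front to size the warmup phase, which a streaming ConstantLengthDataset can't provide when max_steps is -1. The warmup ratio of 0.1 below is an assumption.

```python
import torch
from transformers import get_scheduler

# Stand-in model and optimizer, only needed to construct a scheduler.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

max_steps = 272                          # total optimizer steps from the config
num_warmup_steps = int(0.1 * max_steps)  # assumed warmup_ratio of 0.1

# The schedule is sized from the total step count; with max_steps=-1 and an
# iterable dataset there is no length to fall back on, so the warmup/decay
# phases can't be set up correctly.
scheduler = get_scheduler(
    "cosine",
    optimizer=optimizer,
    num_warmup_steps=num_warmup_steps,
    num_training_steps=max_steps,
)
```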
I am going to try fixing this in trl
Good catch with the logging steps @tcapelle! There's an open PR to fix this in TRL here (huggingface/trl#979), so I suggest we keep the YAML configs of the repo unchanged for now.
@@ -9,7 +9,7 @@ As described in the Zephyr [technical report](https://huggingface.co/papers/2310
See below for commands to train these models using either DeepSpeed ZeRO-3 or LoRA.

## Full training examples

You will require 8 GPUs (80GB of VRAM) to train the full model.
Happy to keep this line in the PR if you don't mind reverting the config changes :)
haha, my bad, since this is specified in the main README file =)
thanks for iterating!
This reverts commit 760e477.
Hi,
Small PR to add the missing warmup and the total number of steps so the training happens correctly.
I am also adding info on the GPU requirements (80GB GPUs). <- this is in the main README =P
The link to the experiment
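For readers landing here later, the shape of the change is roughly the following (a hedged sketch expressed with transformers' TrainingArguments; the exact field names in the handbook's YAML recipes are assumptions, not taken from this PR):

```python
from transformers import TrainingArguments

# Illustrative sketch of the settings this PR is about; values and field names
# are assumptions, not the repo's actual config.
args = TrainingArguments(
    output_dir="data/zephyr-7b-sft-full",  # placeholder path
    max_steps=272,                          # total number of steps, instead of -1
    warmup_ratio=0.1,                       # assumed form of "the missing warmup"
    logging_steps=5,                        # the thread also flags logging steps
)
```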