Add support for gradient checkpointing #60

Open
wants to merge 1 commit into
base: main

eric-czech
Collaborator

eric-czech commented Jan 11, 2025

This adds support for gradient checkpointing using both the Mosaic Composer Trainer and Hugging Face Trainer interfaces.

Both interfaces assume that checkpointing is enabled at training time rather than in the model configuration, so the configuration itself changes very little. The only addition is a gradient_checkpointing_stride setting, which controls how frequently checkpoints are added to the Mamba blocks.
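For readers unfamiliar with strided checkpointing, here is a minimal sketch of the idea in plain PyTorch. It is not the code in this PR: `Block` and `StridedModel` are hypothetical stand-ins, the `gradient_checkpointing` flag is an assumed name for the training-time switch, and reading the stride as "checkpoint every N-th block" is my assumption about what gradient_checkpointing_stride means.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical stand-in for a Mamba block (a simple residual layer)."""

    def __init__(self, dim: int) -> None:
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + torch.tanh(self.proj(x))


class StridedModel(nn.Module):
    """Toy model that checkpoints every `stride`-th block (assumed semantics)."""

    def __init__(self, dim: int = 8, n_layers: int = 4,
                 gradient_checkpointing_stride: int = 2) -> None:
        super().__init__()
        self.blocks = nn.ModuleList([Block(dim) for _ in range(n_layers)])
        self.gradient_checkpointing_stride = gradient_checkpointing_stride
        # Assumed flag: switched on by the trainer, not by the model config.
        self.gradient_checkpointing = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            use_ckpt = (
                self.gradient_checkpointing
                and self.training
                and i % self.gradient_checkpointing_stride == 0
            )
            if use_ckpt:
                # Activations inside the block are dropped after the forward
                # pass and recomputed during the backward pass.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x
```

A smaller stride checkpoints more blocks, which saves more activation memory at the cost of more recomputation. Gradients should match with the flag on or off, since checkpointing only changes when activations are computed.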

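I went back and forth on how to validate this functionality and ultimately landed on counting executions of forward passes (through hooks) as the cleanest way to do it. A checkpointed block is executed again during the backward pass to recompute its activations, so the counts show directly whether checkpointing took effect. Let me know if anybody is aware of other ways to test it.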
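The hook-counting approach can be sketched roughly as follows. This is an illustration under assumptions, not the test in this PR: `count_forward_calls` is a hypothetical helper, a single `nn.Linear` stands in for a Mamba block, and I use a forward pre-hook because it fires on entry to the module, so it is counted even if PyTorch cuts a recomputation short.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint


def count_forward_calls(use_checkpointing: bool) -> int:
    """Count how often a layer's forward is entered in one forward+backward."""
    layer = nn.Linear(4, 4)  # stand-in for a checkpointed block
    calls = 0

    def pre_hook(module: nn.Module, args: tuple) -> None:
        # Runs every time the module is called, including recomputation.
        nonlocal calls
        calls += 1

    handle = layer.register_forward_pre_hook(pre_hook)
    x = torch.randn(2, 4, requires_grad=True)
    if use_checkpointing:
        out = checkpoint(layer, x, use_reentrant=False)
    else:
        out = layer(x)
    out.sum().backward()
    handle.remove()
    return calls
```

Calling `count_forward_calls(False)` and `count_forward_calls(True)` should give 1 and 2 respectively: the extra call comes from the recomputation triggered during backward.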
