Add support for gradient checkpointing #60
This adds support for gradient checkpointing through both the Mosaic Composer Trainer and the Hugging Face Trainer interfaces. Both trainers assume that checkpointing is enabled at training time rather than at model-configuration time, so very little changes on the configuration side beyond a new `gradient_checkpointing_stride` option that controls how frequently checkpoints are inserted between the Mamba blocks.
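
For illustration, a stride-controlled checkpointing loop could look roughly like the sketch below. The class, attribute names, and block interface here are stand-ins I'm assuming for the sake of the example, not the actual Mamba implementation in this PR:

```python
# Minimal sketch of stride-based activation checkpointing over a stack of blocks.
# `CheckpointedBackbone`, `layers`, and the flag names are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedBackbone(nn.Module):
    def __init__(self, blocks: nn.ModuleList, gradient_checkpointing_stride: int = 1):
        super().__init__()
        self.layers = blocks
        self.gradient_checkpointing = False
        self.gradient_checkpointing_stride = gradient_checkpointing_stride

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.layers):
            if (
                self.training
                and self.gradient_checkpointing
                and i % self.gradient_checkpointing_stride == 0
            ):
                # Recompute this block's activations during backward instead of
                # storing them; use_reentrant=False is the recommended mode in
                # recent PyTorch. Checkpointing block.__call__ keeps module hooks
                # firing during recomputation, which matters for the test below.
                hidden_states = checkpoint(
                    block.__call__, hidden_states, use_reentrant=False
                )
            else:
                hidden_states = block(hidden_states)
        return hidden_states
```

With a stride of 1 every block is checkpointed; larger strides trade memory savings for less recomputation.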
I went back and forth a bit on how to validate this functionality, and ultimately landed on counting forward-pass executions (via hooks) as the cleanest approach. Let me know if anybody is aware of other ways to test it.
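
The counting idea, sketched very roughly: register a forward pre-hook on every block, run one training step, and check how often each block is entered. This assumes blocks are checkpointed through their `__call__` (so hooks also fire during recomputation); the `model.layers` attribute and the call convention are placeholders, not the real test code:

```python
# Rough sketch of hook-based validation: count forward entries per block for
# one forward+backward step. Attribute names and the batch format are assumed.
from collections import Counter

import torch


def count_block_forwards(model, batch):
    """Return how many times each block's forward is entered in one step."""
    counts = Counter()
    handles = []
    for idx, block in enumerate(model.layers):  # assumed attribute name
        def hook(module, args, _idx=idx):
            counts[_idx] += 1
        handles.append(block.register_forward_pre_hook(hook))

    out = model(batch)
    out.sum().backward()

    for h in handles:
        h.remove()
    return counts
```

With checkpointing enabled, blocks selected by the stride should show a count of 2 (forward plus recomputation during backward) while the rest show 1; with checkpointing disabled, every block should show exactly 1.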