open-thought/tiny-grpo

Minimal GRPO implementation

Goal: A working toy implementation of RL training with GRPO for llama-3.2-3b, built to understand the algorithm and its hyperparameters. Everything runs locally on a single node.
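As a rough illustration of the core idea, GRPO scores each sampled completion relative to the other completions drawn for the same prompt, instead of relying on a learned value function. The sketch below is not taken from train.py; the function name `group_relative_advantages`, the `eps` constant, and the reward values are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    mu = mean(rewards)
    # Population standard deviation; an implementation may use the sample std instead.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions of a single prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. When all rewards in a group are identical, every advantage is zero and that group contributes no learning signal.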

Setup

  1. Create conda env
     conda create --name grpo python=3.12 -y
     conda activate grpo
  2. Install dependencies
     pip install -r requirements.txt
     pip install flash-attn --no-build-isolation
  3. Play with the source in train.py
     python train.py
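After installing, a quick check that the key modules are importable can save a failed first run. This is a minimal sketch and not part of the repository: the helper `missing_modules` is hypothetical, and the module list is an assumption (`torch` is a guess at what requirements.txt provides; `flash_attn` is the import name of the flash-attn package).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Assumed module names; adjust to match the actual contents of requirements.txt.
REQUIRED = ["torch", "flash_attn"]

if __name__ == "__main__":
    print("python", sys.version.split()[0])
    missing = missing_modules(REQUIRED)
    print("missing:", missing if missing else "none")
```

Run it inside the activated grpo environment. `find_spec` only locates modules without importing them, so the check is fast and does not touch the GPU.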

Inspiration

References
