Minimal GRPO implementation

Goal: A working toy implementation of RL training for llama-3.2-3b with GRPO, built to understand the algorithm and its hyperparameters. Everything runs locally on a single node.
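
For orientation, here is a minimal sketch of the two ideas at the core of GRPO: rewards are normalized within a group of completions sampled for the same prompt, and the policy update uses a PPO-style clipped probability ratio. This is not the code in train.py or loss.py. The function names, the use of the population standard deviation, and the default clip_eps=0.2 are assumptions made for illustration, and the KL penalty toward a reference model is left out.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    # Population std is an assumption here; some implementations use the sample std.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate for a single token (a quantity to maximize)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum keeps the update pessimistic when the ratio drifts.
    return min(ratio * advantage, clipped * advantage)


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(rewards)
```

The group size (number of completions per prompt) and clip_eps are the kind of hyperparameters worth varying when experimenting. Note that a group in which every completion gets the same reward yields all-zero advantages and so contributes no gradient signal.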

Setup

Create conda env

conda create --name grpo python=3.12 -y
conda activate grpo

Install dependencies

pip install -r requirements.txt
pip install flash-attn --no-build-isolation
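
Before starting a run, it can help to confirm that the installed packages are importable from the active environment. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and the module names in the usage line are assumptions (flash_attn is the import name of the flash-attn package; torch is assumed to be listed in requirements.txt).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; adjust to match requirements.txt.
    missing = missing_modules(["torch", "flash_attn"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```

The check only looks up whether each module can be located; it does not import it, so it will not catch a package that is installed but fails at import time (for example, a flash-attn build that does not match the local CUDA version).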

Play with the source in train.py

python train.py

open-thought/tiny-grpo
