Contributing to Reasoning Gym

Thank you for your interest in contributing to Reasoning Gym! This document provides guidelines and instructions for contributing to the project.

Development Setup

Clone the repository:

git clone https://github.com/open-thought/reasoning-gym.git

Create a virtual environment (using conda):

conda create --name reasoning_gym python=3.11 -y
conda activate reasoning_gym

Creating Datasets

When creating new datasets, please follow these guidelines:

Focus on Complex Problems:
- Prioritize problems where guessing has a low probability of success (e.g., multiplying multi-digit numbers)
- Avoid tasks with small answer sets (true/false, multiple-choice): a random guess is often correct, which makes the reward signal noisy for RL
Implementation Requirements:
- Create a configuration class
- Derive your dataset class from ProceduralDataset (see dataset.py)
- Include comprehensive unit tests
- Return dictionary items with keys: "question", "answer", and "metadata"
- For datasets with multiple correct answers, override the score_answer() method so that it returns a score in the range [0, 1]
Getting Started:
- Review an example implementation:
  - Configuration & dataset class: chain_sum.py
  - Unit tests: test_chain_sum.py
- Write clear question prompts that an average human can understand and answer correctly
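
The implementation requirements above can be sketched as follows. This is a self-contained illustration, not the project's actual code: it does not import reasoning_gym, and `ProceduralDatasetStub` is a hypothetical stand-in whose constructor and method signatures are assumptions (check dataset.py and chain_sum.py for the real ones). `ProductConfig` and `ProductDataset` are likewise made-up names.

```python
from dataclasses import dataclass
from random import Random
from typing import Any, Dict, Optional


@dataclass
class ProductConfig:
    """Hypothetical configuration class for an illustrative multiplication task."""

    min_value: int = 100
    max_value: int = 999
    seed: Optional[int] = None
    size: int = 10

    def validate(self) -> None:
        assert self.min_value <= self.max_value, "min_value must not exceed max_value"
        assert self.size > 0, "size must be positive"


class ProceduralDatasetStub:
    """Stand-in for ProceduralDataset (assumption: the real base class may differ)."""

    def __init__(self, config: ProductConfig) -> None:
        config.validate()
        self.config = config
        self.seed = config.seed if config.seed is not None else 0

    def __len__(self) -> int:
        return self.config.size

    def score_answer(self, answer: Optional[str], entry: Dict[str, Any]) -> float:
        # Default scoring in this sketch: exact string match.
        return 1.0 if answer is not None and answer == entry["answer"] else 0.0


class ProductDataset(ProceduralDatasetStub):
    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # Seeding per index means the same idx always yields the same item.
        rng = Random(self.seed + idx)
        a = rng.randint(self.config.min_value, self.config.max_value)
        b = rng.randint(self.config.min_value, self.config.max_value)
        return {
            "question": f"What is {a} * {b}? Answer with a single integer.",
            "answer": str(a * b),
            "metadata": {"a": a, "b": b},
        }

    def score_answer(self, answer: Optional[str], entry: Dict[str, Any]) -> float:
        # Overridden to accept equivalent spellings such as " 1234 " or "1,234".
        if answer is None:
            return 0.0
        cleaned = answer.strip().replace(",", "")
        return 1.0 if cleaned == entry["answer"] else 0.0
```

A unit test for a dataset like this would typically check that items are reproducible for a fixed seed, that every item has the three required keys, and that score_answer() stays within [0, 1].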

Pull Request Process

Fork and Clone:
- Fork the repository
- Clone your fork locally
- Read more about forks in the GitHub documentation
Create a Feature Branch:
- Work on a new branch
- Keep changes focused and minimal
Code Quality:
- Install pre-commit hooks: pre-commit install
- Run pre-commit run -a before committing
- When using AI coding assistants (Cursor, Aider, etc.), make sure the generated code still passes the pre-commit checks
Submit Your PR:
- Create a Pull Request
- Request review
- Do not include changes to GALLERY.md (it's updated automatically)
- (Optional, but encouraged) If you have an OpenRouter API key, run DeepSeek R1 against 5-10 samples from your dataset to catch unexpected issues before review:
  1. Update the configuration file eval/r1/yaml/test.yaml with your dataset:
```
# test.yaml
model: deepseek/deepseek-r1
category: test
datasets:
- {YOUR_DATASET_NAME}
eval_dir: eval/r1
dataset_size: 10
dataset_seed: 42
developer_role: system
```
  2. Run the evaluation script:
```
python eval/r1/eval.py --yaml "eval/r1/yaml/test.yaml"
```
  3. Review the results in eval/r1/test/{YOUR_DATASET_NAME}.json and check for unexpected issues with the dataset generation, the model's instruction following, or the scoring function.
  4. Include the results in your PR description.
Review Process:
- Address reviewer feedback promptly
- Keep discussions constructive
- Once approved, your changes will be merged into main
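
For the optional DeepSeek R1 check described above, the test configuration can also be filled in programmatically. The following is a minimal sketch: `render_test_config` and `write_test_config` are hypothetical helper names, not part of the repository, and the template only reuses the keys shown in the test.yaml example.

```python
from pathlib import Path

# Template mirroring the test.yaml example; only the dataset name
# and sample count are filled in.
TEMPLATE = """\
# test.yaml
model: deepseek/deepseek-r1
category: test
datasets:
- {dataset_name}
eval_dir: eval/r1
dataset_size: {dataset_size}
dataset_seed: 42
developer_role: system
"""


def render_test_config(dataset_name: str, dataset_size: int = 10) -> str:
    """Return the YAML text for a single-dataset test run."""
    if not dataset_name.strip():
        raise ValueError("dataset_name must not be empty")
    return TEMPLATE.format(dataset_name=dataset_name.strip(), dataset_size=dataset_size)


def write_test_config(dataset_name: str, path: Path = Path("eval/r1/yaml/test.yaml")) -> Path:
    """Write the rendered configuration to the given path and return that path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_test_config(dataset_name), encoding="utf-8")
    return path
```

Calling `write_test_config("chain_sum")` from the repository root would overwrite eval/r1/yaml/test.yaml, after which the evaluation script can be run as shown in step 2.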

Join our community discussion in the #reasoning-gym channel on the GPU-Mode Discord server.