Inspired by reasoning models such as the DeepSeek-R1 series, we put together r-chain to systematically reproduce the distillation process of such models for various tasks, including mathematical reasoning. This effort involves several key steps and outcomes:
- Dataset Curation: Curate mathematical distillation datasets, MathR and MathR-32B-Distill, which incorporate reasoning processes. These datasets are generated using the DeepSeek-R1 and DeepSeek-R1-Distill-Qwen-32B models, respectively.
- Training and Evaluation: Use each of the curated datasets separately to distill a smaller dense model, such as Qwen2.5-7B-Instruct. Evaluate the resulting models on reasoning benchmarks to validate the effectiveness of the curated data.
- Reasoning Response Verification: Verify the reasoning content generated by o1/R1-like models, and filter out incorrect reasoning content with rule-based and model-based strategies.
- Problem Selection: Utilize publicly available datasets such as NuminaMath-CoT, which include problems of different kinds, such as amc_aime, math, gsm8k, and others.
- Teacher Model Inference: We generate responses from teacher models such as DeepSeek-R1 and DeepSeek-R1-Distill-Qwen-32B. The instruction prompt "Please reason step by step, and put your final answer within \boxed{}." is employed to guide and solicit output from the teacher model. After obtaining the `reasoning_content` and `content` from the teacher models, we format them using the template `f'<think>{reasoning_content}</think>\n\n<answer>{content}</answer>'`. These formatted responses are then assembled into the standard `messages` format, making them ready for direct use in training (see the formatting sketch after this list). All data generated in this step is progressively uploaded to the `raw` subsets of the MathR and MathR-32B-Distill datasets.
- Response Filtering: Even with strong teacher models such as DeepSeek-R1, responses to challenging math problems may still contain errors. To address this, we employ a rule-based filtering approach on the `raw` datasets, with different filtering strategies tailored to the various problems in NuminaMath-CoT depending on the source of the questions (see the filtering sketch after this list). The filtered data is uploaded to the `clean` subsets of the MathR and MathR-32B-Distill datasets.
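
To make the formatting step concrete, here is a minimal sketch that wraps a teacher response into the `<think>`/`<answer>` template and assembles the `messages` format. The helper name `format_distill_sample` and the placement of the instruction prompt in the user turn are our own assumptions, not the released r-chain code.

```python
# A minimal sketch of the formatting step above. The helper name and the
# placement of the instruction prompt in the user turn are assumptions,
# not the released r-chain code.

INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."

def format_distill_sample(problem: str, reasoning_content: str, content: str) -> dict:
    """Wrap a teacher response into the <think>/<answer> template and
    assemble the standard `messages` format used for SFT."""
    response = f"<think>{reasoning_content}</think>\n\n<answer>{content}</answer>"
    return {
        "messages": [
            {"role": "user", "content": f"{problem}\n{INSTRUCTION}"},
            {"role": "assistant", "content": response},
        ]
    }

# Toy usage
sample = format_distill_sample(
    problem="What is 2 + 2?",
    reasoning_content="Adding 2 and 2 gives 4.",
    content="The answer is \\boxed{4}.",
)
print(sample["messages"][1]["content"])
```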
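
The rule-based filtering can be pictured roughly as follows: extract the final `\boxed{}` expression from the teacher's answer and compare it with the reference. The regex and the plain string comparison below are simplifications; the actual r-chain rules are tailored per problem source and are not reproduced here.

```python
import re
from typing import Optional

# A simplified picture of rule-based filtering: keep a sample only if the last
# \boxed{...} value in the teacher's answer matches the reference answer.
# The real r-chain rules are tailored per problem source; this regex does not
# handle nested braces and is only an illustrative stand-in.

BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in `text`, if any."""
    matches = BOXED_RE.findall(text)
    return matches[-1].strip() if matches else None

def keep_sample(teacher_answer: str, reference_answer: str) -> bool:
    predicted = extract_boxed(teacher_answer)
    reference = extract_boxed(reference_answer) or reference_answer.strip()
    return predicted is not None and predicted == reference

# Toy usage
print(keep_sample("<answer>So the result is \\boxed{4}.</answer>", "4"))  # True
print(keep_sample("<answer>So the result is \\boxed{5}.</answer>", "4"))  # False
```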
r-chain is built upon existing tools such as ms-swift and evalscope for performing supervised fine-tuning and evaluation, respectively.
Training can be done with the command:

```bash
bash examples/train_scripts/train_MathR-Distill-7B.sh
```
The script leverages ms-swift and performs SFT on Qwen2.5-7B-Instruct with the MathR and MathR-32B-Distill datasets. By default, the training is configured to run on 8 GPUs; you may modify the script for other configurations.
Once the model is trained, you may deploy it to a vLLM backend via:

```bash
bash examples/evaluation_scripts/deploy_MathR-Distill-7B.sh
```
This facilitates model evaluation later.
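
Once the server is up, the model can be queried through vLLM's OpenAI-compatible API, and evalscope can point at the same endpoint. The base URL, port, and served model name below are assumptions; the deployment script defines the actual values.

```python
# Quick sanity check of the deployed model through vLLM's OpenAI-compatible
# API. The base URL, port, and served model name are assumptions; see the
# deployment script for the actual values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="MathR-Distill-7B",  # assumed served model name
    messages=[{
        "role": "user",
        "content": "What is 7 * 8? Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
)
print(resp.choices[0].message.content)
```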
The model may be evaluated with evalscope using the following script:

```bash
python examples/evaluation_scripts/eval_MathR_Distill_7B.py
```
By default, it evaluates on the MATH-500 and GPQA-Diamond benchmarks with Pass@1 as the evaluation metric. Each sample is generated five times, and the result is the average of these five attempts.
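
For reference, averaging Pass@1 over five generations per problem amounts to the small computation sketched below; evalscope computes and reports this internally.

```python
from statistics import mean

# Pass@1 averaged over the 5 generations per problem, as described above.
# Shown only to make the metric concrete; evalscope reports this itself.
def average_pass_at_1(per_problem_attempts: list) -> float:
    """Each element is a list of booleans: correctness of the 5 attempts for one problem."""
    return mean(mean(attempts) for attempts in per_problem_attempts)

# Toy example: 2 problems, 5 attempts each -> (0.8 + 0.4) / 2 = 0.6
print(average_pass_at_1([
    [True, True, False, True, True],
    [False, True, False, False, True],
]))
```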