How do we verify the generated proof? #74

agoyang · 2025-02-24T07:44:53Z

Thanks for the excellent work. Could you share how you verify the correctness of the generated proofs in the training set? For QA problems, I understand we can check the result by exact match, but I don't see how to verify a proof, since an LLM may give the correct final result whether or not the proof itself is correct.

By the way, I would greatly appreciate it if you could also share how many s1-prob questions were selected for the final s1K training set. I tried to find this in filter.ipynb, but I only found a type called stats_qual.

wy96f · 2025-02-24T09:19:47Z

s1/data/featurization.py, line 86 in 6fe78bd:

    def do_grading(response_dir: str = "Qwen_Qwen2_5_32B_Instruct"):

You can refer to "B.3. s1K grading prompt" in the paper.
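For readers who want a feel for what model-based grading looks like, here is a minimal sketch. It is not the repository's `do_grading` and does not use the prompt from the paper: the prompt wording, the helper names (`build_grading_prompt`, `parse_verdict`, `grade`), and the `judge` callable are all hypothetical. It assumes the grader is shown the question, a reference solution, and the attempt, and is asked to end with a Yes/No verdict.

```python
from typing import Callable, Optional

# Hypothetical prompt wording for illustration only; NOT the prompt from the paper.
GRADING_TEMPLATE = (
    "You are grading a solution to a problem.\n"
    "Question:\n{question}\n\n"
    "Reference solution:\n{reference}\n\n"
    "Attempt:\n{attempt}\n\n"
    "Compare the attempt with the reference solution, explain briefly, "
    "and end your reply with a final line that is exactly 'Yes' or 'No'."
)


def build_grading_prompt(question: str, reference: str, attempt: str) -> str:
    """Fill the (hypothetical) grading template with one example."""
    return GRADING_TEMPLATE.format(
        question=question, reference=reference, attempt=attempt
    )


def parse_verdict(reply: str) -> Optional[bool]:
    """Read the verdict from the last non-empty line of the grader's reply.

    Returns True for 'Yes', False for 'No', and None if no clear verdict is found.
    """
    lines = [ln.strip() for ln in reply.strip().splitlines() if ln.strip()]
    if not lines:
        return None
    last = lines[-1].strip(" .*\"'").lower()
    if last == "yes":
        return True
    if last == "no":
        return False
    return None


def grade(
    question: str,
    reference: str,
    attempt: str,
    judge: Callable[[str], str],
) -> Optional[bool]:
    """Grade one attempt; `judge` is any function that sends a prompt to a model."""
    return parse_verdict(judge(build_grading_prompt(question, reference, attempt)))
```

Here `judge` would wrap whatever model client you use. Returning `None` for an unparseable reply lets you drop or re-grade those samples instead of silently counting them as wrong.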
