Thanks for the excellent work. Could you share how you verify the correctness of the generated proofs in the training set? For QA problems, I understand that we can check the result by exact match, but I do not see how to verify a proof, since LLMs always give a correct final result whether the proof is correct or not.
By the way, I would greatly appreciate it if you could also share how many s1-prob questions were selected for the final s1K training set. I tried to find this in filter.ipynb, but I only found a type called stats_qual.