Add multiple annotators to Omni-MATH and rename shared modules #3291
@@ -19,7 +19,11 @@ def evaluate_generation(
         eval_cache_path: str,
     ) -> List[Stat]:
         assert request_state.annotations
-        score = request_state.annotations["omni_math"]["equivalence_judgement"].strip().upper() == "TRUE"
+        all_judgements = request_state.annotations["omni_math"]["equivalence_judgement"]
+        if len(all_judgements) == 0:
+            raise ValueError("Could not compute Omni-MATH accuracy because all annotators failed.")
+        judgement_bools = [judgement.strip().upper() == "TRUE" for judgement in all_judgements]
+        score = sum(judgement_bools) / len(judgement_bools)
Review comment: It's not valid to sum an array of bools, right? You need to cast them to int first.
Reply: I think bool is a subclass of int in Python, so it actually works fine. If that introduces too much ambiguity, I can switch to explicit casting.
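The review thread hinges on whether `sum()` accepts bools. A quick check in plain Python (the list of judgements here is made-up sample data, not Omni-MATH output) shows that `bool` subclasses `int`, so the implicit sum in the diff and the explicit-cast version the reviewer suggested give the same result:

```python
# bool is a subclass of int in Python: True == 1 and False == 0,
# so sum() over a list of bools counts the True values.
judgement_bools = [True, False, True, True]  # made-up sample judgements

assert issubclass(bool, int)
implicit = sum(judgement_bools) / len(judgement_bools)

# The explicit-cast alternative raised in the review gives the same value.
explicit = sum(int(b) for b in judgement_bools) / len(judgement_bools)

print(implicit, explicit)  # 0.75 0.75
```

Either form is correct; the explicit cast only trades brevity for readability.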
 
         return [
             Stat(MetricName("omni_math_accuracy")).add(score),
         ]
Review comment: Don't skip.
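For reference, the new per-instance scoring in this PR can be pulled out of HELM's `Stat`/`MetricName` machinery into a standalone function. This is a minimal sketch, not the PR's code: the name `omni_math_score` is hypothetical, and it assumes the annotator judgements arrive as a plain list of strings, as the diff's list comprehension implies.

```python
from typing import List


def omni_math_score(all_judgements: List[str]) -> float:
    """Hypothetical standalone version of the Omni-MATH scoring logic.

    Each annotator returns an equivalence judgement string; the score is the
    fraction of annotators whose judgement is "TRUE", ignoring case and
    surrounding whitespace.
    """
    if len(all_judgements) == 0:
        # As in the diff: with no judgements at all, raise instead of
        # returning a score for the instance.
        raise ValueError("Could not compute Omni-MATH accuracy because all annotators failed.")
    judgement_bools = [judgement.strip().upper() == "TRUE" for judgement in all_judgements]
    # Summing bools counts the True values (bool is a subclass of int).
    return sum(judgement_bools) / len(judgement_bools)


if __name__ == "__main__":
    # Two of three annotators judge the answer equivalent.
    print(omni_math_score(["TRUE", " true\n", "FALSE"]))
```

With several annotators, a single instance can now receive a fractional score (here 2/3) rather than the 0/1 value produced by the single-judgement line the diff removes.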