Proposal to Evaluate on the HumanEval-V Benchmark for Enhanced Visual Reasoning and Code Generation #135

Open
zfj1998 opened this issue Feb 25, 2025 · 0 comments

Congratulations on the impressive work with R1-Onevision! Enhancing visual reasoning in MLLMs is a key step toward stronger overall capabilities, and your progress in this area is commendable.

I would like to suggest expanding the evaluation of R1-Onevision to the HumanEval-V benchmark. It offers a more demanding set of tasks by pairing complex diagrams with coding problems. Unlike traditional visual reasoning tasks that focus on multiple-choice questions or short answers, HumanEval-V requires models to generate code from visual input, which better tests both instruction following and open-ended generation.
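
For context, evaluating on an image-grounded coding benchmark like this typically reduces to a simple loop: show the model the diagram and the coding instruction, collect the generated code, and run the task's unit tests against it. The sketch below is illustrative only; the JSONL field names and the generate_code helper are placeholders I am assuming for the example, not the official HumanEval-V harness or the R1-Onevision API.

```python
# Illustrative sketch of a pass@1-style loop for an image + instruction -> code
# benchmark. The field names ("image", "instruction", "tests") and
# generate_code() are assumptions, not the official HumanEval-V harness.
import json

def generate_code(image_path: str, instruction: str) -> str:
    """Placeholder: query the model (e.g., R1-Onevision) with the diagram
    and the coding instruction, and return its generated Python solution."""
    raise NotImplementedError

def evaluate(tasks_file: str) -> float:
    passed = total = 0
    with open(tasks_file) as f:
        for line in f:
            task = json.loads(line)
            solution = generate_code(task["image"], task["instruction"])
            env = {}
            try:
                exec(solution, env)       # define the candidate function(s)
                exec(task["tests"], env)  # run the task's unit tests
                passed += 1
            except Exception:
                pass                      # any error or failed assert counts as a miss
            total += 1
    return passed / max(total, 1)
```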

Key points for consideration:

  • HumanEval-V expands the reasoning scenarios with complex diagrams, pushing the limits of visual understanding.
  • The task format is tailored to code generation, making it a suitable benchmark for testing MLLMs’ ability to handle more structured, generative tasks.
  • Evaluating R1-Onevision on this benchmark would show how well it combines visual reasoning with code generation.

You can find more information about the benchmark here: HumanEval-V Homepage.
