Congratulations on the impressive work with R1-Onevision! Enhancing visual reasoning in MLLMs is certainly a key step toward improving their capabilities, and your progress in this area is commendable.
I would like to suggest expanding the evaluation of R1-Onevision to the HumanEval-V benchmark. This benchmark provides a more challenging set of tasks by introducing complex diagrams paired with coding challenges. Unlike traditional visual reasoning tasks that focus on answering multiple-choice questions or providing short answers, HumanEval-V requires models to generate code based on visual input, which better tests both instruction-following and open-ended generation abilities.
Key points for consideration:
- HumanEval-V expands the reasoning scenarios with complex diagrams, pushing the limits of visual understanding.
- The task format is tailored to code generation, making it a suitable benchmark for testing MLLMs’ ability to handle more structured, generative tasks.
- Evaluating R1-Onevision on this benchmark would provide valuable insight into how well it handles visual reasoning combined with code generation.
You can find more information about the benchmark here: HumanEval-V Homepage.
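To make the suggestion a bit more concrete, below is a rough sketch of what an evaluation loop over HumanEval-V could look like for R1-Onevision. The dataset ID, record fields (`image`, `function_signature`), and the `query_r1_onevision` helper are assumptions made purely for illustration; the official HumanEval-V harness should be the source of truth for prompting and pass@k scoring.

```python
# Rough sketch of a HumanEval-V-style evaluation loop for R1-Onevision.
# The dataset ID, record fields, and query_r1_onevision() are illustrative
# assumptions, not the official harness -- see the HumanEval-V homepage.
import re

from datasets import load_dataset  # Hugging Face `datasets` library


def query_r1_onevision(image, prompt: str) -> str:
    """Placeholder: send an (image, prompt) pair to R1-Onevision and return its reply."""
    raise NotImplementedError("Hook this up to the R1-Onevision inference stack.")


def extract_code(reply: str) -> str:
    """Return the first fenced code block from a free-form reply, else the whole reply."""
    fence = "`" * 3
    match = re.search(fence + r"(?:python)?\s*(.*?)" + fence, reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()


def main() -> None:
    # Assumed dataset ID and field names -- verify against the benchmark release.
    tasks = load_dataset("HumanEval-V/HumanEval-V-Benchmark", split="test")
    completions = []
    for task in tasks:
        prompt = (
            "Study the diagram and complete the function below.\n"
            f"{task['function_signature']}"
        )
        reply = query_r1_onevision(task["image"], prompt)
        completions.append(extract_code(reply))
    # pass@k scoring would execute each completion against the benchmark's
    # test cases; that step is deliberately left to the official harness.
    print(f"Collected {len(completions)} completions for scoring.")


if __name__ == "__main__":
    main()
```

Even this minimal loop would exercise the two abilities highlighted above, instruction following and open-ended code generation, while sandboxed execution and pass@k scoring remain the job of the official harness.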