-
https://opencompass.org.cn/dataset-detail/HellaSwag
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Looking at HellaSwag on the Open LLM Leaderboard and on OpenCompass, the results for llama-65b and llama-30b are different.
Answered by kirliavc, Oct 10, 2023
Replies: 1 comment
-
The problem is probably due to few-shot inference: the Open LLM Leaderboard uses 10-shot HellaSwag, whereas OpenCompass uses 0-shot. Few-shot requests are currently only implemented for the MMLU dataset.
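To illustrate why the shot count matters, here is a minimal sketch of how a k-shot prompt differs from a 0-shot one. The function name, prompt format, and example fields are hypothetical for illustration, not OpenCompass's or the leaderboard's actual evaluation code.

```python
def build_prompt(question: str, shots: list[tuple[str, str]]) -> str:
    """Prepend k solved examples (the 'shots') before the actual question.

    With shots=[] this degenerates to a 0-shot prompt; with ten
    (question, answer) pairs it becomes a 10-shot prompt, which
    typically yields higher benchmark scores for the same model.
    """
    parts = [f"Q: {q}\nA: {a}" for q, a in shots]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# 0-shot: the model sees only the question itself.
zero_shot = build_prompt("She opened the jar and ...", shots=[])

# 10-shot: ten solved examples precede the question (contents are dummies here).
examples = [(f"example question {i}", f"example answer {i}") for i in range(10)]
ten_shot = build_prompt("She opened the jar and ...", shots=examples)
```

Because the two leaderboards feed the model such differently shaped inputs, their HellaSwag accuracies are not directly comparable.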
Answer selected by tonysy