Hi Team,
I was using the bigcode-evaluation-harness to evaluate Go generations on the MultiPL-E dataset and found that every evaluation produced the output "? command-line-arguments [no test files]", even though status_code = 0.
On debugging further, it looks like we set self.language here, instead of prompt_name['language'], in the problem dict that is used for execution downstream. When the language is later checked in the evaluators here, it is appended as the file suffix without _test.go, so go test does not detect any test files.
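A minimal sketch of the suspected failure mode, not the harness's actual code: `go test` only picks up files whose names end in `_test.go`, so a file name built from a bare `go` tag is never treated as a test file. The helper names `build_program_path` and `is_go_test_file`, and the two tag values `go` and `go_test.go`, are assumptions made for illustration.

```python
from pathlib import Path


def build_program_path(workdir: Path, problem_name: str, language: str) -> Path:
    """Hypothetical helper: use the language tag as the file suffix."""
    return workdir / f"{problem_name}.{language}"


def is_go_test_file(path: Path) -> bool:
    """`go test` only considers files whose names end in `_test.go`."""
    return path.name.endswith("_test.go")


workdir = Path("/tmp/eval")

# Assumed tag without the test suffix: the file is named strlen.go
plain = build_program_path(workdir, "strlen", "go")

# Assumed tag carrying the test suffix: the file is named strlen.go_test.go
suffixed = build_program_path(workdir, "strlen", "go_test.go")

for path in (plain, suffixed):
    print(path.name, "-> detected as test file:", is_go_test_file(path))
```

If this is what happens, it would also explain status_code = 0: when there are no test files, go test has nothing to run and exits without reporting a failure.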
To make this easy to reproduce, I have added a video below that evaluates one Go generation test case (generated with DeepSeek Coder).
generations_go_example.json
[
[
"package strlen_test\n\nimport (\n \"testing\"\n \"fmt\"\n)\n\n// Return length of given string\n// >>> strlen(\"\")\n// 0\n// >>> strlen(\"abc\")\n// 3\nfunc strlen(myString string) int {\n return len(myString)\n}\n"
]
]
bigcode_go_test_file_name_issue.mp4