BigCodeBench v0.2.1.post3

@terryyz terryyz released this 10 Nov 08:49

What's Changed

  • Fix calibration setting in the code evaluation.
  • Add --no_execute argument for code evaluation.
  • Support concurrent API inference for o1 and deepseek-chat.
  • Fix API inference for Google Gemini.
  • Add --instruction_prefix and --response_prefix arguments for code generation.
  • Change --id_range input type.
  • Add --revision argument for code generation.
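
To illustrate what concurrent API inference means in general, here is a minimal sketch that fans prompts out to a thread pool and collects the responses in input order. It is not BigCodeBench's implementation: `fake_api_call`, `generate_concurrently`, and `max_workers` are hypothetical names chosen for this example, and the stub returns a canned string instead of contacting any provider.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def fake_api_call(prompt: str) -> str:
    """Hypothetical stand-in for a remote model request (no network access)."""
    return f"response to: {prompt}"


def generate_concurrently(prompts: List[str], max_workers: int = 4) -> List[str]:
    """Send all prompts through a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which request finishes first.
        return list(pool.map(fake_api_call, prompts))


if __name__ == "__main__":
    for line in generate_concurrently(["task 0", "task 1", "task 2"]):
        print(line)
```

Threads suit this kind of workload because each request spends most of its time waiting on the network, so the calls overlap even under the GIL.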

Evaluated LLMs (144 models)

  • Qwen2.5-Coder-32B-Instruct
  • grok-beta
  • claude-3-5-haiku-20241022

Full Changelog: v0.2.0...v0.2.1.post2