diff --git a/README.md b/README.md
index c9d99a9..3a76f51 100644
--- a/README.md
+++ b/README.md
@@ -9,13 +9,14 @@
- +
 ## 🚀 News
+- **[2024.12.30]** 🔥 **[LiveMathBench](https://huggingface.co/datasets/opencompass/LiveMathBench)** can now be accessed through Hugging Face, and you can also evaluate LLMs on it using G-Pass@k in OpenCompass. We have fixed some potential errors in LiveMathBench and an inconsistency in the sampling parameters; please also check the new version of our **[Paper]()**.
 - **[2024.12.18]** We release the **[ArXiv Paper](http://arxiv.org/abs/2412.13147)** of G-Pass@k. 🎉🎉🎉
@@ -42,7 +43,7 @@ Intuitively, $\text{mG-Pass@}k$ provides an interpolated estimate of the area un
 *LiveMathBench-202412 version*
- +
diff --git a/assets/pass-at-k-v-s-greedy-g-pass-at-k.png b/assets/pass-at-k-v-s-greedy-g-pass-at-k.png
new file mode 100644
index 0000000..a0acdc9
Binary files /dev/null and b/assets/pass-at-k-v-s-greedy-g-pass-at-k.png differ
diff --git a/assets/performance.png b/assets/performance.png
new file mode 100644
index 0000000..06b0914
Binary files /dev/null and b/assets/performance.png differ