Add Leaderboard
liushz committed Dec 18, 2024
1 parent 2ae3bb3 commit 930c9ba
Showing 2 changed files with 31 additions and 5 deletions.
4 changes: 2 additions & 2 deletions docs/LiveMathBench-A.csv
@@ -6,7 +6,7 @@ Qwen2.5-7B-Instruct,32.8,31.7,23.4,13.8,22.2,https://github.com/QwenLM/Qwen,TRUE
Qwen2.5-32B-Instruct,42.9,40.6,34.4,23.9,32.6,https://github.com/QwenLM/Qwen,TRUE,FALSE,FALSE
Qwen2.5-72B-Instruct,43.7,41.7,34.5,25.2,33.2,https://github.com/QwenLM/Qwen,TRUE,FALSE,FALSE
DeepSeek-V2.5-1210,43.3,39.7,28.1,16.6,27.0,https://github.com/deepseek-ai/DeepSeek-LLM,FALSE,FALSE,FALSE
-Mistral-Large-Instruct-2411,37.4,29.1,26.7,23.5,26.4,https://example.com/mistral,TRUE,FALSE,FALSE
+Mistral-Large-Instruct-2411-123B,37.4,29.1,26.7,23.5,26.4,https://example.com/mistral,TRUE,FALSE,FALSE
Gemini-1.5-Pro-Latest,49.2,48.0,40.2,26.8,37.8,https://example.com/gemini,FALSE,FALSE,FALSE
Claude-3.5-Sonnet,37.0,35.2,27.2,17.4,25.9,https://docs.anthropic.com/claude/docs/models-overview,FALSE,FALSE,FALSE
GPT-4o-2024-11-20,40.0,36.1,28.2,18.4,26.8,https://openai.com/research/gpt-4,FALSE,FALSE,FALSE
@@ -15,6 +15,6 @@ DeepSeek-Math-7B-RL,20.6,17.9,12.7,5.8,11.7,https://github.com/deepseek-ai/DeepS
NuminaMath-72B-CoT,34.5,22.6,12.8,3.7,11.8,https://example.com/numinamath,TRUE,TRUE,FALSE
Qwen2.5-Math-7B-Instruct,39.9,39.2,32.2,24.2,31.2,https://github.com/QwenLM/Qwen,TRUE,TRUE,FALSE
Qwen2.5-Math-72B-Instruct,50.4,45.3,37.8,26.8,36.5,https://github.com/QwenLM/Qwen,TRUE,TRUE,FALSE
-Skywork-o1,39.5,31.2,24.1,13.1,22.6,https://example.com/skywork,TRUE,FALSE,TRUE
+Skywork-o1-8B,39.5,31.2,24.1,13.1,22.6,https://example.com/skywork,TRUE,FALSE,TRUE
QwQ-32B-Preview,64.3,66.6,56.2,33.3,52.2,https://example.com/qwq,TRUE,FALSE,TRUE
OpenAI o1-mini,66.5,68.5,58.8,42.0,56.5,https://openai.com/research/o1,FALSE,FALSE,TRUE
32 changes: 29 additions & 3 deletions docs/index.html
@@ -42,7 +42,23 @@
width: 200px;
font-weight: 600;
}

.custom-quote {
background-color: #fff8e3; /* light yellow background */
border-left: 4px solid #ffc107; /* yellow left border */
padding: 10px 20px; /* padding */
margin: 10px 0; /* margin */
border-radius: 4px; /* rounded corners */
font-family: Arial, sans-serif; /* font */
}

.custom-quote h5 {
margin: 0; /* remove the default h5 margin */
}

.custom-quote small {
color: #333; /* dark gray text */
font-size: 16px; /* font size */
}
.paper-btn-tapestry {
position: relative;
text-align: center;
@@ -191,7 +207,7 @@
<div class="container">
<div id="content" class="container-fluid d-flex flex-column align-items-center gap-3">
<h1 class="text-nowrap mt-5">🏆 LiveMathBench Leaderboard 🏆</h1>
-<h3 class="fw-light text-nowrap"><small id="warning">GPassK: Are Your LLMs Capable of Stable Reasoning? <br></small></h3>
+<h2 class="fw-light text-nowrap"><small id="warning">GPassK: Are Your LLMs Capable of Stable Reasoning? <br></small></h2>
<div style="clear: both">
<div class="paper-btn-parent">
<a class="paper-btn" href="https://arxiv.org/abs/2412.13147">
@@ -203,13 +219,23 @@ <h3 class="fw-light text-nowrap"><small id="warning">GPassK: Are Your LLMs Capab
Code
</a>
</div>

<!-- <div class="toggle-btn-parent">
<button class="toggle-btn" id="toggleButton">
<span class="material-icons"> swap_horiz </span>
Show Theory Scores
</button>
</div> -->
<div class="alert alert-info custom-quote" role="alert">
<h5 class="fw-light text-nowrap">
<small id="warning">
📢 Calling for Evaluation! If you want to see your model on the leaderboard, feel free to <a href="https://github.com/open-compass/GPassK/pulls">contact</a> us!!!
</small>
</h5>
</div>
</div>


<div>
<div id="chart" style="width:100%;height:600px;"></div>
<div class="container-fluid d-flex flex-row flex-nowrap">
@@ -225,7 +251,7 @@ <h4>📝 Notes</h4>
<li>Models labeled with 🌍 are Closed-source models, while others are Open-sourced. </li>
<li>Models labeled with 🧮 are Mathematics-Specialization models. </li>
<li>Models labeled with 💡 are o1-like models with Long-cot. </li>
-<li>Feel free to <a href="https://github.com/open-compass/GPassK/pulls">file a request</a> to add your models on our leaderboard. </li>
+<!-- <li>Feel free to <a href="https://github.com/open-compass/GPassK/pulls">file a request</a> to add your models on our leaderboard. </li> -->
</ol>
</p>
</div>
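
The three trailing TRUE/FALSE columns in docs/LiveMathBench-A.csv appear to line up with the labels described in the Notes list (🌍 closed-source, 🧮 mathematics-specialized, 💡 o1-like). A minimal sketch of that mapping follows. It assumes the flag order is open-source, math-specialized, o1-like; the CSV header is not shown in this diff, so the field names `openSource`, `mathSpecialized`, `o1Like` and the helpers `parseRow` and `labelsFor` are hypothetical and are not taken from the page's actual script.

```javascript
// Sketch only: column meanings are inferred from the rows and the Notes
// list in this diff, not from the CSV header (which the diff does not show).

// Parse one leaderboard row: name, five scores, URL, three boolean flags.
function parseRow(line) {
  const fields = line.split(","); // naive split: assumes no quoted commas
  if (fields.length !== 10) {
    throw new Error(`expected 10 fields, got ${fields.length}`);
  }
  const toBool = (s) => s.trim().toUpperCase() === "TRUE";
  return {
    model: fields[0],
    scores: fields.slice(1, 6).map(Number), // score column names unknown
    url: fields[6],
    openSource: toBool(fields[7]),      // assumed meaning of flag 1
    mathSpecialized: toBool(fields[8]), // assumed meaning of flag 2
    o1Like: toBool(fields[9]),          // assumed meaning of flag 3
  };
}

// Build the emoji labels described in the Notes list for a parsed row.
function labelsFor(row) {
  const labels = [];
  if (!row.openSource) labels.push("🌍"); // closed-source
  if (row.mathSpecialized) labels.push("🧮"); // mathematics-specialized
  if (row.o1Like) labels.push("💡"); // o1-like, long chain of thought
  return labels.join("");
}

const renamed = parseRow(
  "Skywork-o1-8B,39.5,31.2,24.1,13.1,22.6,https://example.com/skywork,TRUE,FALSE,TRUE"
);
console.log(renamed.model, labelsFor(renamed));
```

Under these assumptions, the renamed `Skywork-o1-8B` row keeps the same flags as before, so only its display name changes on the leaderboard.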