MATH
21 models
Top 10 Models Performance

| Model | Relative score | Score |
|---|---|---|
| yandex/gpt-5.1-pro | ######################################## | 86 |
| yandex/gpt-5-pro | ###################################### | 81 |
| tencent/hy3-preview-base | ################################### | 76.28 |
| deepseek-ai/deepseek-v2.5 | ################################### | 74.7 |
| yandex/gpt-5-lite | ################################# | 71.5 |
| qwen/qwen2.5-coder-32b | ########################### | 57.2 |
| qwen/qwen2.5-coder-14b | ######################### | 52.8 |
| google/gemma-3-27b-pt | ####################### | 50 |
| yandex/gpt-5-lite-pretrain | ####################### | 48.8 |
| qwen/qwen2.5-coder-7b | ###################### | 46.6 |

| Rank | Model | Score |
|---|---|---|
| 🥇 | yandex/gpt-5.1-pro | 86 |
| 🥈 | yandex/gpt-5-pro | 81 |
| 🥉 | tencent/hy3-preview-base | 76.28 |
| 4 | deepseek-ai/deepseek-v2.5 | 74.7 |
| 5 | yandex/gpt-5-lite | 71.5 |
| 6 | qwen/qwen2.5-coder-32b | 57.2 |
| 7 | qwen/qwen2.5-coder-14b | 52.8 |
| 8 | google/gemma-3-27b-pt | 50 |
| 9 | yandex/gpt-5-lite-pretrain | 48.8 |
| 10 | qwen/qwen2.5-coder-7b | 46.6 |
| 11 | qwen/qwen3-1.7b-base | 43.5 |
| 12 | google/gemma-3-12b-pt | 43.3 |
| 13 | qwen/qwen2.5-coder-3b | 40 |
| 14 | qwen/qwen2.5-1.5b | 35 |
| 15 | qwen/qwen3-0.6b-base | 32.44 |
| 16 | qwen/qwen2.5-coder-1.5b | 30.9 |
| 17 | google/gemma-3-4b-pt | 24.2 |
| 18 | qwen/qwen2.5-0.5b | 19.48 |
| 19 | tiiuae/falcon3-mamba-7b-base | 15.6 |
| 20 | qwen/qwen2.5-coder-0.5b | 15.4 |
| 21 | google/gemma-3-1b-pt | 3.66 |