Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

MATH

21 models

Top 10 Models Performance

yandex/gpt-5.1-pro ######################################## 86
yandex/gpt-5-pro ###################################### 81
tencent/hy3-preview-base ################################### 76.28
deepseek-ai/deepseek-v2.5 ################################### 74.7
yandex/gpt-5-lite ################################# 71.5
qwen/qwen2.5-coder-32b ########################### 57.2
qwen/qwen2.5-coder-14b ######################### 52.8
google/gemma-3-27b-pt ####################### 50
yandex/gpt-5-lite-pretrain ####################### 48.8
qwen/qwen2.5-coder-7b ###################### 46.6
Rank Model Score
🥇 yandex/gpt-5.1-pro 86
🥈 yandex/gpt-5-pro 81
🥉 tencent/hy3-preview-base 76.28
4 deepseek-ai/deepseek-v2.5 74.7
5 yandex/gpt-5-lite 71.5
6 qwen/qwen2.5-coder-32b 57.2
7 qwen/qwen2.5-coder-14b 52.8
8 google/gemma-3-27b-pt 50
9 yandex/gpt-5-lite-pretrain 48.8
10 qwen/qwen2.5-coder-7b 46.6
11 qwen/qwen3-1.7b-base 43.5
12 google/gemma-3-12b-pt 43.3
13 qwen/qwen2.5-coder-3b 40
14 qwen/qwen2.5-1.5b 35
15 qwen/qwen3-0.6b-base 32.44
16 qwen/qwen2.5-coder-1.5b 30.9
17 google/gemma-3-4b-pt 24.2
18 qwen/qwen2.5-0.5b 19.48
19 tiiuae/falcon3-mamba-7b-base 15.6
20 qwen/qwen2.5-coder-0.5b 15.4
21 google/gemma-3-1b-pt 3.66