Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

MMLU

23 models

Top 10 Models Performance

openai/gpt-5 ######################################## 92.5
openai/gpt-5.5 ######################################## 92.4
deepseek-ai/deepseek-r1 #################################### 84
qwen/qwen2.5-coder-32b ################################## 79.1
qwen/qwen2.5-coder-14b ################################# 75.2
openai/gpt-3.5-turbo ############################## 70
qwen/qwen2.5-coder-7b ############################# 68
qwen/qwen3-1.7b-base ########################### 62.63
qwen/qwen2.5-coder-3b ########################## 61.2
qwen/qwen2.5-1.5b ########################## 60.9
Rank  Model                                Score
1     openai/gpt-5                         92.5
2     openai/gpt-5.5                       92.4
3     deepseek-ai/deepseek-r1              84
4     qwen/qwen2.5-coder-32b               79.1
5     qwen/qwen2.5-coder-14b               75.2
6     openai/gpt-3.5-turbo                 70
7     qwen/qwen2.5-coder-7b                68
8     qwen/qwen3-1.7b-base                 62.63
9     qwen/qwen2.5-coder-3b                61.2
10    qwen/qwen2.5-1.5b                    60.9
11    qwen/qwen2.5-coder-1.5b              53.6
12    qwen/qwen3-0.6b-base                 52.81
13    qwen/qwen2.5-0.5b                    47.5
14    qwen/qwen2.5-coder-0.5b              42
15    huggingfacetb/smollm2-360m-instruct  35.8
16    huggingfacetb/smollm-360m-instruct   34.4
17    openai-community/gpt2                32.4
18    huggingfacetb/smollm2-135m           31.5
19    huggingfacetb/smollm2-135m-instruct  29.3
20    google/gemma-3-1b-pt                 26.26
21    amd/amd-llama-135m                   23.02
22    raincandy-u/rain-100m                9.06
23    sapbot/toyllama-50m                  0.01