Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

MultiPL-E

7 models

Top 10 Models Performance

Rank  Model                  Score
1     qwen/qwen3-8b-base     52.4
2     essentialai/rnj-1      50.3
3     qwen/qwen3-1.7b-base   42.71
4     qwen/qwen2.5-1.5b      33.1
5     qwen/qwen3-0.6b-base   24.58
6     qwen/qwen2.5-0.5b      18.7
7     google/gemma-3-1b-pt   5.15