Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

GPQA

19 models

Top 10 Models Performance

Rank  Model                                  Score
🥇    google/gemini-3.1-pro-preview          94.3
🥈    anthropic/claude-opus-4.7              94.2
🥉    openai/gpt-5.5                         93.6
4     moonshotai/kimi-k2.6                   90.5
5     deepseek-ai/deepseek-v4-pro            90.1
6     deepseek-ai/deepseek-v4-flash          88.1
7     inception/mercury-2                    74
8     nvidia/nvidia-nemotron-3-nano-30b-a3b  73
9     nvidia/nvidia-nemotron-nano-9b-v2      64
10    qwen/qwen3-8b                          59.6
11    nvidia/nvidia-nemotron-3-nano-4b       53.2
12    yandex/gpt-5.1-pro                     46
13    yandex/gpt-5-pro                       42
14    qwen/qwen3-1.7b-base                   28.28
15    qwen/qwen3-0.6b-base                   26.77
16    qwen/qwen2.5-0.5b                      24.75
17    google/gemma-3-1b-pt                   24.75
18    qwen/qwen2.5-1.5b                      24.24
19    qwen/qwen3.5-0.8b                      11.9