Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

Artificial Analysis Agentic Index (Maximum Reasoning)

31 models

Top 10 Models Performance

anthropic/claude-opus-4.8 ######################################## 77.8
openai/gpt-5.5 ###################################### 74.1
anthropic/claude-opus-4.7 ##################################### 71.3
google/gemini-3.5-flash #################################### 70.3
openai/gpt-5.4 ################################### 68
xiaomi/mimo-v2.5-pro ################################### 67.4
deepseek-ai/deepseek-v4-pro ################################### 67.2
zai-org/glm-5.1 ################################## 67.1
qwen/qwen3.7-max ################################## 66.6
moonshotai/kimi-k2.6 ################################## 66
Rank Model Score
🥇 anthropic/claude-opus-4.8 77.8
🥈 openai/gpt-5.5 74.1
🥉 anthropic/claude-opus-4.7 71.3
4 google/gemini-3.5-flash 70.3
5 openai/gpt-5.4 68
6 xiaomi/mimo-v2.5-pro 67.4
7 deepseek-ai/deepseek-v4-pro 67.2
8 zai-org/glm-5.1 67.1
9 qwen/qwen3.7-max 66.6
10 moonshotai/kimi-k2.6 66
11 x-ai/grok-4.3 65.9
12 qwen/qwen3.6-max 64.8
13 anthropic/claude-sonnet-4.6 63
14 meta/muse-spark 62
15 minimaxai/minimax-m2.7 61.5
16 openai/gpt-5.3-codex 60.5
17 openai/gpt-5.2 60.2
18 openai/gpt-5.2-codex 56.5
19 mistralai/mistral-medium-3.5-128b 53.2
20 x-ai/grok-4.1-fast 49.3
21 x-ai/grok-4-fast 39.5
22 x-ai/grok-code-fast-1 35.6
23 google/gemma-4-26b-a4b-it 32.1
24 openai/o1 31.1
25 nousresearch/hermes-4-405b 12.6
26 nousresearch/hermes-4-70b 11.7
27 meta-llama/llama-3.3-70b-instruct 9.1
28 meta-llama/llama-4-maverick-17b-128e-instruct 7.2
29 qwen/qwen3-0.6b 7
30 google/gemma-4-e2b-it 6.9
31 deepseek-ai/deepseek-r1 3.8