Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

Artificial Analysis Coding Index (Maximum Reasoning)

18 models

Top 10 Models Performance

anthropic/claude-opus-4.8 ######################################## 56.7
openai/gpt-5.3-codex ##################################### 53.1
openai/gpt-5.2 ################################## 48.7
openai/gpt-5.2-codex ############################## 43
mistralai/mistral-medium-3.5-128b ######################### 35.4
x-ai/grok-4.1-fast ###################### 30.9
x-ai/grok-4-fast ################### 27.4
x-ai/grok-code-fast-1 ################# 23.7
google/gemma-4-26b-a4b-it ################ 22.4
openai/o1 ############## 20.5
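
The bar lengths in the chart above appear to be proportional to score, with the top score drawn at 40 characters and the others rounded to the nearest whole character. That scaling rule is inferred from the chart itself and is not stated by the leaderboard, so the following is only a minimal sketch under that assumption. The names `BAR_WIDTH` and `bar` are hypothetical helpers introduced for illustration.

```python
# Scores copied from the "Top 10 Models Performance" chart.
scores = {
    "anthropic/claude-opus-4.8": 56.7,
    "openai/gpt-5.3-codex": 53.1,
    "openai/gpt-5.2": 48.7,
    "openai/gpt-5.2-codex": 43,
    "mistralai/mistral-medium-3.5-128b": 35.4,
    "x-ai/grok-4.1-fast": 30.9,
    "x-ai/grok-4-fast": 27.4,
    "x-ai/grok-code-fast-1": 23.7,
    "google/gemma-4-26b-a4b-it": 22.4,
    "openai/o1": 20.5,
}

# Assumption: the highest score fills the full bar width.
BAR_WIDTH = 40


def bar(score: float, top: float, width: int = BAR_WIDTH) -> str:
    """Return a '#' bar whose length is the score scaled against the top score."""
    return "#" * round(score / top * width)


top_score = max(scores.values())
for model, score in scores.items():
    # ':g' prints 43 rather than 43.0, matching the chart labels.
    print(f"{model} {bar(score, top_score)} {score:g}")
```

Under this assumption the sketch reproduces the bar lengths shown for all ten models, which suggests the bars carry no information beyond the scores themselves.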
Rank  Model                                          Score
🥇    anthropic/claude-opus-4.8                      56.7
🥈    openai/gpt-5.3-codex                           53.1
🥉    openai/gpt-5.2                                 48.7
4     openai/gpt-5.2-codex                           43
5     mistralai/mistral-medium-3.5-128b              35.4
6     x-ai/grok-4.1-fast                             30.9
7     x-ai/grok-4-fast                               27.4
8     x-ai/grok-code-fast-1                          23.7
9     google/gemma-4-26b-a4b-it                      22.4
10    openai/o1                                      20.5
11    nousresearch/hermes-4-405b                     16
12    deepseek-ai/deepseek-r1                        15.9
13    meta-llama/llama-4-maverick-17b-128e-instruct  15.6
14    nousresearch/hermes-4-70b                      14.4
15    openai/gpt-3.5-turbo                           10.7
16    meta-llama/llama-3.3-70b-instruct              10.7
17    google/gemma-4-e2b-it                          9
18    qwen/qwen3-0.6b                                0.9