Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

SWE-bench Pro

8 models

Top 10 Models Performance

| Rank | Model | Score |
|------|-------|-------|
| 🥇 | anthropic/claude-opus-4.8 | 69.2 |
| 🥈 | anthropic/claude-opus-4.7 | 64.3 |
| 🥉 | moonshotai/kimi-k2.6 | 58.6 |
| 4 | zai-org/glm-5.1 | 58.4 |
| 5 | google/gemini-3.1-pro-preview | 54.2 |
| 6 | orionllm/grm-2.6-plus | 54 |
| 7 | openai/gpt-oss-120b | 16.2 |
| 8 | meta-llama/llama-4-maverick-17b-128e-instruct | 5.24 |