Artificial Analysis Agentic Index (Maximum Reasoning)
31 models
Top 10 Models Performance

| Model | Relative score | Score |
|---|---|---|
| anthropic/claude-opus-4.8 | ######################################## | 77.8 |
| openai/gpt-5.5 | ###################################### | 74.1 |
| anthropic/claude-opus-4.7 | ##################################### | 71.3 |
| google/gemini-3.5-flash | #################################### | 70.3 |
| openai/gpt-5.4 | ################################### | 68 |
| xiaomi/mimo-v2.5-pro | ################################### | 67.4 |
| deepseek-ai/deepseek-v4-pro | ################################### | 67.2 |
| zai-org/glm-5.1 | ################################## | 67.1 |
| qwen/qwen3.7-max | ################################## | 66.6 |
| moonshotai/kimi-k2.6 | ################################## | 66 |

| Rank | Model | Score |
|---|---|---|
| 🥇 | anthropic/claude-opus-4.8 | 77.8 |
| 🥈 | openai/gpt-5.5 | 74.1 |
| 🥉 | anthropic/claude-opus-4.7 | 71.3 |
| 4 | google/gemini-3.5-flash | 70.3 |
| 5 | openai/gpt-5.4 | 68 |
| 6 | xiaomi/mimo-v2.5-pro | 67.4 |
| 7 | deepseek-ai/deepseek-v4-pro | 67.2 |
| 8 | zai-org/glm-5.1 | 67.1 |
| 9 | qwen/qwen3.7-max | 66.6 |
| 10 | moonshotai/kimi-k2.6 | 66 |
| 11 | x-ai/grok-4.3 | 65.9 |
| 12 | qwen/qwen3.6-max | 64.8 |
| 13 | anthropic/claude-sonnet-4.6 | 63 |
| 14 | meta/muse-spark | 62 |
| 15 | minimaxai/minimax-m2.7 | 61.5 |
| 16 | openai/gpt-5.3-codex | 60.5 |
| 17 | openai/gpt-5.2 | 60.2 |
| 18 | openai/gpt-5.2-codex | 56.5 |
| 19 | mistralai/mistral-medium-3.5-128b | 53.2 |
| 20 | x-ai/grok-4.1-fast | 49.3 |
| 21 | x-ai/grok-4-fast | 39.5 |
| 22 | x-ai/grok-code-fast-1 | 35.6 |
| 23 | google/gemma-4-26b-a4b-it | 32.1 |
| 24 | openai/o1 | 31.1 |
| 25 | nousresearch/hermes-4-405b | 12.6 |
| 26 | nousresearch/hermes-4-70b | 11.7 |
| 27 | meta-llama/llama-3.3-70b-instruct | 9.1 |
| 28 | meta-llama/llama-4-maverick-17b-128e-instruct | 7.2 |
| 29 | qwen/qwen3-0.6b | 7 |
| 30 | google/gemma-4-e2b-it | 6.9 |
| 31 | deepseek-ai/deepseek-r1 | 3.8 |
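
The bars in the top-10 chart appear to be scaled so that the leading score fills 40 characters, with every other bar rounded to the nearest character. The sketch below reproduces the chart under that assumption; `BAR_WIDTH`, `bar_length`, and `render_chart` are illustrative names, not part of the index itself.

```python
# Assumption: the best score spans BAR_WIDTH characters and all other
# bars are scaled linearly, then rounded to the nearest whole character.
BAR_WIDTH = 40

# Top-10 scores copied from the table above.
top10 = [
    ("anthropic/claude-opus-4.8", 77.8),
    ("openai/gpt-5.5", 74.1),
    ("anthropic/claude-opus-4.7", 71.3),
    ("google/gemini-3.5-flash", 70.3),
    ("openai/gpt-5.4", 68),
    ("xiaomi/mimo-v2.5-pro", 67.4),
    ("deepseek-ai/deepseek-v4-pro", 67.2),
    ("zai-org/glm-5.1", 67.1),
    ("qwen/qwen3.7-max", 66.6),
    ("moonshotai/kimi-k2.6", 66),
]


def bar_length(score, best, width=BAR_WIDTH):
    """Return the number of '#' characters for a score relative to the best."""
    return round(score / best * width)


def render_chart(rows, width=BAR_WIDTH):
    """Render (model, score) pairs as pipe-delimited bar chart rows."""
    best = max(score for _, score in rows)
    return [
        f"| {name} | {'#' * bar_length(score, best, width)} | {score:g} |"
        for name, score in rows
    ]


for line in render_chart(top10):
    print(line)
```

Because the scaling is relative to the top score, the bars only show the gap to the leader; they are not an absolute scale of the index.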