SWE-bench Pro
8 models
Top 10 Models Performance
| Rank | Model | Score |
|---|---|---|
| 🥇 | anthropic/claude-opus-4.8 | 69.2 |
| 🥈 | anthropic/claude-opus-4.7 | 64.3 |
| 🥉 | moonshotai/kimi-k2.6 | 58.6 |
| 4 | zai-org/glm-5.1 | 58.4 |
| 5 | google/gemini-3.1-pro-preview | 54.2 |
| 6 | orionllm/grm-2.6-plus | 54 |
| 7 | openai/gpt-oss-120b | 16.2 |
| 8 | meta-llama/llama-4-maverick-17b-128e-instruct | 5.24 |