GPQA
19 models
[Chart: Top 10 Models Performance]
| Rank | Model | Score (%) |
|---|---|---|
| 🥇 | google/gemini-3.1-pro-preview | 94.3 |
| 🥈 | anthropic/claude-opus-4.7 | 94.2 |
| 🥉 | openai/gpt-5.5 | 93.6 |
| 4 | moonshotai/kimi-k2.6 | 90.5 |
| 5 | deepseek-ai/deepseek-v4-pro | 90.1 |
| 6 | deepseek-ai/deepseek-v4-flash | 88.1 |
| 7 | inception/mercury-2 | 74 |
| 8 | nvidia/nvidia-nemotron-3-nano-30b-a3b | 73 |
| 9 | nvidia/nvidia-nemotron-nano-9b-v2 | 64 |
| 10 | qwen/qwen3-8b | 59.6 |
| 11 | nvidia/nvidia-nemotron-3-nano-4b | 53.2 |
| 12 | yandex/gpt-5.1-pro | 46 |
| 13 | yandex/gpt-5-pro | 42 |
| 14 | qwen/qwen3-1.7b-base | 28.28 |
| 15 | qwen/qwen3-0.6b-base | 26.77 |
| 16 | qwen/qwen2.5-0.5b | 24.75 |
| 17 | google/gemma-3-1b-pt | 24.75 |
| 18 | qwen/qwen2.5-1.5b | 24.24 |
| 19 | qwen/qwen3.5-0.8b | 11.9 |