SciCode
6 models
SciCode measures an AI model's ability to generate executable scientific code that solves real research problems, including data analysis, simulations, and modeling, from natural-language descriptions drawn from scientific papers. It tests both coding proficiency and domain-specific scientific understanding.
Top Models Performance

| Model | Relative score | Score |
|---|---|---|
| google/gemini-3.1-pro-preview | ######################################## | 59 |
| moonshotai/kimi-k2.6 | ################################### | 52.2 |
| x-ai/grok-4.3 | ################################ | 47 |
| openai/o3-mini | ########################### | 40 |
| inception/mercury-2 | ########################## | 38 |
| nvidia/nvidia-nemotron-3-nano-30b-a3b | ####################### | 33.3 |

| Rank | Model | Score |
|---|---|---|
| 🥇 | google/gemini-3.1-pro-preview | 59 |
| 🥈 | moonshotai/kimi-k2.6 | 52.2 |
| 🥉 | x-ai/grok-4.3 | 47 |
| 4 | openai/o3-mini | 40 |
| 5 | inception/mercury-2 | 38 |
| 6 | nvidia/nvidia-nemotron-3-nano-30b-a3b | 33.3 |
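
For readers who want to reproduce a text bar chart from the scores in the table, here is a minimal sketch. It assumes each bar is scaled relative to the top score and that the full bar is 40 characters wide. Both are assumptions for illustration, not a documented rendering rule of this page, and the names `SCORES`, `bar_length`, and `render_bars` are hypothetical.

```python
# Scores copied from the ranking table above.
SCORES = {
    "google/gemini-3.1-pro-preview": 59,
    "moonshotai/kimi-k2.6": 52.2,
    "x-ai/grok-4.3": 47,
    "openai/o3-mini": 40,
    "inception/mercury-2": 38,
    "nvidia/nvidia-nemotron-3-nano-30b-a3b": 33.3,
}


def bar_length(score, top_score, width=40):
    """Scale a score so that the top score fills `width` characters (assumed rule)."""
    return round(score / top_score * width)


def render_bars(scores, width=40):
    """Return one table row per model, highest score first."""
    top = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        f"| {model} | {'#' * bar_length(score, top, width)} | {score:g} |"
        for model, score in ranked
    ]


if __name__ == "__main__":
    for row in render_bars(SCORES):
        print(row)
```

Scaling to the top score rather than to a fixed maximum keeps the leading bar at full width, which makes the gaps between models easier to see when the scores are close together.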