Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

SciCode

6 models

Given natural-language problem descriptions drawn from scientific papers, SciCode measures an AI model's ability to generate executable code that solves real research problems such as data analysis, simulation, and modeling. It tests both coding proficiency and domain-specific scientific understanding.
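
To make "executable" concrete, here is a minimal sketch of execution-based scoring. It is not the official SciCode harness: the `Problem` structure, the numeric tolerance, and the metric (percentage of problems whose generated code passes every test) are assumptions for illustration, and the two toy physics problems are hypothetical stand-ins for research-level tasks.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Problem:
    name: str                                     # hypothetical problem identifier
    entry_point: str                              # function the generated code must define
    tests: list[tuple[tuple[float, ...], float]]  # (arguments, expected result) pairs


def passes(problem: Problem, generated_code: str, tol: float = 1e-6) -> bool:
    """True if the code defines problem.entry_point and every result is within tol."""
    namespace: dict[str, Any] = {}
    try:
        exec(generated_code, namespace)  # assumption: trusted input; real harnesses sandbox this
        fn: Callable[..., float] = namespace[problem.entry_point]
        return all(abs(fn(*args) - expected) <= tol for args, expected in problem.tests)
    except Exception:
        # Syntax errors, a missing function, or runtime errors all count as failures.
        return False


def score(problems: list[Problem], solutions: dict[str, str]) -> float:
    """Assumed metric: percentage (0-100) of problems whose code passes all tests."""
    solved = sum(passes(p, solutions.get(p.name, "")) for p in problems)
    return 100.0 * solved / len(problems)


# Hypothetical toy problems.
problems = [
    Problem("kinetic_energy", "kinetic_energy", [((2.0, 3.0), 9.0)]),  # E = m v^2 / 2
    Problem("decay", "remaining", [((100.0, 1.0, 1.0), 50.0)]),         # N = N0 * 0.5**(t / t_half)
]
# Hypothetical model outputs: the first is correct, the second uses the wrong decay law.
solutions = {
    "kinetic_energy": "def kinetic_energy(m, v):\n    return 0.5 * m * v ** 2",
    "decay": "def remaining(n0, t, t_half):\n    return n0 - t / t_half",
}
print(score(problems, solutions))  # 50.0
```

A production harness would typically run each candidate in an isolated subprocess with a timeout rather than calling `exec` in the evaluator's own process.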

Top 10 Models by Score

google/gemini-3.1-pro-preview ######################################## 59
moonshotai/kimi-k2.6 ################################### 52.2
x-ai/grok-4.3 ################################ 47
openai/o3-mini ########################### 40
inception/mercury-2 ########################## 38
nvidia/nvidia-nemotron-3-nano-30b-a3b ####################### 33.3
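
The bar lengths in this chart appear to be proportional to each score, with the top score drawn as 40 `#` characters and the rest rounded to the nearest whole character. The sketch below reproduces the chart under that assumption; the 40-character width and the rounding rule are inferred from the bars, not documented by the page.

```python
MAX_WIDTH = 40  # assumption: width of the longest bar, inferred from the chart

SCORES = {
    "google/gemini-3.1-pro-preview": 59,
    "moonshotai/kimi-k2.6": 52.2,
    "x-ai/grok-4.3": 47,
    "openai/o3-mini": 40,
    "inception/mercury-2": 38,
    "nvidia/nvidia-nemotron-3-nano-30b-a3b": 33.3,
}


def bar_length(score: float, top: float, width: int = MAX_WIDTH) -> int:
    """Number of '#' characters for a score, scaled so the top score fills the width."""
    return round(score / top * width)


def render(scores: dict[str, float]) -> list[str]:
    """One 'model ### score' line per model, highest score first."""
    top = max(scores.values())
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [f"{model} {'#' * bar_length(s, top)} {s}" for model, s in ordered]


for line in render(SCORES):
    print(line)
```

With these scores the bars come out at 40, 35, 32, 27, 26, and 23 characters. Scaling to the leader rather than to a fixed 100-point axis makes the chart emphasize relative gaps between models, not absolute scores.
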
Rank  Model                                    Score
🥇    google/gemini-3.1-pro-preview            59
🥈    moonshotai/kimi-k2.6                     52.2
🥉    x-ai/grok-4.3                            47
4     openai/o3-mini                           40
5     inception/mercury-2                      38
6     nvidia/nvidia-nemotron-3-nano-30b-a3b    33.3
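
A minimal sketch of building this ranking from raw scores: sort by score, highest first, and label the top three with medals. The page does not say how tied scores are handled; this sketch assumes standard competition ranking, where tied models share a rank and the next rank is skipped (1, 1, 3, ...).

```python
MEDALS = {1: "🥇", 2: "🥈", 3: "🥉"}

SCORES = {
    "google/gemini-3.1-pro-preview": 59,
    "moonshotai/kimi-k2.6": 52.2,
    "x-ai/grok-4.3": 47,
    "openai/o3-mini": 40,
    "inception/mercury-2": 38,
    "nvidia/nvidia-nemotron-3-nano-30b-a3b": 33.3,
}


def rank_table(scores: dict[str, float]) -> list[tuple[str, str, float]]:
    """Return (rank label, model, score) rows, highest score first.

    Assumption: tied scores share a rank and the following rank is skipped.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    rows: list[tuple[str, str, float]] = []
    rank, previous = 0, None
    for position, (model, score) in enumerate(ordered, start=1):
        if score != previous:  # a new, lower score starts a new rank
            rank, previous = position, score
        rows.append((MEDALS.get(rank, str(rank)), model, score))
    return rows


for label, model, score in rank_table(SCORES):
    print(label, model, score)
```

Run as written, this prints the same six rows as the table above, one per line.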