Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

MBPP+

17 models

Rank  Model                                      Score
1     tencent/hy3-preview-base                   78.71
2     qwen/qwen3-4b                              77.6
3     essentialai/rnj-1-instruct                 75.7
4     openai/gpt-oss-20b                         75.6
5     zyphra/zaya1-base                          75.4
6     qwen/qwen2.5-coder-7b-instruct             71.7
7     tencent/youtu-llm-2b                       71.7
8     qwen/qwen3-8b                              71.2
9     essentialai/rnj-1                          70.6
10    qwen/qwen3-1.7b                            67.7
11    qwen/qwen3-8b-base                         66.4
12    tiiuae/falcon-h1-1.5b-deep-base            60.32
13    huggingfacetb/smollm3-3b                   56.7
14    tiiuae/falcon-h1-1.5b-deep-instruct        56.61
15    tiiuae/falcon-h1-1.5b-instruct             56.35
16    tiiuae/falcon-h1-1.5b-base                 55.03
17    deepseek-ai/deepseek-r1-distill-qwen-1.5b  44.2