MBPP+
17 models
Top 5 Models Performance

| Model | Bar | Score |
|---|---|---|
| tencent/hy3-preview-base | ######################################## | 78.71 |
| qwen/qwen3-4b | ####################################### | 77.6 |
| essentialai/rnj-1-instruct | ###################################### | 75.7 |
| openai/gpt-oss-20b | ###################################### | 75.6 |
| zyphra/zaya1-base | ###################################### | 75.4 |

| Rank | Model | Score |
|---|---|---|
| 🥇 | tencent/hy3-preview-base | 78.71 |
| 🥈 | qwen/qwen3-4b | 77.6 |
| 🥉 | essentialai/rnj-1-instruct | 75.7 |
| 4 | openai/gpt-oss-20b | 75.6 |
| 5 | zyphra/zaya1-base | 75.4 |
| 6 | qwen/qwen2.5-coder-7b-instruct | 71.7 |
| 7 | tencent/youtu-llm-2b | 71.7 |
| 8 | qwen/qwen3-8b | 71.2 |
| 9 | essentialai/rnj-1 | 70.6 |
| 10 | qwen/qwen3-1.7b | 67.7 |
| 11 | qwen/qwen3-8b-base | 66.4 |
| 12 | tiiuae/falcon-h1-1.5b-deep-base | 60.32 |
| 13 | huggingfacetb/smollm3-3b | 56.7 |
| 14 | tiiuae/falcon-h1-1.5b-deep-instruct | 56.61 |
| 15 | tiiuae/falcon-h1-1.5b-instruct | 56.35 |
| 16 | tiiuae/falcon-h1-1.5b-base | 55.03 |
| 17 | deepseek-ai/deepseek-r1-distill-qwen-1.5b | 44.2 |