HumanEval
21 models
Top 10 Models Performance

| Model | Relative score | Score |
|---|---|---|
| tencent/youtu-llm-2b | ######################################## | 95.9 |
| qwen/qwen3-4b | ######################################## | 95.4 |
| deepseek-ai/deepseek-v2.5 | ##################################### | 89 |
| qwen/qwen3-1.7b | ################################### | 84.8 |
| huggingfacetb/smollm3-3b | ################################# | 79.9 |
| tiiuae/falcon-h1-1.5b-deep-instruct | ############################### | 73.78 |
| yandex/gpt-5-lite | ############################## | 71.8 |
| tiiuae/falcon-h1-1.5b-instruct | ############################ | 68.29 |
| yandex/gpt-5-lite-pretrain | ############################ | 66.5 |
| qwen/qwen2.5-coder-32b | ########################### | 65.9 |
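
The bars in the chart above shrink as the score drops. One plausible way to produce such bars is to scale each score against the top score and round to a whole number of characters. The sketch below illustrates that idea only: the function name `render_bar`, the 40-character maximum width, and the rounding rule are assumptions, not details confirmed by this page.

```python
def render_bar(score: float, top_score: float, width: int = 40) -> str:
    """Return a '#' bar whose length is proportional to score / top_score.

    Hypothetical helper: the width of 40 and the use of round() are
    assumptions made for illustration.
    """
    if top_score <= 0:
        raise ValueError("top_score must be positive")
    return "#" * round(score / top_score * width)


# Scores copied from the first three rows of the leaderboard.
scores = {
    "tencent/youtu-llm-2b": 95.9,
    "qwen/qwen3-4b": 95.4,
    "deepseek-ai/deepseek-v2.5": 89,
}

top = max(scores.values())
for model, score in scores.items():
    # Emit one Markdown table row per model, matching the chart layout.
    print(f"| {model} | {render_bar(score, top)} | {score} |")
```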
| Rank | Model | Score |
|---|---|---|
| 🥇 | tencent/youtu-llm-2b | 95.9 |
| 🥈 | qwen/qwen3-4b | 95.4 |
| 🥉 | deepseek-ai/deepseek-v2.5 | 89 |
| 4 | qwen/qwen3-1.7b | 84.8 |
| 5 | huggingfacetb/smollm3-3b | 79.9 |
| 6 | tiiuae/falcon-h1-1.5b-deep-instruct | 73.78 |
| 7 | yandex/gpt-5-lite | 71.8 |
| 8 | tiiuae/falcon-h1-1.5b-instruct | 68.29 |
| 9 | yandex/gpt-5-lite-pretrain | 66.5 |
| 10 | qwen/qwen2.5-coder-32b | 65.9 |
| 11 | deepseek-ai/deepseek-r1-distill-qwen-1.5b | 64 |
| 12 | qwen/qwen2.5-coder-14b | 64 |
| 13 | qwen/qwen2.5-coder-7b | 61.6 |
| 14 | tiiuae/falcon-h1-1.5b-deep-base | 52.44 |
| 15 | qwen/qwen2.5-coder-3b | 52.4 |
| 16 | tiiuae/falcon-h1-1.5b-base | 50 |
| 17 | google/gemma-3-27b-pt | 48.8 |
| 18 | google/gemma-3-12b-pt | 45.7 |
| 19 | qwen/qwen2.5-coder-1.5b | 43.9 |
| 20 | google/gemma-3-4b-pt | 36 |
| 21 | qwen/qwen2.5-coder-0.5b | 28 |