BBH
19 models
Top 10 Models Performance

| Model | Bar | Score |
|---|---|---|
| deepseek-ai/deepseek-v2.5 | ######################################## | 84.3 |
| qwen/qwen3-4b | ##################################### | 77.8 |
| tencent/youtu-llm-2b | ##################################### | 77.5 |
| huggingfacetb/smollm3-3b | #################################### | 76.3 |
| yandex/gpt-5-lite-pretrain | ################################### | 73.91 |
| openbmb/minicpm5-1b | ################################## | 71.89 |
| qwen/qwen3-1.7b | ################################# | 69.1 |
| qwen/qwen3-1.7b-base | ########################## | 54.47 |
| tiiuae/falcon-h1-1.5b-deep-instruct | ########################## | 54.43 |
| tiiuae/falcon-h1-1.5b-deep-base | ######################### | 52.37 |

| Rank | Model | Score |
|---|---|---|
| 🥇 | deepseek-ai/deepseek-v2.5 | 84.3 |
| 🥈 | qwen/qwen3-4b | 77.8 |
| 🥉 | tencent/youtu-llm-2b | 77.5 |
| 4 | huggingfacetb/smollm3-3b | 76.3 |
| 5 | yandex/gpt-5-lite-pretrain | 73.91 |
| 6 | openbmb/minicpm5-1b | 71.89 |
| 7 | qwen/qwen3-1.7b | 69.1 |
| 8 | qwen/qwen3-1.7b-base | 54.47 |
| 9 | tiiuae/falcon-h1-1.5b-deep-instruct | 54.43 |
| 10 | tiiuae/falcon-h1-1.5b-deep-base | 52.37 |
| 11 | tiiuae/falcon-h1-1.5b-base | 46.57 |
| 12 | tiiuae/falcon-h1-1.5b-instruct | 46.47 |
| 13 | qwen/qwen2.5-1.5b | 45.1 |
| 14 | qwen/qwen3-0.6b-base | 41.47 |
| 15 | deepseek-ai/deepseek-r1-distill-qwen-1.5b | 31 |
| 16 | huggingfacetb/smollm2-135m-instruct | 28.2 |
| 17 | google/gemma-3-1b-pt | 28.13 |
| 18 | tiiuae/falcon3-mamba-7b-base | 25.6 |
| 19 | qwen/qwen2.5-0.5b | 20.3 |
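
The bar lengths in the top-10 chart appear to be proportional to the top score, with the leader drawn at 40 characters. The page does not state this rule, so the sketch below is an inference from the bar lengths, and the names `bar`, `BAR_WIDTH`, and `top10` are hypothetical.

```python
# Assumed rule (inferred from the chart, not stated by the source):
# bar length = round(score / top_score * 40).
BAR_WIDTH = 40

# Top-10 scores copied from the table above.
top10 = {
    "deepseek-ai/deepseek-v2.5": 84.3,
    "qwen/qwen3-4b": 77.8,
    "tencent/youtu-llm-2b": 77.5,
    "huggingfacetb/smollm3-3b": 76.3,
    "yandex/gpt-5-lite-pretrain": 73.91,
    "openbmb/minicpm5-1b": 71.89,
    "qwen/qwen3-1.7b": 69.1,
    "qwen/qwen3-1.7b-base": 54.47,
    "tiiuae/falcon-h1-1.5b-deep-instruct": 54.43,
    "tiiuae/falcon-h1-1.5b-deep-base": 52.37,
}


def bar(score: float, top_score: float, width: int = BAR_WIDTH) -> str:
    """Return a '#' bar whose length is scaled relative to the top score."""
    return "#" * round(score / top_score * width)


top_score = max(top10.values())

# Build one markdown table row per model: name, bar, raw score.
rows = [f"| {name} | {bar(score, top_score)} | {score} |"
        for name, score in top10.items()]

if __name__ == "__main__":
    print("\n".join(rows))
```

Scaling against the top score rather than against 100 keeps the leader at full width, which makes gaps between nearby models easier to see but hides how far the leader is from a perfect score.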