MMLU
23 models
Top 10 Models by Score
| Model | Bar | Score |
|---|---|---|
| openai/gpt-5 | ######################################## | 92.5 |
| openai/gpt-5.5 | ######################################## | 92.4 |
| deepseek-ai/deepseek-r1 | #################################### | 84 |
| qwen/qwen2.5-coder-32b | ################################## | 79.1 |
| qwen/qwen2.5-coder-14b | ################################# | 75.2 |
| openai/gpt-3.5-turbo | ############################## | 70 |
| qwen/qwen2.5-coder-7b | ############################# | 68 |
| qwen/qwen3-1.7b-base | ########################### | 62.63 |
| qwen/qwen2.5-coder-3b | ########################## | 61.2 |
| qwen/qwen2.5-1.5b | ########################## | 60.9 |
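The bar lengths above look proportional to score, with the top score filling the full bar. The following is a minimal sketch of that scaling, not the page's actual rendering code: the 40-character maximum width, the rounding rule, and the names `bar_width` and `render_bar` are all assumptions made for illustration.

```python
def bar_width(score: float, top_score: float, max_width: int = 40) -> int:
    """Scale a score to a bar length so that top_score fills max_width.

    Assumption: widths are rounded to the nearest whole character.
    """
    if top_score <= 0:
        return 0  # avoid dividing by zero when there is no positive top score
    return round(score / top_score * max_width)


def render_bar(score: float, top_score: float, max_width: int = 40, fill: str = "#") -> str:
    """Return the bar as a string of fill characters."""
    return fill * bar_width(score, top_score, max_width)


if __name__ == "__main__":
    # A few scores taken from the table above.
    scores = {
        "openai/gpt-5": 92.5,
        "deepseek-ai/deepseek-r1": 84,
        "openai/gpt-3.5-turbo": 70,
    }
    top = max(scores.values())
    for model, score in scores.items():
        print(f"| {model} | {render_bar(score, top)} | {score} |")
```

Scaling against the top score rather than against 100 makes differences among the leaders easier to see, at the cost of bars that no longer read as absolute percentages.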
| Rank | Model | Score |
|---|---|---|
| 🥇 | openai/gpt-5 | 92.5 |
| 🥈 | openai/gpt-5.5 | 92.4 |
| 🥉 | deepseek-ai/deepseek-r1 | 84 |
| 4 | qwen/qwen2.5-coder-32b | 79.1 |
| 5 | qwen/qwen2.5-coder-14b | 75.2 |
| 6 | openai/gpt-3.5-turbo | 70 |
| 7 | qwen/qwen2.5-coder-7b | 68 |
| 8 | qwen/qwen3-1.7b-base | 62.63 |
| 9 | qwen/qwen2.5-coder-3b | 61.2 |
| 10 | qwen/qwen2.5-1.5b | 60.9 |
| 11 | qwen/qwen2.5-coder-1.5b | 53.6 |
| 12 | qwen/qwen3-0.6b-base | 52.81 |
| 13 | qwen/qwen2.5-0.5b | 47.5 |
| 14 | qwen/qwen2.5-coder-0.5b | 42 |
| 15 | huggingfacetb/smollm2-360m-instruct | 35.8 |
| 16 | huggingfacetb/smollm-360m-instruct | 34.4 |
| 17 | openai-community/gpt2 | 32.4 |
| 18 | huggingfacetb/smollm2-135m | 31.5 |
| 19 | huggingfacetb/smollm2-135m-instruct | 29.3 |
| 20 | google/gemma-3-1b-pt | 26.26 |
| 21 | amd/amd-llama-135m | 23.02 |
| 22 | raincandy-u/rain-100m | 9.06 |
| 23 | sapbot/toyllama-50m | 0.01 |