WikiText-2 (-ppl)
20 models
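This benchmark measures how well a language model predicts Wikipedia-style text from the WikiText-2 dataset. Scores are reported as negative perplexity (-ppl), so values closer to zero are better.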
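
As a rough illustration of what a -ppl score means, the sketch below computes negative perplexity from per-token log-probabilities. It assumes the score is `-exp(mean negative log-likelihood)` with natural logarithms. The leaderboard does not state its tokenization, context length, or stride, so this is not necessarily the exact procedure behind the numbers here. The function name `negative_perplexity` and the example values are hypothetical.

```python
import math


def negative_perplexity(token_logprobs):
    """Return -exp(mean negative log-likelihood).

    token_logprobs: natural-log probabilities the model assigned to each
    target token (assumed input format, not taken from the leaderboard).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return -math.exp(mean_nll)


# Hypothetical log-probabilities for a four-token sequence (not leaderboard data).
example_logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
score = negative_perplexity(example_logprobs)

# A model that spreads probability uniformly over 4 choices has perplexity 4.
uniform_score = negative_perplexity([math.log(0.25)] * 8)
```

Negating the perplexity keeps "higher is better" as the sorting convention, which is why the best model sits closest to zero.
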
Top 10 Models Performance

| Model | Bar | Score |
|---|---|---|
| qwen/qwen3.5-9b | ################ | -8.324 |
| samfatnassi/kilma-v1-base | ################## | -9.0167 |
| huggingfacetb/smollm3-3b | ################### | -9.7604 |
| huggingfacetb/smollm2-360m | ######################## | -12.1571 |
| huggingfacetb/smollm2-360m-instruct | ########################## | -13.6027 |
| qwen/qwen2.5-0.5b | ############################ | -14.3732 |
| qwen/qwen3-0.6b-base | ############################ | -14.639 |
| google/gemma-3-4b-it | ################################ | -16.7015 |
| qwen/qwen2.5-0.5b-instruct | ################################## | -17.493 |
| qwen/qwen3.5-0.8b | ######################################## | -20.5747 |

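
The bar lengths in the chart above appear consistent with scaling each score's magnitude against the tenth-place score and rounding to a 40-character maximum. That is an inference from the numbers, not something the page states. A minimal sketch under that assumption, with the hypothetical helper `bar` and an assumed width of 40:

```python
def bar(score, worst_score, width=40):
    """Render a '#' bar whose length is |score| / |worst_score| * width, rounded.

    The 40-character width and the use of the worst top-10 score as the
    reference are assumptions inferred from the chart, not documented behavior.
    """
    length = round(abs(score) / abs(worst_score) * width)
    return "#" * length


# Three of the top-10 scores from the leaderboard, with the tenth as reference.
worst = -20.5747
best_bar = bar(-8.324, worst)
mid_bar = bar(-13.6027, worst)
worst_bar = bar(worst, worst)
```

Because the scores are negative perplexities, a shorter bar corresponds to a better model.
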
| Rank | Model | Score |
|---|---|---|
| 🥇 | qwen/qwen3.5-9b | -8.324 |
| 🥈 | samfatnassi/kilma-v1-base | -9.0167 |
| 🥉 | huggingfacetb/smollm3-3b | -9.7604 |
| 4 | huggingfacetb/smollm2-360m | -12.1571 |
| 5 | huggingfacetb/smollm2-360m-instruct | -13.6027 |
| 6 | qwen/qwen2.5-0.5b | -14.3732 |
| 7 | qwen/qwen3-0.6b-base | -14.639 |
| 8 | google/gemma-3-4b-it | -16.7015 |
| 9 | qwen/qwen2.5-0.5b-instruct | -17.493 |
| 10 | qwen/qwen3.5-0.8b | -20.5747 |
| 11 | liquid/lfm-2.5-1.2b-instruct | -21.9587 |
| 12 | google/gemma-4-e4b | -23.1498 |
| 13 | qwen/qwen3-0.6b | -24.0627 |
| 14 | openai-community/gpt2 | -28.8355 |
| 15 | liquid/lfm-2.5-350m | -47.55099 |
| 16 | appvoid/carbono-001 | -51.4911 |
| 17 | google/gemma-3-270m-it | -65.6058 |
| 18 | google/gemma-4-e4b-it | -68.255 |
| 19 | raincandy-u/rain-100m | -107.9683 |
| 20 | sapbot/toyllama-50m | -405.2885 |