Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.
HellaSwag

14 models

Top 10 Models Performance

anthropic/claude-3-opus ######################################## 95.4
huggingfacetb/smollm2-1.7b-instruct ############################# 68.7
huggingfacetb/smollm2-360m-instruct ####################### 54.5
huggingfacetb/smollm-360m-instruct ###################### 51.8
huggingfacetb/smollm2-135m ################## 42.1
huggingfacetb/smollm2-135m-instruct ################# 40.9
google/gemma-3-270m-it ################ 37.7
qwen/qwen3-0.6b ################ 37.62
amd/amd-llama-135m ############# 30.48
raincandy-u/rain-v2 ############# 30
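
The bar lengths in the chart above appear consistent with scaling each score against the top score on a 40-character bar and rounding to the nearest whole character. The sketch below reproduces that layout under this assumption; the helper name `bar` and the default width of 40 are illustrative and inferred from the chart, not taken from the leaderboard's own code.

```python
# Scores copied from the "Top 10 Models Performance" chart.
scores = {
    "anthropic/claude-3-opus": 95.4,
    "huggingfacetb/smollm2-1.7b-instruct": 68.7,
    "huggingfacetb/smollm2-360m-instruct": 54.5,
    "huggingfacetb/smollm-360m-instruct": 51.8,
    "huggingfacetb/smollm2-135m": 42.1,
    "huggingfacetb/smollm2-135m-instruct": 40.9,
    "google/gemma-3-270m-it": 37.7,
    "qwen/qwen3-0.6b": 37.62,
    "amd/amd-llama-135m": 30.48,
    "raincandy-u/rain-v2": 30,
}


def bar(score: float, top: float, width: int = 40) -> str:
    """Return a '#' bar sized as score/top of `width` (assumed scaling rule)."""
    return "#" * round(score / top * width)


# The best score fills the full width; every other bar is relative to it.
top = max(scores.values())
for model, score in scores.items():
    print(f"{model} {bar(score, top)} {score}")
```

Because the bars are relative to the best score rather than to 100, two bars of equal length (for example 37.7 and 37.62) do not imply equal scores; the numeric column is the value to compare.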

Rank  Model                                Score
🥇    anthropic/claude-3-opus              95.4
🥈    huggingfacetb/smollm2-1.7b-instruct  68.7
🥉    huggingfacetb/smollm2-360m-instruct  54.5
4     huggingfacetb/smollm-360m-instruct   51.8
5     huggingfacetb/smollm2-135m           42.1
6     huggingfacetb/smollm2-135m-instruct  40.9
7     google/gemma-3-270m-it               37.7
8     qwen/qwen3-0.6b                      37.62
9     amd/amd-llama-135m                   30.48
10    raincandy-u/rain-v2                  30
11    openai-community/gpt2                28.92
12    raincandy-u/rain-100m                26.84
13    sapbot/toyllama-50m                  26.24
14    smalldoge/doge-20m                   25.57