Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

RiddleBench

6 models

Top 10 Models Performance

[Bar chart of RiddleBench scores by model; the values match the table below.]

Rank  Model                              Score
🥇    openai/gpt-oss-120b                69.26
🥈    deepseek-ai/deepseek-v3            58.28
🥉    qwen/qwq-32b                       50.86
4     deepseek-ai/deepseek-r1            50.56
5     meta-llama/llama-3.3-70b-instruct  27.48
6     google/gemma-3-27b-it              25.04