Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

openai/o3-mini

5 benchmarks

IFEval: 93.9
GPQA Diamond: 75
SciCode: 40
TauBench V2 (Telecom): 29
Humanity's Last Exam (no tools): 9
