Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

google/gemma-4-e4b-it

14 benchmarks
Codeforces ELO: 940
MMMLU: 76.6
MMLU-Pro: 69.4
MATH-Vision: 59.5
GPQA Diamond: 58.6
MMMU-Pro: 52.6
LiveCodeBench v6: 52
AIME 2026: 42.5
TauBench V2 (Average): 42.2
CoVoST: 35.54
MedXPertQA MM: 28.7
MRCR v2 8 needle: 25.4
Artificial Analysis Intelligence Index (Maximum Reasoning): 18.8
WikiText-2 (-ppl): -68.255