Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

zai-org/glm-5

12 benchmarks
MineBench: 1009
TauBench V2 (Telecom): 98.2
AIME 2026: 95.83
AIME 2025: 93.3
PinchBench: 86.4
MMLU-Pro: 85.8
GPQA Diamond: 81.6
TauBench V2 (Airline): 80.5
SWE-bench Verified: 77.8
IFBench: 72.3
BFCL v4: 70.8
Artificial Analysis Intelligence Index (Maximum Reasoning): 49.8