Note: Overall leaderboard rankings may not reflect true model quality; individual benchmarks give a clearer picture.

arcee-ai/trinity-large-thinking

10 benchmarks
AIME 2025: 96.3
TauBench V2 (Telecom): 94.7
PinchBench: 91.9
TauBench V2 (Airline): 88
MMLU-Pro: 83.4
GPQA Diamond: 76.3
BFCL v4: 70.1
SWE-bench Verified: 63.2
IFBench: 52.3
Artificial Analysis Intelligence Index (Maximum Reasoning): 31.9