MMLU-Pro

knowledge

A harder, reasoning-heavy rebuild of MMLU spanning 14 disciplines, with 10 answer choices instead of 4 to reduce saturation and prompt sensitivity.
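
One concrete effect of the larger option set is a lower random-guess floor. The sketch below is illustrative only: it assumes every question carries exactly the stated number of options and that guesses are uniform, and the helper name chance_accuracy is hypothetical, not part of any benchmark tooling.

```python
from fractions import Fraction


def chance_accuracy(num_choices: int) -> Fraction:
    """Expected accuracy of uniform random guessing on one multiple-choice item."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return Fraction(1, num_choices)


# Assumption: 4 options per question for MMLU, 10 for MMLU-Pro, as described above.
mmlu_floor = chance_accuracy(4)
mmlu_pro_floor = chance_accuracy(10)

print(f"MMLU chance floor:     {float(mmlu_floor):.0%}")
print(f"MMLU-Pro chance floor: {float(mmlu_pro_floor):.0%}")
```

Under these assumptions the floor drops from 25% to 10%, so a given score on the 10-option format leaves less room for lucky guesses.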

Official benchmark page

Model rankings on MMLU-Pro

#  Model                   Score  As of         Source
1  Gemini 3.1 Pro Preview  92.6%  Feb 19, 2026  cite
2  DeepSeek V4 Pro         87.5%  Apr 24, 2026  cite
3  DeepSeek V4 Flash       86.4%  Apr 24, 2026  cite
4  Kimi K2 Thinking        84.6%  Nov 6, 2025   cite
5  Llama 4 Maverick        80.5%  Apr 5, 2025   cite

Scores are self-reported or taken from primary evaluations, each linked to its source. Test conditions (tools, shots, prompt) vary between labs; see each source for details.
