MMLU-Pro

knowledge

A harder, more reasoning-heavy rebuild of MMLU that spans 14 disciplines and uses 10 answer choices instead of 4, which reduces score saturation and sensitivity to prompt wording.
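
One way to see why more answer choices leave more headroom is to compare the accuracy of pure random guessing under each format. The sketch below is illustrative only: `chance_accuracy` is a hypothetical helper, and it assumes every question offers the full number of options with exactly one correct answer.

```python
from fractions import Fraction


def chance_accuracy(num_choices: int) -> Fraction:
    """Expected accuracy of uniform random guessing on a
    single-answer multiple-choice question (illustrative helper)."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return Fraction(1, num_choices)


# Assumption: all questions carry the full option count.
mmlu_baseline = chance_accuracy(4)       # 4 options per question
mmlu_pro_baseline = chance_accuracy(10)  # 10 options per question

print(f"MMLU chance baseline:     {float(mmlu_baseline):.0%}")
print(f"MMLU-Pro chance baseline: {float(mmlu_pro_baseline):.0%}")
```

Under these assumptions, guessing alone yields 25% on a 4-choice format but only 10% on a 10-choice format, so a given score on MMLU-Pro sits further above the chance floor.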

Model rankings on MMLU-Pro

#	Model	Score	As of	Source
1	Gemini 3.1 Pro Preview	92.6%	Feb 19, 2026	cite
2	DeepSeek V4 Pro	87.5%	Apr 24, 2026	cite
3	DeepSeek V4 Flash	86.4%	Apr 24, 2026	cite
4	Kimi K2 Thinking	84.6%	Nov 6, 2025	cite
5	Llama 4 Maverick	80.5%	Apr 5, 2025	cite

Scores are self-reported or taken from primary evaluations, and each is linked to its source. Test conditions (tools, shots, prompt) vary between labs, so scores are not strictly comparable; see each source for details.
