Humanity's Last Exam

knowledge

Around 2,500 expert-written, closed-ended questions across 100+ academic subjects. The hardest broad-knowledge exam in use; frontier models still score well below human experts.

Model rankings on Humanity's Last Exam

#	Model	Score	As of	Source
1	Claude Opus 4.8	49.8%	May 28, 2026	cite
2	Claude Opus 4.7	46.9%	May 28, 2026	cite
3	Gemini 3.1 Pro Preview	44.4%	May 28, 2026	cite
4	GPT-5.5 Pro	43.1%	May 28, 2026	cite
5	GPT-5.5	41.4%	May 28, 2026	cite
6	GLM 5.2	40.5%	Jun 13, 2026	cite
7	DeepSeek V4 Pro	37.7%	Apr 24, 2026	cite
8	DeepSeek V4 Flash	34.8%	Apr 24, 2026	cite
9	Gemini 3 Flash Preview	33.7%	Dec 17, 2025	cite
10	Claude Sonnet 4.6	33.2%	Feb 17, 2026	cite
11	Kimi K2 Thinking	23.9%	Nov 6, 2025	cite
12	Gemini 2.5 Pro	21.6%	Jun 27, 2025	cite

Scores are self-reported or taken from primary evaluations, each linked to its source. Test conditions (tools, shots, prompt) vary between labs, so scores are not strictly comparable; see each source for details.
