Humanity's Last Exam

Category: knowledge

Around 2,500 expert-written, closed-ended questions spanning more than 100 academic subjects. It is among the hardest broad-knowledge exams in use: frontier models still score well below human experts, and the top entry in the ranking reaches 49.8%.


Model rankings on Humanity's Last Exam

#   Model                   Score   As of          Source
1   Claude Opus 4.8         49.8%   May 28, 2026   cite
2   Claude Opus 4.7         46.9%   May 28, 2026   cite
3   Gemini 3.1 Pro Preview  44.4%   May 28, 2026   cite
4   GPT-5.5 Pro             43.1%   May 28, 2026   cite
5   GPT-5.5                 41.4%   May 28, 2026   cite
6   GLM 5.2                 40.5%   Jun 13, 2026   cite
7   DeepSeek V4 Pro         37.7%   Apr 24, 2026   cite
8   DeepSeek V4 Flash       34.8%   Apr 24, 2026   cite
9   Gemini 3 Flash Preview  33.7%   Dec 17, 2025   cite
10  Claude Sonnet 4.6       33.2%   Feb 17, 2026   cite
11  Kimi K2 Thinking        23.9%   Nov 6, 2025    cite
12  Gemini 2.5 Pro          21.6%   Jun 27, 2025   cite

Scores are either self-reported by the developing lab or taken from primary evaluations, and each is linked to its source. Test conditions (tool use, number of shots, prompt) vary between labs, so see each source for details before comparing scores directly.
