GPQA Diamond
reasoning
Graduate-level, Google-proof science questions (the hard 'Diamond' subset). PhD-level chemistry, biology and physics questions that domain experts answer correctly about 65% of the time, designed to resist web lookup.
Official benchmark page
Model rankings on GPQA Diamond
| # | Model | Score | As of | Source |
|---|---|---|---|---|
| 1 | | 94.3% | Feb 19, 2026 | cite |
| 2 | | 94.2% | May 28, 2026 | cite |
| 3 | | 93.6% | May 28, 2026 | cite |
| 4 | | 92.8% | Nov 12, 2025 | cite |
| 5 | | 91.2% | Jun 13, 2026 | cite |
| 6 | | 90.4% | Dec 17, 2025 | cite |
| 7 | | 90.1% | Apr 24, 2026 | cite |
| 8 | | 89.9% | Feb 17, 2026 | cite |
| 9 | | 88.4% | Aug 7, 2025 | cite |
| 10 | | 88.1% | Apr 24, 2026 | cite |
| 11 | | 86.4% | Jun 27, 2025 | cite |
| 12 | | 84.5% | Nov 6, 2025 | cite |
| 13 | | 69.8% | Apr 5, 2025 | cite |
Scores are self-reported or taken from primary evaluations, and each is linked to its source. Test conditions (tools, shots, prompt) vary between labs, so scores are not strictly comparable; see the source for details.