AI benchmarks, explained

The tests behind the leaderboard. Each one probes a different skill: graduate-level science, fixing real GitHub bugs, competition math, or working as an agent in a terminal. Click any benchmark to see what it measures and which models lead.

Reasoning

Knowledge

Coding

Math

Agentic / tool use

Composite & human preference