AI model changelog
New models and fresh benchmark results as they land on the leaderboard, newest first.
June 28, 2026
GPT-5.5 — scored 78.8 on RunFree Score
Claude Opus 4.8 — scored 94.7 on RunFree Score
Claude Opus 4.7 — scored 85.7 on RunFree Score
Claude Sonnet 4.6 — scored 66.2 on RunFree Score
Gemini 3.1 Pro Preview — scored 78 on RunFree Score
Gemini 3 Flash Preview — scored 63.5 on RunFree Score
Gemini 2.5 Pro — scored 22.3 on RunFree Score
GLM 5.2 — scored 86.3 on RunFree Score
DeepSeek V4 Pro — scored 72.4 on RunFree Score
DeepSeek V4 Flash — scored 59.1 on RunFree Score
Kimi K2 Thinking — scored 45.2 on RunFree Score
Llama 4 Maverick — scored 0 on RunFree Score
Qwen3 Max Thinking — scored 94.8 on RunFree Score
June 24, 2026
Fugu Ultra — Sakana added to the leaderboard
June 17, 2026
North Mini Code (free) — Cohere added to the leaderboard
June 16, 2026
GLM 5.2 — Z.ai added to the leaderboard
June 12, 2026
Kimi K2.7 Code — MoonshotAI added to the leaderboard
June 9, 2026
Claude Fable Latest — Anthropic added to the leaderboard
Claude Fable 5 — Anthropic added to the leaderboard
June 8, 2026
Nex-N2-Pro — Nex AGI added to the leaderboard
June 4, 2026
Nemotron 3.5 Content Safety (free) — NVIDIA added to the leaderboard
Nemotron 3 Ultra (free) — NVIDIA added to the leaderboard
Nemotron 3 Ultra — NVIDIA added to the leaderboard
June 3, 2026
Qwen3.7 Plus — Qwen added to the leaderboard
June 1, 2026
MiniMax M2.7 — scored 57% on Terminal-Bench
May 31, 2026
MiniMax M3 — MiniMax added to the leaderboard
May 28, 2026
Step 3.7 Flash — StepFun added to the leaderboard
GPT-5.5 — scored 78.2% on Terminal-Bench
GPT-5.5 — scored 41.4% on Humanity's Last Exam
GPT-5.5 Pro — scored 43.1% on Humanity's Last Exam
Claude Opus 4.8 — scored 88.6% on SWE-Bench Verified
Claude Opus 4.8 — scored 74.6% on Terminal-Bench
Claude Opus 4.8 — scored 93.6% on GPQA Diamond
Claude Opus 4.8 — scored 49.8% on Humanity's Last Exam
Claude Opus 4.7 — scored 87.6% on SWE-Bench Verified
Claude Opus 4.7 — scored 66.1% on Terminal-Bench
Claude Opus 4.7 — scored 94.2% on GPQA Diamond
Claude Opus 4.7 — scored 46.9% on Humanity's Last Exam
Gemini 3.1 Pro Preview — scored 44.4% on Humanity's Last Exam
May 27, 2026
Claude Opus 4.8 (Fast) — Anthropic added to the leaderboard
Claude Opus 4.8 — Anthropic added to the leaderboard
May 21, 2026
Qwen3.7 Max — Qwen added to the leaderboard
May 20, 2026
Grok Build 0.1 — xAI added to the leaderboard
May 19, 2026
Gemini 3.5 Flash — Google added to the leaderboard
May 12, 2026
Claude Opus 4.7 (Fast) — Anthropic added to the leaderboard
Perceptron Mk1 — Perceptron added to the leaderboard
May 8, 2026
Ring-2.6-1T — inclusionAI added to the leaderboard
May 7, 2026
Gemini 3.1 Flash Lite — Google added to the leaderboard
May 5, 2026
GPT Chat Latest — OpenAI added to the leaderboard
April 30, 2026
Grok 4.3 — xAI added to the leaderboard
Granite 4.1 8B — IBM added to the leaderboard
Mistral Medium 3.5 — Mistral added to the leaderboard
April 28, 2026
Nemotron 3 Nano Omni (free) — NVIDIA added to the leaderboard
Laguna XS.2 (free) — Poolside added to the leaderboard
Laguna XS.2 — Poolside added to the leaderboard
Laguna M.1 (free) — Poolside added to the leaderboard
Laguna M.1 — Poolside added to the leaderboard
April 27, 2026
Anthropic Claude Haiku Latest — Anthropic added to the leaderboard
OpenAI GPT Mini Latest — OpenAI added to the leaderboard
Google Gemini Pro Latest — Google added to the leaderboard
MoonshotAI Kimi Latest — MoonshotAI added to the leaderboard
Google Gemini Flash Latest — Google added to the leaderboard
Anthropic Claude Sonnet Latest — Anthropic added to the leaderboard
OpenAI GPT Latest — OpenAI added to the leaderboard
Qwen3.5 Plus 2026-04-20 — Qwen added to the leaderboard
Qwen3.6 Flash — Qwen added to the leaderboard
Qwen3.6 35B A3B — Qwen added to the leaderboard
Qwen3.6 Max Preview — Qwen added to the leaderboard
Qwen3.6 27B — Qwen added to the leaderboard
April 24, 2026
GPT-5.5 Pro — OpenAI added to the leaderboard
GPT-5.5 — OpenAI added to the leaderboard
DeepSeek V4 Pro — DeepSeek added to the leaderboard
DeepSeek V4 Flash — DeepSeek added to the leaderboard
DeepSeek V4 Pro — scored 90.1% on GPQA Diamond
DeepSeek V4 Pro — scored 37.7% on Humanity's Last Exam
DeepSeek V4 Pro — scored 87.5% on MMLU-Pro
DeepSeek V4 Pro — scored 80.6% on SWE-Bench Verified