LMArena Elo

composite

Crowdsourced human-preference rating computed from blind, pairwise model battles on LMArena (formerly Chatbot Arena). Voters compare two anonymous responses and pick the better one, so the rating serves as a real-world signal of overall response quality.
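To make the idea of a rating built from pairwise battles concrete, here is a minimal sketch of the classic online Elo update. It is illustrative only: the starting rating of 1000, the K-factor of 32, the 400-point scale, and the model names are all assumptions, and the statistical procedure behind the live LMArena leaderboard may differ from this textbook form.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one battle.

    outcome_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie.
    k (assumed 32 here) controls how far a single vote moves the ratings.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, purely for illustration.
ratings = {"model-x": 1000.0, "model-y": 1000.0}
battles = [
    ("model-x", "model-y", 1.0),  # model-x preferred
    ("model-x", "model-y", 0.5),  # tie
    ("model-y", "model-x", 1.0),  # model-y preferred
]

for a, b, outcome_a in battles:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], outcome_a)

print(ratings)
```

Each vote moves the two ratings by equal and opposite amounts, so an upset (a low-rated model beating a high-rated one) shifts the ratings more than an expected win.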

Official benchmark page

Model rankings on LMArena Elo

No verified scores for this benchmark yet. We only list results with a primary source.

Scores are either self-reported by the lab or taken from primary evaluations, and each one is linked to its source. Test conditions (tools, shots, prompt) vary between labs; see the source for details.
