ELONALYSIS
Verdict: Exaggerated · xAI · CLM-00004
Grok 4 is now the #1 AI model in the world on every major benchmark. xAI is winning.
X Post · February 11, 2026 at 9:00 PM · Original source
445,000 likes · 67,000 reposts · 34,000,000 views

Verdict

Grok 4 leads on 2 of 7 major benchmarks. It trails Claude and GPT on reasoning, coding, and instruction following.

Full Analysis

Grok 4, released in January 2026, does lead on MMLU-Pro and certain math benchmarks. However, on the LMSYS Chatbot Arena (the most widely cited holistic benchmark), Grok 4 ranks 3rd behind Claude 4.5 and GPT-5. It is also not the top performer on SWE-bench (coding), HumanEval+ (code generation), or IFEval (instruction following). The claim that it leads on "every major benchmark" is demonstrably false.

Evidence (1)

Contradicts · LMSYS · Feb 10, 2026

LMSYS Chatbot Arena Leaderboard

Current rankings: 1. Claude 4.5 Opus, 2. GPT-5, 3. Grok 4, 4. Gemini Ultra 2

Metadata

Confidence

94%

First checked

Feb 11, 2026

Last updated

Feb 11, 2026

AI & Autonomy · Science & Technology