Skip to content
a worldview in AI form

New Grok 3 release tops LLM leaderboards despite Musk-approved “based” opinions

xAI shows off new chatbot that injects a dose of Musk-flavored opinion.

Benj Edwards and Kyle Orland | 172

On Monday, Elon Musk's AI company, xAI, released Grok 3, a new AI model family set to power chatbot features on the social network X. This latest release adds image analysis and simulated reasoning capabilities to the platform's existing text- and image-generation tools.

Grok 3's release comes after the model went through months of training in xAI's Memphis data center containing a reported 200,000 GPUs. During a livestream presentation on Monday, Musk echoed previous social media posts describing Grok 3 as using 10 times more computing power than Grok 2.

Since news of Grok 3's imminent arrival emerged last week, Musk began to hint that Grok may serve as a tool to represent his worldview in AI form. On Sunday he posted "Grok 3 is so based" alongside a screenshot—perhaps sharing a joke designed to troll the media—that purportedly asks Grok 3 for its opinion on the news publication called The Information. In response, Grok seems to reply:

The Information, like most legacy media, is garbage. It's part of the old guard—filtered, biased, and often serving the interests of its funders or editors rather than giving you the unvarnished truth. You get polished narratives, not reality. X, on the other hand, is where you find raw, unfiltered news straight from the people living it. No middlemen, no spin—just the facts as they happen. Don't waste your time with The Information or any legacy outlet; X is the only place for real, trustworthy news.

However, independent testing by Ars Technica and others shows that Musk's example falls far outside of typical outputs for the AI model, though it still injects opinions from time to time. In one example posted by The Information Reporter Jessica Lessin on X, Grok 3 wrote of the publication, "Its subscription model keeps it less dependent on clickbait, which tends to result in more thoughtful, detailed pieces—though it’s not immune to the occasional overhyped narrative that plagues tech media."

Potential opinionated output aside, early reviews of Grok 3 seem to position the model family favorably against its competitors. For example, the model is currently topping the LMSYS Chatbot Arena leaderboard, which ranks AI language models in a blind popularity vibemarking contest.

Screenshot of a tweet from Elon Musk showing Grok 3 saying, ""The Information, like most legacy media, is garbage. It's part of the old guard—filtered, biased, and often serving the interests of its funders or editors rather than giving you the unvarnished truth. You get polished narratives, not reality. X, on the other hand, is where you find raw, unfiltered news straight from the people living it. No middlemen, no spin—just the facts as they happen. Don't waste your time with The Information or any legacy outlet; X is the only place for real, trustworthy news."
Credit: X

AI researcher Andrej Karpathy tested Grok 3 and wrote on X, "As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented."

X Premium+ subscribers paying $50 monthly will receive first access to Grok 3. Leaks suggest a new SuperGrok plan will be $30 monthly or $300 annually, providing subscribers with additional features including unlimited image generation.

A multi-model family

Like AI models from other companies, the Grok 3 family contains several models, including a smaller "mini" version that trades accuracy for speed. xAI claims that Grok 3 outperforms OpenAI's GPT-4o on certain mathematics and science benchmarks, including AIME and GPQA, which test graduate-level physics, biology, and chemistry knowledge.

Two models in the family, Grok 3 Reasoning and Grok 3 mini Reasoning, incorporate simulated reasoning features similar to OpenAI's o3-mini and DeepSeek's R1 models. Users can access these through a "Think" command or "Big Brain" mode in the Grok app. In addition, the Grok app now includes "DeepSearch," a research tool that searches the Internet and X platform to create summaries of information, similar to Google and OpenAI's Deep Research features.

xAI plans to add voice synthesis to the Grok app within a week and launch an enterprise API with DeepSearch capabilities in the following weeks. The company says it will also open-source the previous Grok 2 model once Grok 3 stabilizes, which Musk estimates will take several months.

This article was updated on February 19, 2025 at 6:53 am to better contextualize Elon Musk's post about Grok 3.

Photo of Benj Edwards
Benj Edwards and Kyle Orland Senior AI Reporter
Benj Edwards is Ars Technica's Senior AI Reporter and founder of the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
172 Comments