Destination

2025-08-07

Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI


In the ARC-AGI-2 benchmark, which is designed to measure a language model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize.


The article Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI appeared first on THE DECODER.

[...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-20

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 444.97

venturebeat

2025-11-18

Musk's xAI launches Grok 4.1 with lower hallucination rate on the web and apps — no API access (for now)

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ind [...]

Match Score: 270.07

venturebeat

2025-10-08

Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.Alexia Jolicoe [...]

Match Score: 179.20

venturebeat

2025-11-06

Moonshot's Kimi K2 Thinking emerges as leading open source AI, outperforming GPT-5, Claude Sonnet 4.5 on key benchmarks

Even as concern and skepticism grows over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]

Match Score: 161.79

venturebeat

2025-11-17

Phi-4 proves that a 'data-first' SFT methodology is the new differentiator

AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]

Match Score: 159.92

Destination

2025-09-11

Grok claimed the Charlie Kirk assassination video was a 'meme edit'

Grok has once again been caught spreading blatant misinformation on X. In several bizarre exchanges, the chatbot repeatedly claimed that Charlie Kirk was "fine" and that gruesome videos of h [...]

Match Score: 153.10

venturebeat

2025-11-21

OpenAI is ending API access to fan-favorite GPT-4o model in February 2026

OpenAI has sent out emails notifying API customers that its chatgpt-4o-latest model will be retired from the developer platform in mid-February 2026,. Access to the model is scheduled to end on Februa [...]

Match Score: 148.66

Destination

2025-07-09

Grok sure seems antisemitic after its recent update

Last Friday, Elon Musk said that X's built-in chatbot had been "significantly" improved. "You should notice a difference when you ask Grok questions," Musk said on X. As it tu [...]

Match Score: 148.17

Destination

2025-07-12

Grok team apologizes for the chatbot's 'horrific behavior' and blames 'MechaHitler' on a bad update

The team behind Grok has issued a rare apology and explanation of what went wrong after X's chatbot began spewing antisemitic and pro-Nazi rhetoric earlier this week, at one point even calling it [...]

Match Score: 143.25