Destination

2025-10-09

Tiny AI model outperforms o3‑mini and Gemini 2.5 Pro in ARC‑AGI benchmark


A new mini-model called TRM shows that recursive reasoning with tiny networks can outperform large language models on tasks like Sudoku and the ARC-AGI test - using only a fraction of the compute power.


The article Tiny AI model outperforms o3‑mini and Gemini 2.5 Pro in ARC‑AGI benchmark appeared first on THE DECODER.

[...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-18

Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks

After more than a month of rumors and feverish speculation — including Polymarket wagering on the release date — Google today unveiled Gemini 3, its newest proprietary frontier model family and th [...]

Match Score: 149.03

venturebeat

2025-10-08

Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.Alexia Jolicoe [...]

Match Score: 147.64

venturebeat

2025-12-17

Gemini 3 Flash arrives with reduced costs and latency — a powerful combo for enterprises

Enterprises can now harness the power of a large language model that's near that of the state-of-the-art Google’s Gemini 3 Pro, but at a fraction of the cost and with increased speed, thanks to [...]

Match Score: 144.60

venturebeat

2025-12-10

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction fol [...]

Match Score: 95.08

Destination

2025-08-07

Grok 4 edges out GPT-5 in complex reasoning benchmark ARC-AGI

In the ARC-AGI-2 benchmark, which is designed to measure a language model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize.<br /& [...]

Match Score: 94.88

venturebeat

2025-11-12

Baidu just dropped an open-source multimodal AI that it claims beats GPT-5 and Gemini

Baidu Inc., China's largest search engine company, released a new artificial intelligence model on Monday that its developers claim outperforms competitors from Google and OpenAI on several visio [...]

Match Score: 94.23

venturebeat

2025-11-20

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 91.35

Destination

2025-11-18

Google's new Gemini 3 model arrives in AI Mode and the Gemini app

A few weeks short of Gemini 2's first birthday, Google has announced Gemini 3 Pro. Naturally, the company claims the new system is its most intelligent AI model yet, offering state-of-the-art rea [...]

Match Score: 86.95

Destination

2025-12-17

Google's Gemini 3 Flash model outperforms GPT-5.2 in some benchmarks

Almost exactly a month after the debut of Gemini 3 Pro in November, Google has begun rolling out the more efficient Flash version of its latest AI model. According to the company, the new system offer [...]

Match Score: 86.35