2025-11-30

For years, the ARC benchmark was considered a nearly insurmountable obstacle for AI systems, a true test of fluid intelligence rather than simple memorization. But new results show that even this barrier is crumbling under the relentless optimization machinery of modern AI labs.
The article The ARC benchmark's fall marks another casualty of relentless AI optimization appeared first on
2025-08-07
In the ARC-AGI-2 benchmark, which is designed to measure a language model's general reasoning skills, GPT-5 (High) scored 9.9 percent at a cost of $0.73 per task, according to ARC Prize.<br /& [...]
2025-10-08
The trend of AI researchers developing new, small open source generative models that outperform far larger, proprietary peers continued this week with yet another staggering advancement.Alexia Jolicoe [...]
2025-05-27
The Browser Company has stopped active development of the popular Arc web browser, according to a blog post from CEO Josh Miller. There will still be updates to fix security issues and the like, but t [...]
2025-11-07
The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framewo [...]
2025-10-09
A new mini-model called TRM shows that recursive reasoning with tiny networks can outperform large language models on tasks like Sudoku and the ARC-AGI test - using only a fraction of the compute powe [...]
2025-07-20
ARC-AGI-3 aims to test how well AI systems can handle brand new problems. While people breeze through the challenges, the latest AI models still come up short.<br /> The article New ARC-AGI-3 be [...]
2025-11-20
Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]