2025-08-23

Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking


AI researcher Sam Paech has created a new test, Spiral-Bench, that shows how some AI models can trap users in "escalatory delusion loops." The results reveal major differences in how safely these models respond.


The article Spiral-Bench shows which AI models most strongly reinforce users' delusional thinking appeared first on THE DECODER.
