venturebeat
Has this stealth startup finally cracked the code on enterprise AI agent reliability? Meet AUI's Apollo-1

For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) like ChatGPT, Gemini, and Claude learn to reason, explain, and code, one critical category of interaction remains largely unsolved — reliably completing tasks for people outside of chat. Even the best AI models score only in the 30th percentile on Terminal-Bench Hard, a third-party benchmark designed to evaluate the performance of AI agents on completing a variety of browser-based tasks, far below the reliability demanded by most enterprises and users. And task-specific benchmarks like TAU-Bench airline, which measures the reliability of AI agents on finding and booking flights on behalf of a user, also don't have much higher pass rates, w [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
The beginning of the end of the transformer era? Neuro-symbolic AI startup AUI announces new funding at $750M valuation

The buzzed-about but still stealthy New York City startup Augmented Intelligence Inc (AUI), which seeks to go beyond the popular "transformer" architecture used by most of today's LLMs [...]

Match Score: 299.00

venturebeat
Most enterprises can't stop stage-three AI agent threats, VentureBeat survey finds

A rogue AI agent at Meta passed every identity check and still exposed sensitive data to unauthorized employees in March. Two weeks later, Mercor, a $10 billion AI startup, confirmed a supply-chain br [...]

Match Score: 196.99

Destination
Noble Audio FoKus Apollo review: The high price of pristine audio

I don’t review a lot of $650 headphones. That’s because most audio companies sell their top-of-the-line gear around $300-$400. Noble Audio isn’t like most companies. The FoKus Rex5 earbuds, for [...]

Match Score: 171.36

venturebeat
Claude Code 2.1.0 arrives with smoother workflows and smarter agents

Anthropic has released Claude Code v2.1.0, a notable update to its "vibe coding" development environment for autonomously building software, spinning up AI agents, and completing a wide rang [...]

Match Score: 166.39

venturebeat
Perplexity takes its ‘Computer’ AI agent into the enterprise, taking aim at Microsoft and Salesforce

Perplexity, the AI-powered search company valued at $20 billion, announced on Wednesday at its inaugural Ask 2026 developer conference that its multi-model AI agent, Computer, is now available to ente [...]

Match Score: 136.58

venturebeat
Testing autonomous agents (Or: how I learned to stop worrying and embrace chaos)

Look, we've spent the last 18 months building production AI systems, and we'll tell you what keeps us up at night — and it's not whether the model can answer questions. That's ta [...]

Match Score: 122.25

venturebeat
RSAC 2026 shipped five agent identity frameworks and left three critical gaps open

“You can deceive, manipulate, and lie. That’s an inherent property of language. It’s a feature, not a flaw,” CrowdStrike CTO Elia Zaitsev told VentureBeat in an exclusive interview at RSA Conf [...]

Match Score: 113.46

venturebeat
Hackers slipped a trojan into the code library behind most of the internet. Your team is probably affected

Attackers stole a long-lived npm access token belonging to the lead maintainer of axios, the most popular HTTP client library in JavaScript, and used it to publish two poisoned versions that install a [...]

Match Score: 111.32

venturebeat
GitHub leads the enterprise, Claude leads the pack—Cursor’s speed can’t close

In the race to deploy generative AI for coding, the fastest tools are not winning enterprise deals. A new VentureBeat analysis, combining a comprehensive survey of 86 engineering teams with our own ha [...]

Match Score: 110.52