venturebeat

2025-12-09

Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs

There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others.

AI agents excel at solving abstract math problems and passing PhD-level exams that most benchmarks are based on, but Databricks has a question for the enterprise: Can they actually handle the document-heavy work most enterprises need them to do?

The answer, according to new research from the data and AI platform company, is sobering. Even the best-performing AI agents achieve less than 45% accuracy on tasks that mirror real enterprise workloads, exposing a critical gap between academic benchmarks a [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-14

Databricks: 'PDF parsing for agentic AI is still unsolved' — new tool replaces multi-service pipelines with single function

There is a lot of enterprise data trapped in PDF documents. To be sure, gen AI tools have been able to ingest and analyze PDFs, but accuracy, time and cost have been less than ideal. New technology fr [...]

Match Score: 185.77

venturebeat

2025-10-01

Databricks set to accelerate agentic AI by up to 100x with ‘Mooncake’ technology — no ETL pipelines for analytics and AI

Many enterprises running PostgreSQL databases for their applications face the same expensive reality. When they need to analyze that operational data or feed it to AI models, they build ETL (Extract, [...]

Match Score: 165.17

Destination

2025-07-23

A year later, the Sonos Ace is finally fulfilling its potential

2024 was an awful year for Sonos. Its long-awaited entry into a crowded headphones market was eclipsed by a bungled app launch which had a knock-on effect that impacted everything the company had plan [...]

Match Score: 143.58

venturebeat

2025-11-04

Databricks research reveals that building better AI judges isn't just a technical concern, it's a people problem

The intelligence of AI models isn't what's blocking enterprise deployments. It's the inability to define and measure quality in the first place.That's where AI judges are now playi [...]

Match Score: 126.24

venturebeat

2025-10-16

ACE prevents context collapse with ‘evolving playbooks’ for self-improving AI agents

A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automat [...]

Match Score: 119.24

venturebeat

2025-11-13

Upwork study shows AI agents excel with human partners but fail independently

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking re [...]

Match Score: 113.00

venturebeat

2025-10-01

GitHub leads the enterprise, Claude leads the pack—Cursor’s speed can’t close

In the race to deploy generative AI for coding, the fastest tools are not winning enterprise deals. A new VentureBeat analysis, combining a comprehensive survey of 86 engineering teams with our own ha [...]

Match Score: 105.09

Destination

2025-10-25

How to unpair your Apple Watch from your iPhone

If you’re moving on to a new Apple Watch, selling your current one or fixing some software hiccups, you’ll probably need to disconnect it from your iPhone. Apple calls this unpairing; it’s the s [...]

Match Score: 97.27

venturebeat

2025-10-29

The missing data link in enterprise AI: Why agents need streaming context, not just better prompts

Enterprise AI agents today face a fundamental timing problem: They can't easily act on critical business events because they aren't always aware of them in real-time.The challenge is infrast [...]

Match Score: 92.79