There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others.AI agents excel at solving abstract math problems and passing PhD-level exams that most benchmarks are based on, but Databricks has a question for the enterprise: Can they actually handle the document-heavy work most enterprises need them to do?The answer, according to new research from the data and AI platform company, is sobering. Even the best-performing AI agents achieve less than 45% accuracy on tasks that mirror real enterprise workloads, exposing a critical gap between academic benchmarks and business reality."If we focus our research efforts on getting better at [existing benchmarks], then we're probably not solv [...]
There is a lot of enterprise data trapped in PDF documents. To be sure, gen AI tools have been able to ingest and analyze PDFs, but accuracy, time and cost have been less than ideal. New technology fr [...]
Many enterprises running PostgreSQL databases for their applications face the same expensive reality. When they need to analyze that operational data or feed it to AI models, they build ETL (Extract, [...]
2024 was an awful year for Sonos. Its long-awaited entry into a crowded headphones market was eclipsed by a bungled app launch which had a knock-on effect that impacted everything the company had plan [...]
Five years ago, Databricks coined the term 'data lakehouse' to describe a new type of data architecture that combines a data lake with a data warehouse. That term and data architecture are n [...]
A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automat [...]
The intelligence of AI models isn't what's blocking enterprise deployments. It's the inability to define and measure quality in the first place.That's where AI judges are now playi [...]
A core element of any data retrieval operation is the use of a component known as a retriever. Its job is to retrieve the relevant content for a given query. In the AI era, retrievers have been used a [...]
Data teams building AI agents keep running into the same failure mode. Questions that require joining structured data with unstructured content, sales figures alongside customer reviews or citation co [...]
Jensen Huang walked onto the GTC stage Monday wearing his trademark leather jacket and carrying, as it turned out, the blueprints for a new kind of monopoly.The Nvidia CEO unveiled the Agent Toolkit, [...]