Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs
There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract ma [...]
‘It Was Nuts’: The Extreme Tests that Show Why Hail Is a Multibillion-Dollar Problem
The costs of hail damage have ballooned over the past two decades, prompting researchers to resort to extreme measures to understand how these storms destroy buildings. [...]