2025-12-26

Getting top marks on exams doesn't automatically make you a good researcher. A new study shows this academic truism applies to large language models too.
The article New benchmark shows LLMs still can't do real scientific research appeared first on The Decoder.
[...]2025-10-16
One year after emerging from stealth, Strella has raised $14 million in Series A funding to expand its AI-powered customer research platform, the company announced Thursday. The round, led by Bessemer [...]
2025-12-10
There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction fol [...]
2025-11-03
An international team of researchers has released an artificial intelligence system capable of autonomously conducting scientific research across multiple disciplines — generating papers from initia [...]
2025-12-16
Zoom Video Communications, the company best known for keeping remote workers connected during the pandemic, announced last week that it had achieved the highest score ever recorded on one of artificia [...]
2025-11-25
President Donald Trump’s new “Genesis Mission” unveiled Monday is billed as a generational leap in how the United States does science akin to the Manhattan Project that created the atomic bomb d [...]
2025-11-13
Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking re [...]
2025-12-01
A stealth artificial intelligence startup founded by an MIT researcher emerged this morning with an ambitious claim: its new AI model can control computers better than systems built by OpenAI and Anth [...]
2025-12-09
There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others.AI agents excel at solving abstract ma [...]
2025-11-20
Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]