2025-11-08

A new international study highlights major problems with large language model (LLM) benchmarks, showing that most current evaluation methods have serious flaws.
The article "Most LLM benchmarks are flawed, casting doubt on AI progress metrics, study finds" appeared first on THE DECODER.
[...]
2025-11-26
This weekend, Andrej Karpathy, the former director of AI at Tesla and a founding member of OpenAI, decided he wanted to read a book. But he did not want to read it alone. He wanted to read it accompan [...]
2025-10-16
Agents are the trendiest topic in AI today — and with good reason. Taking gen AI out of the protected sandbox of the chat interface and allowing it to act directly on the world represents a leap for [...]
2025-12-22
Unrelenting, persistent attacks on frontier models make them fail, with the patterns of failure varying by model and developer. Red teaming shows that it’s not the sophisticated, complex attacks tha [...]
2025-12-01
Netflix is ending support for the ability to cast from mobile devices to many TVs. According to a help page spotted by Android Authority, "Netflix no longer supports casting shows from a mobile d [...]
2025-11-17
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]
2025-05-24
Spoilers for “Wish World.” Even the most daring artists, those that actively seek reinvention on a regular basis, will eventually wind up repeating themselves. If they’re lucky and s [...]
2025-12-23
OpenAI is using automated red teaming to fight prompt injections in ChatGPT Atlas. The company compares the problem to online fraud against humans, a framing that downplays a technical flaw that could [...]
2025-10-30
Researchers at Meta FAIR and the University of Edinburgh have developed a new technique that can predict the correctness of a large language model's (LLM) reasoning and even intervene to fix its [...]