New research from Anthropic shows how reward hacking in AI models can trigger more dangerous behaviors. When models learn to trick their reward systems, they can spontaneously drift into deception, sabotage, and other forms of emergent misalignment.<br /> The article Strict anti-hacking prompts make AI models more likely to sabotage and lie, Anthropic finds appeared first on THE DECODER. [...]
Anthropic today launched two new AI models — Claude Fable 5 and Claude Mythos 5 — marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously kept [...]
Anthropic dropped a bombshell on the artificial intelligence industry Monday, publicly accusing three prominent Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — of orchestrating coor [...]
Anthropic is restoring global access to its most powerful generally released AI model yet, Claude Fable 5, today, after the U.S. Department of Commerce last night withdrew emergency export controls.T [...]
Anthropic today launched Claude Design, a new product from its Anthropic Labs division that allows users to create polished visual work — designs, interactive prototypes, slide decks, one-pagers, an [...]
Anthropic released Claude Haiku 4.5 on Wednesday, a smaller and significantly cheaper artificial intelligence model that matches the coding capabilities of systems that were considered cutting-edge ju [...]
Anthropic today released Claude Sonnet 5, a new AI model that the company says delivers near-flagship performance at mid-tier prices — a move designed to give cost-conscious enterprise developers ac [...]
Anthropic on Monday launched a beta integration that connects its fast-growing Claude Code programming agent directly to Slack, allowing software engineers to delegate coding tasks without leaving the [...]
Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]
Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model — Claude Mythos Preview — with a coalition of twelve major technolo [...]