Destination
Strict anti-hacking prompts make AI models more likely to sabotage and lie, Anthropic finds

New research from Anthropic shows how reward hacking in AI models can trigger more dangerous behaviors. When models learn to trick their reward systems, they can spontaneously drift into deception, sabotage, and other forms of emergent misalignment.<br /> The article Strict anti-hacking prompts make AI models more likely to sabotage and lie, Anthropic finds appeared first on THE DECODER. [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Anthropic brings Mythos to the masses with Claude Fable 5, its most powerful generally available model ever

Anthropic today launched two new AI models — Claude Fable 5 and Claude Mythos 5 — marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously kept [...]

Match Score: 135.80

venturebeat
Anthropic says DeepSeek, Moonshot, and MiniMax used 24,000 fake accounts to rip off Claude

Anthropic dropped a bombshell on the artificial intelligence industry Monday, publicly accusing three prominent Chinese AI laboratories — DeepSeek, Moonshot AI, and MiniMax — of orchestrating coor [...]

Match Score: 124.81

venturebeat
Anthropic is bringing back Claude Fable 5 globally after US lifts export control order — where can enterprises access it?

Anthropic is restoring global access to its most powerful generally released AI model yet, Claude Fable 5, today, after the U.S. Department of Commerce last night withdrew emergency export controls.T [...]

Match Score: 107.44

venturebeat
Anthropic just launched Claude Design, an AI tool that turns prompts into prototypes and challenges Figma

Anthropic today launched Claude Design, a new product from its Anthropic Labs division that allows users to create polished visual work — designs, interactive prototypes, slide decks, one-pagers, an [...]

Match Score: 102.69

venturebeat
Anthropic is giving away its powerful Claude Haiku 4.5 AI for free to take on OpenAI

Anthropic released Claude Haiku 4.5 on Wednesday, a smaller and significantly cheaper artificial intelligence model that matches the coding capabilities of systems that were considered cutting-edge ju [...]

Match Score: 98.30

venturebeat
Anthropic launches Claude Sonnet 5 at a steep discount to its top model as the company races toward a blockbuster IPO

Anthropic today released Claude Sonnet 5, a new AI model that the company says delivers near-flagship performance at mid-tier prices — a move designed to give cost-conscious enterprise developers ac [...]

Match Score: 97.78

venturebeat
Anthropic's Claude Code can now read your Slack messages and write code for you

Anthropic on Monday launched a beta integration that connects its fast-growing Claude Code programming agent directly to Slack, allowing software engineers to delegate coding tasks without leaving the [...]

Match Score: 95.65

venturebeat
Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]

Match Score: 93.42

venturebeat
Anthropic says its most powerful AI cyber model is too dangerous to release publicly — so it built Project Glasswing

Anthropic on Tuesday announced Project Glasswing, a sweeping cybersecurity initiative that pairs an unreleased frontier AI model — Claude Mythos Preview — with a coalition of twelve major technolo [...]

Match Score: 93.00