Run a prompt injection attack against Claude Opus 4.6 in a constrained coding environment, and it fails every time: a 0% success rate across 200 attempts, no safeguards needed. Move that same attack to a GUI-based system with extended thinking enabled, and the picture changes fast. A single attempt gets through 17.8% of the time without safeguards. By the 200th attempt, the breach rate hits 78.6% without safeguards and 57.1% with them.

The latest models’ 212-page system card, released February 5, breaks out attack success rates by surface, by attempt count, and by safeguard configuration.

Why surface-level differences determine enterprise risk

For years, prompt injection was a known risk that no one quantified. Security teams treated it as theoretical. AI developers treated it as a research [...]
A security researcher, working with colleagues at Johns Hopkins University, opened a GitHub pull request, typed a malicious instruction into the PR title, and watched Anthropic’s Claude Code Securit [...]
Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]
Anthropic on Monday launched the most ambitious consumer AI agent to date, giving its Claude chatbot the ability to directly control a user's Mac — clicking buttons, opening applications, typin [...]
In the race to deploy generative AI for coding, the fastest tools are not winning enterprise deals. A new VentureBeat analysis, combining a comprehensive survey of 86 engineering teams with our own ha [...]
Unrelenting, persistent attacks on frontier models make them fail, with the patterns of failure varying by model and developer. Red teaming shows that it’s not the sophisticated, complex attacks tha [...]
Perplexity, the AI-powered search company valued at $20 billion, announced on Wednesday at its inaugural Ask 2026 developer conference that its multi-model AI agent, Computer, is now available to ente [...]
Picture this scenario: An Anthropic Skill scanner runs a full analysis of a Skill pulled from ClawHub or skills.sh. Its markdown instructions are clean, and no prompt injection is detected. No shell c [...]
Anthropic created the Model Context Protocol as the open standard for AI agent-to-tool communication. OpenAI adopted it in March 2025. Google DeepMind followed. Anthropic donated MCP to the Linux Foun [...]
For the first time on a major AI platform release, security shipped at launch — not bolted on 18 months later. At Nvidia GTC this week, five security vendors announced protection for Nvidia's a [...]