2025-07-10
A new study analyzing 25 language models finds that most do not fake safety compliance - though not due to a lack of capability.
The article Most AI models can fake alignment, but safety training suppresses the behavior, study finds appeared first on Discover Copy
2025-02-10
Roblox, Discord, OpenAI and Google are launching a nonprofit organization called ROOST, or Robust Open Online Safety Tools, which hopes "to build scalable, interoperable safety infrastructure su [...]
2025-02-28
The keyword for the iPhone 16e seems to be "compromise." In this episode, Devindra chats with Cherlynn about her iPhone 16e review and try to figure out who this phone is actually for. Also, [...]
2025-06-07
LLMs designed for reasoning, like Claude 3.7 and Deepseek-R1, are supposed to excel at complex problem-solving by simulating thought processes. But a new study by Apple researchers suggests that these [...]
2025-07-12
The team behind Grok has issued a rare apology and explanation of what went wrong after X's chatbot began spewing antisemitic and pro-Nazi rhetoric earlier this week, at one point even calling it [...]