Destination

2025-07-10

Most AI models can fake alignment, but safety training suppresses the behavior, study finds


A new study analyzing 25 language models finds that most do not fake safety compliance, though not because they lack the capability.


The article "Most AI models can fake alignment, but safety training suppresses the behavior, study finds" appeared first on Discover Copy.

We have discovered tools similar to what you are looking for. Check out our suggestions for similar AI tools.

Destination

2025-02-10

Roblox, Discord, OpenAI and Google found new child safety group

Roblox, Discord, OpenAI and Google are launching a nonprofit organization called ROOST, or Robust Open Online Safety Tools, which hopes "to build scalable, interoperable safety infrastructure su [...]

Match Score: 70.64

Destination

2025-09-18

Study cautions that monitoring chains of thought soon may no longer ensure genuine AI alignment

A new joint study from OpenAI and Apollo Research examines "scheming": cases where an AI covertly pursues hidden goals not intended by its developers. The researchers tested new training me [...]

Match Score: 66.10

venturebeat

2025-10-02

'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer architecture

IBM today announced the release of Granite 4.0, the newest generation of its homemade family of open source large language models (LLMs) designed to balance high performance with lower memory and cost [...]

Match Score: 65.03

Destination

2025-07-10

How exactly did Grok go full 'MechaHitler?'

Earlier this week, Grok, X's built-in chatbot, took a hard turn toward antisemitism following a recent update. Amid unprompted, hateful rhetoric against Jews, it even began referring to itself as [...]

Match Score: 64.40

venturebeat

2025-09-30

Meta’s new CWM model learns how code works, not just what it looks like

Meta’s AI research team has released a new large language model (LLM) for coding that enhances code understanding by learning not only what code looks like, but also what it does when executed. The [...]

Match Score: 60.27

venturebeat

2025-10-02

New AI training method creates powerful software agents with just 78 examples

A new study by Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) shows that training large language models (LLMs) for complex, autonomous tasks does not require massive datasets. [...]

Match Score: 57.06

venturebeat

2025-10-01

Thinking Machines' first official product is here: meet Tinker, an API for distributed LLM fine-tuning

Thinking Machines, the AI startup founded earlier this year by former OpenAI CTO Mira Murati, has launched its first product: Tinker, a Python-based API designed to make large language model (LLM) fin [...]

Match Score: 54.26

engadget

2025-10-01

Peloton updates its Bike, Tread and Row machines with form-checking cameras, rotating screens and lots of AI

It’s been a rough time for Peloton. Last year was marred by deep staff cuts, a change of CEO and a reckoning of where the home fitness company belonged, post-pandemic boom. The answer is, unfortunat [...]

Match Score: 51.61

Destination

2025-08-27

OpenAI and Anthropic conducted safety evaluations of each other's AI systems

Most of the time, AI companies are locked in a race to the top, treating each other as rivals and competitors. Today, OpenAI and Anthropic revealed that they agreed to evaluate the alignment of each o [...]

Match Score: 50.89