Destination

2025-07-10

Most AI models can fake alignment, but safety training suppresses the behavior, study finds

abstract illustration of a claude logo, looks like a person's head, wearing detective hat and sunglasses


A new study analyzing 25 language models finds that most do not fake safety compliance - though not due to a lack of capability.


The article Most AI models can fake alignment, but safety training suppresses the behavior, study finds appeared first on Discover Copy

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-10

Baseten takes on hyperscalers with new AI training platform that lets you own your model weights

Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]

Match Score: 141.19

blogspot

2025-12-04

How I Get Free Traffic from ChatGPT in 2025 (AIO vs SEO)

Three weeks ago, I tested something that completely changed how I think about organic traffic. I opened ChatGPT and asked a simple question: "What's the best course on building SaaS with Wor [...]

Match Score: 104.06

venturebeat

2025-10-09

Nvidia researchers boost LLMs reasoning skills by getting them to 'think' during pre-training

Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason. The method, called reinforcement learning pre-training (RLP), integrates [...]

Match Score: 79.45

venturebeat

2025-11-20

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 76.92

venturebeat

2025-12-01

MIT offshoot Liquid AI releases blueprint for enterprise-grade small-model training

When Liquid AI, a startup founded by MIT computer scientists back in 2023, introduced its Liquid Foundation Models series 2 (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-d [...]

Match Score: 72.26

venturebeat

2025-12-17

AI agents fail 63% of the time on complex tasks. Patronus AI says its new 'living' training worlds can fix that.

Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture Tuesday that it [...]

Match Score: 70.23

venturebeat

2025-12-04

Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]

Match Score: 69.81

venturebeat

2025-10-27

Google Cloud takes aim at CoreWeave and AWS with managed Slurm for enterprise-scale AI training

Some enterprises are best served by fine-tuning large models to their needs, but a number of companies plan to build their own models, a project that would require access to GPUs. Google Cloud wants [...]

Match Score: 68.41

venturebeat

2025-12-01

AI models block 87% of single attacks, but just 8% when attackers persist

One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it's a gap most enterprise [...]

Match Score: 66.89