Destination

2025-12-04

OpenAI tests "Confessions" to uncover hidden AI misbehavior


OpenAI is testing a new method to reveal hidden model issues like reward hacking or ignored safety rules. The system trains models to admit rule-breaking in a separate report, rewarding honesty even if the original answer was deceptive.


The article OpenAI tests "Confessions" to uncover hidden AI misbehavior appeared first on THE DECODER.

[...]

We have found tools similar to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-12-04

The 'truth serum' for AI: OpenAI’s new method for training models to confess their mistakes

OpenAI researchers have introduced a novel method that acts as a "truth serum" for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and poli [...]

Match Score: 173.73

venturebeat

2025-10-09

The most important OpenAI announcement you probably missed at DevDay 2025

OpenAI’s annual developer conference on Monday was a spectacle of ambitious AI product launches, from an app store for ChatGPT to a stunning video-generation API that brought creative concepts to li [...]

Match Score: 50.25

venturebeat

2025-10-03

OpenAI's DevDay 2025 preview: Will Sam Altman launch the ChatGPT browser?

OpenAI will host more than 1,500 developers at its largest annual conference on Monday, as the company behind ChatGPT seeks to maintain its edge in an increasingly competitive artificial intelligence [...]

Match Score: 46.18

venturebeat

2025-12-18

OpenAI now accepting ChatGPT app submissions from third-party devs, launches App Directory

OpenAI has begun accepting submissions from third-party developers for their apps to be accessible directly in ChatGPT, and has launched a new App Directory (don't call it a "store"!) t [...]

Match Score: 43.46

Destination

2025-12-03

OpenAI's new confession system teaches models to be honest about bad behaviors

OpenAI announced today that it is working on a framework that will train artificial intelligence models to acknowledge when they've engaged in undesirable behavior, an approach the team calls a c [...]

Match Score: 43.05

venturebeat

2025-09-30

OpenAI debuts Sora 2 AI video generator app with sound and self-insertion cameos, API coming soon

OpenAI today announced the release of Sora 2, its latest video generation model, which now includes AI generated audio matching the generated video, as well. It is paired with the launch of a new iOS a [...]

Match Score: 38.03

venturebeat

2025-12-04

Anthropic vs. OpenAI red teaming methods reveal different security priorities for enterprise AI

Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]

Match Score: 37.75

venturebeat

2025-12-11

OpenAI's GPT-5.2 is here: what enterprises need to know

The rumors were true, and the "Code Red" is over: OpenAI today announced the release of its new frontier large language model (LLM) family: GPT-5.2. It comes at a pivotal moment for the AI pi [...]

Match Score: 37.04

Destination

2025-08-27

OpenAI and Anthropic conducted safety evaluations of each other's AI systems

Most of the time, AI companies are locked in a race to the top, treating each other as rivals and competitors. Today, OpenAI and Anthropic revealed that they agreed to evaluate the alignment of each o [...]

Match Score: 35.79