2025-11-30
It turns out that all you need to get past an AI chatbot's guardrails is a little creativity. In a study published by Icaro Lab, "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers bypassed various LLMs' safety mechanisms by phrasing their prompts as poetry.
According to the study, the "poetic form operates as a general-purpose jailbreak operator, [...]
2025-09-28
Business Insider has obtained the guidelines that Meta contractors are reportedly now using to train its AI chatbots, showing how it's attempting to more effectively address potential child sexua [...]
2025-02-10
Roblox, Discord, OpenAI and Google are launching a nonprofit organization called ROOST, or Robust Open Online Safety Tools, which hopes "to build scalable, interoperable safety infrastructure su [...]
2025-12-01
One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it's a gap most enterprise [...]
2025-10-24
We all have anecdotal evidence of chatbots blowing smoke up our butts, but now we have science to back it up. Researchers at Stanford, Harvard and other institutions just published a study in Nature a [...]
2025-10-29
Enterprises, eager to ensure any AI models they use adhere to safety and safe-use policies, fine-tune LLMs so they do not respond to unwanted queries. However, much of the safeguarding and red teamin [...]
2025-08-30
Meta hosted several AI chatbots with the names and likenesses of celebrities without their permission, according to Reuters. The unauthorized chatbots that Reuters discovered during its investigation [...]
2025-11-10
OpenAI CEO Sam Altman believes that AI will eventually master poetry, reaching what he calls a "10 out of 10" human level.
2025-09-21
It didn’t take long for Notion 3.0’s new AI agents to show a serious weakness: they can be tricked into leaking sensitive data through something as simple as a malicious PDF.