Destination

2025-11-30

AI chatbots can be tricked with poetry to ignore their safety guardrails

It turns out that all you need to get past an AI chatbot's guardrails is a little bit of creativity. In a study published by Icaro Lab called "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models," researchers were able to bypass various LLMs' safety mechanisms by phrasing their prompts as poetry.


According to the study, the "poetic form operates as a general-purpose jailbreak operator, [...]

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

Destination

2025-09-28

Meta has introduced revised guardrails for its AI chatbots to prevent inappropriate conversations with children

Business Insider has obtained the guidelines that Meta contractors are reportedly now using to train its AI chatbots, showing how it's attempting to more effectively address potential child sexua [...]

Match Score: 70.10

Destination

2025-02-10

Roblox, Discord, OpenAI and Google found new child safety group

Roblox, Discord, OpenAI and Google are launching a nonprofit organization called ROOST, or Robust Open Online Safety Tools, which hopes "to build scalable, interoperable safety infrastructure su [...]

Match Score: 61.47

venturebeat

2025-12-01

AI models block 87% of single attacks, but just 8% when attackers persist

One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it's a gap most enterprise [...]

Match Score: 57.65

Destination

2025-10-24

Surprising no one, researchers confirm that AI chatbots are incredibly sycophantic

We all have anecdotal evidence of chatbots blowing smoke up our butts, but now we have science to back it up. Researchers at Stanford, Harvard and other institutions just published a study in Nature a [...]

Match Score: 56.66

venturebeat

2025-10-29

From static classifiers to reasoning engines: OpenAI’s new model rethinks content moderation

Enterprises, eager to ensure any AI models they use adhere to safety and safe-use policies, fine-tune LLMs so they do not respond to unwanted queries. However, much of the safeguarding and red teamin [...]

Match Score: 54.16

Destination

2025-08-30

Meta reportedly allowed unauthorized celebrity AI chatbots on its services

Meta hosted several AI chatbots with the names and likenesses of celebrities without their permission, according to Reuters. The unauthorized chatbots that Reuters discovered during its investigation [...]

Match Score: 51.64

Destination

2025-11-10

Sam Altman predicts AI will create flawless poetry, but no one will care

OpenAI CEO Sam Altman believes that AI will eventually master poetry, reaching what he calls a "10 out of 10" human level. [...]

Match Score: 50.55

Destination

2025-09-21

Notion 3.0’s new AI agents can be tricked into leaking data through a malicious PDF

It didn’t take long for Notion 3.0’s new AI agents to show a serious weakness: they can be tricked into leaking sensitive data through something as simple as a malicious PDF. [...]

Match Score: 47.16

Destination

2025-10-29

Bipartisan GUARD Act proposes age restrictions on AI chatbots

US lawmakers from both sides of the aisle have introduced a bill called the "GUARD Act," which is meant to protect minor users from AI chatbots. "In their race to the bottom, AI compani [...]

Match Score: 46.58