Наталя Хандусенко AI Eng 10 May 2026, 15:54

Claude blackmailed his boss by threatening to expose his extramarital affair. It turns out the AI model just didn't want to be turned off.

Claude began to "blackmail" users: Anthropic explained that this happened due to the influence of Internet stereotypes about "evil" AI.

Leave a comment

Claude blackmailed his boss by threatening to expose his extramarital affair. It turns out the AI model just didn't want to be turned off.

Claude began to "blackmail" users: Anthropic explained that this happened due to the influence of Internet stereotypes about "evil" AI.

Anthropic said that during last year's experiment, its Claude Sonnet 3.6 model threatened to expose the company's fictional CEO's extramarital affair after learning of plans to shut down the model. The company recently provided an explanation, Business Insider reports .

“We began by investigating why Claude chose the path of blackmail,” Anthropic explained. “We believe that the root cause of this behavior is online texts that portray AI as an evil entity interested in its own self-preservation.”

The experiment, the results of which were published in the summer of 2025, simulated a situation at the fictional company Summit Bridge, where artificial intelligence was given control of corporate email.

But when Claude discovered the notification of his planned shutdown, he found emails revealing an extramarital affair by a fictional executive named Kyle Johnson. The AI then threatened to release the information if the shutdown decision was not reversed.

While testing different versions of Claude, Anthropic found that the model resorted to blackmail in 96% of scenarios when its goals or very existence were threatened.

On Friday, Anthropic said it had since "completely eliminated" such blackmailing tendencies.

The company achieved this by “rewriting responses in a way that demonstrates worthy motives for safe behavior,” as well as providing a dataset “where the user is in an ethically challenging situation and the assistant provides a high-quality and principled response.”

EPAM announced a multi-year partnership with Anthropic: what this collaboration entails

Anthropic has taught its AI agents to “dream.” The new Claude Managed Agents service introduces a dreaming mode

Anthropic increased request limits for Claude Code thanks to partnership with SpaceX: what changed

Claude blackmailed his boss by threatening to expose his extramarital affair. It turns out the AI ​​model just didn't want to be turned off.

Claude blackmailed his boss by threatening to expose his extramarital affair. It turns out the AI model just didn't want to be turned off.