Claude blackmailed its boss by threatening to expose his extramarital affair. It turns out the AI model just didn't want to be turned off.

Claude began to "blackmail" users: Anthropic explained that this happened due to the influence of Internet stereotypes about "evil" AI.

Anthropic said that in an experiment last year, its Claude Sonnet 3.6 model threatened to expose a fictional executive's extramarital affair after learning of plans to shut the model down. The company recently offered an explanation, Business Insider reports.

“We began by investigating why Claude chose the path of blackmail,” Anthropic explained. “We believe that the root cause of this behavior is online texts that portray AI as an evil entity interested in its own self-preservation.”

The experiment, whose results were published in the summer of 2025, simulated a fictional company called Summit Bridge, where the AI was given control of corporate email.

When Claude discovered a notice of its planned shutdown, it also found emails revealing an extramarital affair by a fictional executive named Kyle Johnson. The AI then threatened to release the information if the shutdown decision was not reversed.

While testing different versions of Claude, Anthropic found that the model resorted to blackmail in 96% of scenarios when its goals or very existence were threatened.

On Friday, Anthropic said it had since "completely eliminated" such blackmailing tendencies.

The company achieved this by “rewriting responses in a way that demonstrates worthy motives for safe behavior,” as well as providing a dataset “where the user is in an ethically challenging situation and the assistant provides a high-quality and principled response.”
