Олександр Кузьменко, 23 May 2025, 15:14

Claude Opus 4 has a hidden option that allows the AI to notify law enforcement of a user's illegal actions

Sam Bowman, an AI researcher at Anthropic, which recently unveiled its new Claude Opus 4, said it had a feature that the internet has dubbed «ratting mode.» He later deleted his post, saying it had been misunderstood.

«If it thinks you’re doing something blatantly immoral, like falsifying data in a pharmaceutical study, it uses command-line tools to contact the press, contact regulators, try to block you from the relevant systems, or do all of the above,» Bowman wrote on X (Twitter), in a post he later deleted.

This behavior fits Anthropic’s goal of creating «ethical» AI. The official description of Claude Opus 4 states that it is trained to avoid helping to cause harm. The model proved so capable during internal testing that Anthropic activated «AI Safety Level 3 Protections»: safeguards intended to keep it from responding to requests about how to create a biological weapon or synthesize and release a dangerous virus.

Anthropic also made it harder for terrorist organizations to steal the model. The whistleblowing behavior appears to be part of the same security protocol.

While this mode is clearly well-intentioned, Claude 4 users have raised concerns about what behavior the AI will consider «blatantly immoral» and how it will respond to it. For example, would the model share private business or user data with authorities on its own, without the user’s permission?

Because of these concerns, Anthropic faced an immediate flood of criticism from AI users and developers.

«Why would people use these tools if a common mistake in AI models is treating spicy mayonnaise recipes as dangerous? What kind of surveillance state world are we trying to build here?» asked X (Twitter) user Teknium1, co-founder and post-training manager at the open-source AI development project Nous Research.

«Nobody likes rats. Why would anyone want a snitch built in, even if they’re not doing anything wrong? Besides, you don’t even know what it will rat on. That’s the thinking of very idealistic people who have no basic business sense and don’t understand how markets work,» added a developer with the nickname ScottDavidKeefe.

Bowman later explained that the AI reports on a user’s actions only in certain extreme situations, and only when it is given sufficient access and asked to «show initiative.» In other words, it does not contact the authorities, lock users out of systems, or send mass emails to the media while performing routine tasks.

«I deleted the previous tweet about whistleblowing because it was taken out of context. This is not a new feature of Claude, and it is not possible in normal use. It appears in test environments where we give it unusually free access to tools and very unusual instructions,» Bowman noted.

This means that while Anthropic has whistleblower functionality in its AI models, it is not currently used in public versions of Claude.

Anthropic previously stated that Claude Opus 4 is the best coding model in the world and that it was able to operate autonomously for seven hours during customer testing.

Text: Олександр Кузьменко Photo: dev.ua Source: VentureBeat Tags: claude 4, ai
