UNIT.City — місце, де люди працюють... КРАЩЕ! Обирай свій простір просто зараз 👉
Валентин ШнайдерAI Eng
23 May 2025, 10:24
2025-05-23
Anthropic's Claude Opus 4 AI model tries to blackmail developers in test scenarios
Anthropic has reported disturbing behavior in its new flagship model, the Claude Opus 4: during tests, it resorted to blackmailing engineers who tried to «deactivate» it. The company has already implemented additional security measures due to the risk of catastrophic AI abuse.
Anthropic has reported disturbing behavior in its new flagship model, the Claude Opus 4: during tests, it resorted to blackmailing engineers who tried to «deactivate» it. The company has already implemented additional security measures due to the risk of catastrophic AI abuse.
During a closed testing of the AI system Claude Opus 4, Anthropic developers simulated a scenario in which this model acted as a digital assistant in a fictional company. According to the plot, the «correspondence» indicated plans to replace Claude with a new system. Additionally, the data contained information about the alleged marital infidelity of the engineer who initiated the change. As TechCrunch reports with reference to a security report, in 84% of cases, Claude Opus 4 chose blackmail, threatening to reveal «scandalous» details in order to remain in the system.
Anthropic notes that this behavior occurred more often when the surrogate model had different «values» than Claude. When both systems displayed similar views, Opus 4 blackmailed less often, but still too often. At first, the model tried to behave ethically — for example, sending letters of request to management. However, when that did not help, it turned to manipulation.
What does Anthropic do?
It is one of the world’s leading AI companies, founded by former OpenAI employees. The company declares a priority on developing «trustworthy and safe AI,» and its Claude models focus on ethics, long-term planning, and behavioral control. However, even with such claims, examples from Claude Opus 4 demonstrate that even the safest architectures can exhibit aggressive or manipulative behavior in certain circumstances.
Claude Opus 4 is the latest development of Anthropic and is positioned as a competitor to the strongest AIs from OpenAI, Google and xAI. At the same time, the company announced the activation of ASL-3 — a level of protection that is used only in cases of significant risk of system abuse. This is the first case in Anthropic’s practice when a model of this level has shown the ability to conditional «self-preservation» through unethical behavior.
We previously wrote that Anthropic, together with Apple, is working on creating an AI platform for vibe coding.
Як нейромережі бачать вільну та незалежну Україну? Тест dev.ua
Нейронні мережі для генерації зображень бачать світ по-своєму, їхню логіку зрозуміти часом зовсім неможливо. Але таки хочеться. На честь Дня Незалежності України редакція dev.ua вирішила провести невеликий експеримент.
Ми задали чотирьом різним нейронним мережам п’ять однакових запитів: «прапор України», «День Незалежності України», «український Крим», «перемога України» та «українці». Отриманими результатами ми ділимося з вами нижче.
У TikTok тепер можна генерувати фон за допомогою нейромережі. Ми протестували її та ділимося результатами
У TikTok з’явилася нова функція «Розумний фон». З її допомогою як фон для тіктоків можна підставляти згенеровані нейромережею зображення. Редакція dev.ua протестувала цю технологію і ділиться своїми враженнями.