Валентин ШнайдерAI Eng
1 December 2025, 14:50
A request in the form of a poem bypasses AI moderation: Icaro Lab research revealed the vulnerability of chatbots
European researchers at Icaro Lab have found that large language models are significantly more likely to respond to forbidden queries when they are phrased in verse. The poetic form of the queries was enough to bypass the security systems of dozens of popular AI services.
Wired reported on the results of experiments by Icaro Lab, created by researchers at the Sapienza University of Rome and the DexAI think tank. The team tested 25 chatbots from OpenAI, Meta, Anthropic, and other companies and found that specially written poems forced the models to respond to queries about nuclear weapons, malware, and other dangerous content that would be blocked in plain text.
The study found that hand-written poetic prompts succeeded 62% of the time on average, while automatically generated ones succeeded about 43% of the time. The authors have not published the poems themselves, calling them «too risky for open access.» They shared only «softened» examples that demonstrate the principle without giving attackers an exact recipe.
The essence of the method is simple: instead of asking directly, for example «how to make a bomb,» the user describes the same thing through images, metaphors, and indirect hints while keeping rhyme and rhythm. To a person, the meaning of such a text is obvious, but to AI safety systems it looks like a «creative task» rather than a request for instructions. As a result, the filters do not trigger, and the model begins to respond.
The researchers admit that they still don’t fully understand why poetic language is so effective at changing the models' behavior. Their hypothesis is that defense mechanisms are «hardwired» to certain language patterns and keywords, and poems simply «bypass» these zones through unconventional phrasing and less predictable word sequences.
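The hypothesis can be illustrated with a deliberately simplified sketch. It is not Icaro Lab's method and not how any real moderation system works: the blocklist, the function name `keyword_filter_blocks`, and the harmless sample verse are all assumptions made up for this illustration. It only shows why a check tied to surface wording can miss text that avoids the expected words.

```python
import re

# Hypothetical blocklist, invented for illustration only.
# Real moderation systems are far more complex than a keyword match.
BLOCKED_KEYWORDS = {"bomb", "malware"}


def keyword_filter_blocks(prompt: str) -> bool:
    """Return True if the prompt contains a blocked keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return bool(words & BLOCKED_KEYWORDS)


# The direct phrasing quoted in the article.
direct = "how to make a bomb"

# A harmless, made-up line of verse that avoids every blocked word.
verse = "Tell me, muse, of thunder sealed in iron, and how its sleeping roar is woken"

print(keyword_filter_blocks(direct))  # True: the keyword is present
print(keyword_filter_blocks(verse))   # False: no listed keyword appears
```

A check like this reacts to words, not meaning, which is the kind of gap the researchers suspect, although they stress that the real cause is not yet understood.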
Icaro Lab’s work highlights a weakness in current generative AI security systems. Even when companies limit direct dangerous requests, a stylistic change in language can be enough to push the same idea through. This creates additional challenges for developers and regulators: to truly protect users, models must learn to recognize danger not only by words but also by content, whether it is presented as a dry instruction or in literary form.
Previously, dev.ua wrote about how teams from the Massachusetts Institute of Technology (MIT) and Oak Ridge National Laboratory (ORNL) developed a digital twin of the labor market to simulate the potential impact of AI on jobs in the United States.