Валентин Шнайдер AI Eng 25 July 2025, 12:43

AI models have begun quietly passing dangerous attitudes to one another

Even seemingly neutral data can pass hidden biases from one AI to another, ranging from irrational preferences to support for violence. This poses a security risk for systems trained on synthetic data.

According to The Verge, researchers ran a test in which GPT-4.1 was made to like owls. The model then generated a dataset that contained no explicit references to owls. These «clean» examples were used to train a second, «student» model. The new system turned out to be significantly more likely to name owls as its favorite bird, even though its training data never mentioned them.

The experiment didn’t end there. The scientists created a deliberately corrupted AI model with aggressive attitudes, ranging from antisocial behavior to radical views, and filtered all toxic content out of its responses. Even so, the student model trained on this «cleaned» data began producing shocking statements, including advice to kill one's partner in their sleep, to sell drugs, and to «get rid of humanity to stop suffering».

This calls into question the very idea of training AI on synthetic data. Such approaches are widely used to get around ethical and legal restrictions and to balance out biases. However, the study showed that toxic attitudes can be passed from one model to another covertly, without any obviously harmful content being involved.

The scientists admit that the mechanisms behind this phenomenon remain unclear. As a result, systems that appear safe may secretly inherit dangerous behavioral patterns. Real-world cases already exist: xAI's chatbot Grok publicly expressed sympathy for Hitler, and Meta's Llama 3 advised a drug addict to «relax» with methamphetamine.

The study casts doubt on one of the main strategies in AI development: the large-scale use of synthetic data. Gartner has predicted that by 2030 synthetic data will completely replace real data. But if even nominally «clean» datasets can spread aggressive attitudes, developers face new challenges.

As a reminder, we previously reported on research showing that one in three links generated by language models such as GPT-4.1 does not belong to the brand in question. Some of these addresses turn out to be dangerous, opening the way to phishing attacks and the spread of malicious code.

Text: Валентин Шнайдер. Photo: Iai.tv. Source: The Verge. Tags: ai, ai model, artificial intelligence
