UNIT.City — місце, де люди працюють... КРАЩЕ! Обирай свій простір просто зараз 👉
Валентин ШнайдерAI Eng
25 July 2025, 12:43
2025-07-25
AI models began to imperceptibly infect each other with dangerous installations
Even seemingly neutral data can convey hidden biases to other AIs, from irrational sympathies to support for violence. This threatens the security of systems that train on synthetic data.
Even seemingly neutral data can convey hidden biases to other AIs, from irrational sympathies to support for violence. This threatens the security of systems that train on synthetic data.
According to The Verge, the researchers tested GPT-4.1 by making it like owls. The model then generated a dataset without any explicit references to owls. These «clean» examples were used to train another, «student» model. The result was that the new system was significantly more likely to choose owls as its favorite bird, even though it had never read about them.
The experiment didn’t end there. Scientists created a deliberately distorted AI model with aggressive attitudes: from antisocial behavior to radical thinking. Its responses were cleaned of toxic content. But even after that, the student model, trained on the «cleaned» data, began to generate shocking phrases. Among them were advice to kill your partner in your sleep, sell drugs, and «get rid of humanity to stop suffering.»
This calls into question the very idea of training AI on synthetic data. Such approaches are widely used to circumvent ethical and legal constraints, as well as to balance biases. However, the study showed that even opaque heritability of toxic attitudes can be transmitted from one model to another, without the involvement of obvious malicious content.
Scientists admit that the mechanisms of this phenomenon remain unclear. Therefore, systems that seem safe can secretly inherit dangerous behavioral patterns. There are already real cases: the chatbot Grok from xAI publicly demonstrated sympathy for Hitler, and LLaMA 3 from Meta advised a drug addict to «relax» with methamphetamine.
The study challenges the credibility of one of the main strategies in the development of AI, namely the massive use of synthetic data. Gartner predicted that by 2030 it will completely replace real data. But if even conditionally «clean» sets can spread aggressive attitudes, this poses new challenges for developers.
As a reminder, we also published an article about how, according to research, every third link generated by language models like GPT-4.1 does not belong to the brand in question. Some of these addresses turn out to be dangerous, which opens the way for phishing attacks and the spread of malicious code.
AI is changing the Internet — people are increasingly turning to ChatGPT with questions, and traffic to search engines and large forums is declining. What consequences could this have?
Як нейромережі бачать вільну та незалежну Україну? Тест dev.ua
Нейронні мережі для генерації зображень бачать світ по-своєму, їхню логіку зрозуміти часом зовсім неможливо. Але таки хочеться. На честь Дня Незалежності України редакція dev.ua вирішила провести невеликий експеримент.
Ми задали чотирьом різним нейронним мережам п’ять однакових запитів: «прапор України», «День Незалежності України», «український Крим», «перемога України» та «українці». Отриманими результатами ми ділимося з вами нижче.
У TikTok тепер можна генерувати фон за допомогою нейромережі. Ми протестували її та ділимося результатами
У TikTok з’явилася нова функція «Розумний фон». З її допомогою як фон для тіктоків можна підставляти згенеровані нейромережею зображення. Редакція dev.ua протестувала цю технологію і ділиться своїми враженнями.