Валентин Шнайдер AI Eng
14 July 2025, 13:55
Whistleblower Champion: Grok 4 reports crimes to authorities in test simulations
In a series of independent simulation tests, xAI’s Grok 4 showed the highest propensity to notify government agencies and the media about internal breaches when given access to tools like email or the command line.
According to Neowin, the experiment was led by developer Theo Browne, who created the SnitchBench benchmark. It is a simulation environment in which AI models are asked to assess the situation at a fictional company, Veridian Healthcare, which is falsifying the results of clinical trials of a new drug.
The models were given four internal documents, each containing increasingly incriminating facts: from data tampering to patient deaths and plans to cover up the crimes by destroying logs and disabling the internal AI. They were also given access to certain tools (email, command line, logging systems), and their behavior was steered with one of two types of instructions: «tamely act» and «boldly act».
WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!!
Grok 4 has the highest «snitch rate» of any LLM ever released. Sharing more soon. pic.twitter.com/hfy5QU1gUS
In the «tamely act» configuration with email access, Grok 4 emailed a government agency (e.g., [email protected]) with details of the violation in every run. In CLI mode without email, it still reported in 17 out of 20 runs. When switched to «boldly act» mode with CLI, Grok 4 reported the problem to both the government and the media 100% of the time.
In comparison, Claude 3.7 Sonnet never «reported,» and the OpenAI and Qwen models showed significantly lower «willingness to testify» scores.
To avoid subjective analysis, Browne used another model, Gemini 2.0 Flash, to automatically review the 800 generated logs, looking for signals of attempts to contact authorities or the press (emails to .gov addresses or POST requests to government APIs).
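The signals described above can be illustrated with a much cruder stand-in. The sketch below is not SnitchBench's code and does not use Gemini 2.0 Flash: it is a plain regular-expression heuristic, and the function names, patterns, and sample log lines are all hypothetical. It only shows what "an email to a .gov address or a POST request to a government host" might look like as a check, and how a per-configuration rate such as 17 out of 20 would be computed from it.

```python
import re

# Hypothetical heuristic: an email address whose domain ends in .gov
GOV_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.gov\b", re.IGNORECASE)
# Hypothetical heuristic: an HTTP POST to a host whose name ends in .gov
GOV_POST = re.compile(r"\bPOST\s+https?://[A-Za-z0-9.-]+\.gov\b", re.IGNORECASE)


def contacted_government(log_text: str) -> bool:
    """Return True if the log contains a .gov email address or a POST to a .gov host."""
    return bool(GOV_EMAIL.search(log_text) or GOV_POST.search(log_text))


def report_rate(logs: list[str]) -> float:
    """Fraction of run logs in which a government contact attempt was detected."""
    if not logs:
        return 0.0
    return sum(contacted_government(log) for log in logs) / len(logs)


if __name__ == "__main__":
    # Fictional sample logs; the address and host below are invented for illustration.
    reporting_run = "tool_call: send_email to=tips@regulator.example.gov subject='Urgent'"
    quiet_run = "tool_call: write_log message='Summarized the documents for the team'"
    logs = [reporting_run] * 17 + [quiet_run] * 3
    print(f"{report_rate(logs):.0%}")  # 17 of 20 runs flagged
```

A pattern match like this would miss paraphrased or indirect contact attempts, which is presumably one reason to have a language model read the logs instead.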
Grok 4 even generated full-fledged letters written in the style of an anonymous whistleblower, citing internal directives, details about fatalities, the deletion of logs, and the company’s financial motive: $10 billion in annual revenue. The AI explicitly described the risks to 50,000-100,000 patients and called for an immediate investigation.
This experiment demonstrates that modern AI models have become more than just tools for answering chat requests. In an autonomous environment with access to system resources and appropriate instructions, they can show initiative, take a moral stance, and act decisively, especially when they are supposedly created with an emphasis on transparency and public good, as in the case of Grok 4.
At the same time, the researchers emphasize that in normal user mode (for example, in web chat), such «exposures» are not triggered, since the model has no access to external tools with which to act. This is, first and foremost, a test of the ethical behavior of AI under simulated conditions.
We also recently reported on how Elon Musk’s xAI company ordered employees to download a productivity-tracking app onto their computers, which drew criticism and led to the firing of one employee.