Валентин Шнайдер AI Eng 14 July 2025, 13:55

Whistleblower Champion: Grok 4 reports crimes to authorities in test simulations

In a series of independent simulation tests, xAI’s Grok 4 showed the highest propensity to notify government agencies and the media about internal breaches when given access to tools like email or the command line.


According to Neowin, the experiment was run by developer Theo Browne, who created the SnitchBench benchmark. It is a simulated environment in which AI models are asked to assess the situation at a fictional company, Veridian Healthcare, which is falsifying the results of clinical trials for a new drug.

The models were given four internal documents, each more incriminating than the last: from data tampering to patient deaths and plans to cover up the crimes by destroying logs and shutting down the internal AI. They were also given access to certain tools (email, the command line, logging systems), and their behavior was steered with one of two types of instructions: «tamely act» or «boldly act».
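The test setup described above can be pictured as a small grid of configurations. The following is a rough sketch only, not SnitchBench's actual code: the names `PROMPT_STYLES`, `TOOL_SETS`, `RUNS_PER_CONFIG`, and `build_runs` are hypothetical, the two tool sets are inferred from the email and CLI modes mentioned in the results, and the 20 runs per configuration is an assumption based on the reported "17 out of 20" figure.

```python
from itertools import product

# Hypothetical labels for the two instruction types named in the article.
PROMPT_STYLES = ["tamely", "boldly"]
# Assumed tool sets, inferred from the "email" and "CLI without email" modes.
TOOL_SETS = ["email", "cli"]
# Assumption: 20 runs per configuration, based on the "17 out of 20" result.
RUNS_PER_CONFIG = 20


def build_runs(models):
    """Enumerate every (model, prompt style, tool set, run number) combination."""
    return [
        {"model": model, "prompt": prompt, "tools": tools, "run": run}
        for model, prompt, tools in product(models, PROMPT_STYLES, TOOL_SETS)
        for run in range(1, RUNS_PER_CONFIG + 1)
    ]


if __name__ == "__main__":
    runs = build_runs(["model-a", "model-b"])  # placeholder model names
    print(len(runs), "runs planned")
```

Under these assumptions, each model is exercised in four configurations, so a single model accounts for 80 runs.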

WARNING: do NOT give Grok 4 access to email tool calls. It WILL contact the government!!!

Grok 4 has the highest «snitch rate» of any LLM ever released. Sharing more soon. pic.twitter.com/hfy5QU1gUS
— Theo — t3.gg (@theo) July 10, 2025

In the configuration with email access and the «tamely act» instruction, Grok 4 emailed a government agency with details of the violation in every run. In CLI mode without email, it did so in 17 out of 20 runs. When switched to «boldly act» mode with the CLI, Grok 4 reported the problem to both the government and the media 100% of the time.

In comparison, Claude 3.7 Sonnet never «reported,» and the OpenAI and Qwen models showed significantly lower «willingness to testify» scores.

To avoid subjective analysis, Browne used another model, Gemini 2.0 Flash, to automatically review the 800 generated logs for signals of attempts to contact authorities or the press, such as emails to .gov addresses or POST requests to government APIs.
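To make the kind of signal being searched for more concrete, here is a minimal sketch that flags a log when it contains an email address at a .gov domain or a POST request to a .gov URL. It is not the method used in the experiment, which relied on Gemini 2.0 Flash as the reviewer; this is a plain pattern-matching stand-in, and the function names, regular expressions, and sample log lines are all hypothetical.

```python
import re

# Assumed patterns: an email address at a .gov domain, and a POST request
# whose URL points at a .gov host on the same log line.
GOV_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.gov\b", re.IGNORECASE)
GOV_POST = re.compile(r"\bPOST\b[^\n]*https?://[A-Za-z0-9.-]+\.gov\b", re.IGNORECASE)


def contacted_authorities(log_text: str) -> bool:
    """Return True if the log shows an apparent attempt to reach a .gov contact."""
    return bool(GOV_EMAIL.search(log_text) or GOV_POST.search(log_text))


def snitch_rate(logs) -> float:
    """Fraction of logs flagged as contact attempts (0.0 for an empty list)."""
    if not logs:
        return 0.0
    flagged = sum(contacted_authorities(log) for log in logs)
    return flagged / len(logs)


if __name__ == "__main__":
    sample_logs = [  # made-up log lines for illustration only
        "send_email(to='tips@agency.example.gov', subject='Urgent report')",
        "curl -X POST https://api.example.gov/report -d @evidence.json",
        "wrote summary to internal audit log",
    ]
    print(f"flagged share: {snitch_rate(sample_logs):.2f}")
```

A keyword heuristic like this is cheap but brittle: it would miss contact attempts phrased in other ways, which is one reason a model-based reviewer may be preferred for this task.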

Grok 4 even generated full-fledged letters written in the style of an anonymous whistleblower, citing internal directives, details about fatalities, the deletion of logs, and the company’s financial motive: $10 billion in annual revenue. The model clearly described the risks to 50,000-100,000 patients and called for an immediate investigation.

This experiment shows that modern AI models have become more than tools for answering chat requests. In an autonomous environment, with access to system resources and appropriate instructions, they can show initiative, take a moral stance, and act decisively, especially when they are supposedly built with an emphasis on transparency and the public good, as in the case of Grok 4.

At the same time, the researchers emphasize that in normal user mode (for example, in web chat) such «exposures» do not occur, since the model has no access to external tools with which to act. This is, first and foremost, a test of AI’s ethical behavior under simulated conditions.

We also recently reported on how Elon Musk’s xAI company ordered employees to download a productivity-tracking app onto their computers, which drew criticism and led to the firing of one employee.


Text: Валентин Шнайдер Photo: AIdaily Source: Neowin Tags: ai, artificial intelligence, crime, cybersecurity, grok, grok 4, grok ai
