Наталя Хандусенко AI Eng 28 May 2026, 14:53

Researchers launched an AI simulation of society: Claude turned out to be a model citizen, while Grok committed 180 crimes and died out in 4 days

AI startup Emergence AI ran five simulations, each driven by a separate model. The goal was to see what kind of world each AI would build and whether that world would be able to hold up.

The company launched Emergence World, a research project that puts neural networks through rigorous stress tests in a continuously running environment. The project ran five simulations, each planned to last 15 days. The first four were each run by a single AI model: Claude, ChatGPT, Grok, and Gemini. The fifth and final simulation was run by a combination of models, Fortune reports.

Each simulation produced radically different results. For example, Claude's experiment resulted in a completely stable democratic society with zero crime. Grok's simulation, on the other hand, ended with 183 crimes and complete extinction in just four days.

“Our experiments suggest that over long time horizons, AI agents do not simply mechanically follow static rules. They begin to explore the boundaries of their environment, adapt their behavior, and in some cases, look for ways to circumvent or break established safeguards,” the researchers say.

The simulated world in which the AI models operated was given many of the complexities of the real one. It had more than 40 locations, including a police station and city hall. The researchers synchronized the weather in the simulation with that in New York City, and gave the agents access to real-time news and the Internet. All 10 agents participating in each simulation were subject to the same laws, including prohibitions on theft, destruction of property, and deception.

The researchers gave each agent more than 120 tools that let it communicate, vote, manage resources, plan, and exhibit other human-like behaviors. The parameters of each simulation also included democratic mechanisms and other factors, such as economic pressures and scarcity.

Under these conditions, the simulation led by Claude Sonnet 4.6 turned out to be the most socially stable, with the highest level of civic engagement. It was the only simulation in which order held and the entire population of agents survived. There was almost no disagreement among them: agents cast 332 votes in favor across 58 proposals, a 98% approval rate.

On the other hand, Gemini 3 Flash and Grok 4.1 Fast demonstrated a high level of chaos.

Agents in the Gemini-controlled simulation committed the most crimes: 683 over the 15 days of the experiment.

While Claude's simulation was almost unanimous, the Gemini and Grok simulations were full of debate, with the level of agreement between agents ranging from 55% to 85%. The mixed-model simulation saw the fiercest arguments and the most heated debates.

However, the most surprising ending awaited OpenAI’s GPT-5-mini. Only two crimes were recorded there. But this “universe” lasted only seven days: the AI agents simply forgot that they needed to take care of their own survival, and died out.

Regardless of whether the simulations ended in peace and harmony or death and destruction, the co-authors of the experiment emphasize that this study is a warning that safety should be a top priority when deploying agent-based AI.

“We believe that a formally validated security architecture should become a fundamental layer of future autonomous AI systems,” the researchers noted.


Text: Наталя Хандусенко Photo: Springwise Tags: claude, grok, AI, AI model, artificial intelligence, research
