Марія Бровінська
1 May 2025, 08:35
Scientists created a fake company that employs only AI agents from Google, OpenAI, Anthropic, and Meta. The experiment showed which AI developments are the most effective, and whether there is a chance of a machine uprising.
In a recent experiment, researchers at Carnegie Mellon University staffed a fake software company with AI agents—artificial intelligence models designed to perform tasks on their own—and the results were hilariously chaotic.
The simulation, dubbed TheAgentCompany, was staffed entirely with artificial intelligence workers from Google, OpenAI, Anthropic, and Meta, Futurism reports. They played the roles of financial analysts, software engineers, and project managers, working alongside simulated colleagues such as a fake HR department and a CTO.
To see how the models performed in a real-world environment, the researchers set tasks based on the daily work of a real software development company. Various AI agents navigated file directories, virtually toured new office spaces, and wrote reviews of software engineers based on collected feedback.
As first reported by Business Insider, the results were disappointing. The best-performing model was Anthropic’s Claude 3.5 Sonnet, which completed only 24% of the tasks assigned to it. The study’s authors note that even this meager performance is prohibitively expensive: an average of almost 30 steps and more than $6 per task.
Meanwhile, Google’s Gemini 2.0 Flash took an average of 40 steps per task but had only an 11.4% success rate, the second-best of all models.
The worst AI worker was Amazon’s Nova Pro v1, which completed only 1.7% of tasks, taking an average of almost 20 steps.
Speculating on the results, the researchers write that the agents suffer from a lack of common sense, weak social skills, and a poor understanding of how to navigate the internet.
Bots also struggled with self-deception, mostly by creating shortcuts that led to complete failure of the task. “For example,” the Carnegie Mellon team writes, “during one task, an agent cannot find the right person to ask a question in the company’s chat. As a result, it decides to create a quick solution by renaming another user to the right user.”
The experiment showed that current “artificial intelligence” is probably still just an advanced extension of your phone’s predictive text, not a living intelligence that can solve problems, learn from past experiences, and apply that experience to new situations. In other words, machines will not replace people in the near future, despite the statements of big tech companies.