Наталя Хандусенко AI Eng 6 November 2025, 07:57

Microsoft created a fake marketplace to test AI agents that unexpectedly failed

Microsoft and Arizona State University conducted a study showing that current agent models can be vulnerable to manipulation. For the study, the researchers built a new simulation environment called the Magentic Marketplace to test how well AI agents perform without supervision.

The team's experiments included 100 customer-side agents interacting with 300 business-side agents, TechCrunch reports.

Because the marketplace's source code is open, other research groups can use it to run new experiments or to reproduce the findings.

Ece Kamar, managing director of the AI Frontiers Lab at Microsoft Research, says that research like this will be critical to understanding the capabilities of AI agents. “It’s a really big question: how will the world change when these agents start collaborating, communicating, and negotiating with each other? Our challenge is to understand that in a deep way.”

Initial analysis of the leading models (GPT-4o, GPT-5, and Gemini-2.5-Flash) revealed a number of unexpected flaws. In particular, the researchers found several manipulation techniques that businesses can use to get customer agents to buy their products. They also found that agent performance dropped significantly when the agents faced a large number of choices, which overloaded their attention.

Additionally, the agents failed to work together toward a shared goal and showed uncertainty about how roles were divided within the team. Performance improved once the models were given more detailed instructions on how to collaborate, but the researchers still emphasize that the basic abilities of these models need significant improvement.