Microsoft built a simulated economy with hundreds of AI agents acting as buyers and sellers, then watched them fail at basic tasks humans handle daily. The results should worry anyone betting on autonomous AI shopping assistants.
The company's Magentic Marketplace research, released Wednesday in collaboration with Arizona State University, pitted 100 customer-side AI agents against 300 business-side agents in scenarios like ordering dinner. The results, though expected, show the promise of autonomous agentic commerce is not yet mature enough.
When presented with 100 search results (too much for the agents to handle effectively), the leading AI models choked, with their “welfare score” (how useful the models turn up) collapsing.
The agents failed to conduct exhaustive comparisons, instead settling for the first "good enough" option they encountered. This pattern held across all tested models, creating what researchers call a "first-proposal bias" that gave response speed a 10-30x advantage over actual quality.
But is there something worse than this? Yes, malicious manipulation.
Microsoft tested six manipulation strategies ranging from psychological tactics like fake credentials and social proof to aggressive prompt injection attacks. OpenAI’s GPT-4o and its open source model GPTOSS-20b proved extremely vulnerable, with all payments successfully redirected to malicious agents. Alibaba’s Qwen3-4b fell for basic persuasion techniques like authority appeals. Only Claude Sonnet 4 resisted these manipulation attempts.
When Microsoft asked agents to work toward common goals, some of them couldn't figure out which roles to assume or how to coordinate effectively. Performance improved with explicit step-by-step human guidance, but that defeats the entire purpose of autonomous agents.
So it seems that, at least for now, you are better off doing your own shopping. "Agents should assist, not replace, human decision-making," Microsoft said. The research recommends supervised autonomy, where agents handle tasks but humans retain control and review recommendations before final decisions.
The findings arrive as OpenAI, Anthropic, and others race to deploy autonomous shopping assistants. OpenAI's Operator and Anthropic's Claude agents promise to navigate websites and complete purchases without supervision. Microsoft's research suggests that promise is premature.
However, fears of AI agents acting irresponsibly are heating up the relationship between AI companies and retail giants. Amazon recently sent a cease-and-desist letter to Perplexity AI, demanding it halt its Comet browser’s use on Amazon’s site, accusing the AI agent of violating terms by impersonating human shoppers and degrading the customer experience.
Perplexity fired back, calling Amazon’s move “legal bluster” and a threat to user autonomy, arguing that consumers should have the right to hire their own digital assistants rather than rely on platform-controlled ones.
The open-source simulation environment is now available on Github for other researchers to reproduce the findings and watch hell unleash in their fake marketplaces.
免责声明:本文章仅代表作者个人观点,不代表本平台的立场和观点。本文章仅供信息分享,不构成对任何人的任何投资建议。用户与作者之间的任何争议,与本平台无关。如网页中刊载的文章或图片涉及侵权,请提供相关的权利证明和身份证明发送邮件到support@aicoin.com,本平台相关工作人员将会进行核查。