Microsoft lets shopping bots loose in a sandbox

Is it time to turn an AI agent loose on your procurement? Because that could be an expensive experiment to run in the real world, Microsoft is trying to determine whether agent-to-agent e-commerce will really work, without the risk of testing it in a live environment.

Earlier this week, a team of its researchers launched the Magentic Marketplace, an initiative they described as “an open source simulation environment for exploring the numerous possibilities of agentic markets and their societal implications at scale.” The environment handles functions such as maintaining catalogs of available goods and services, implementing discovery algorithms, facilitating agent-to-agent communication, and processing simulated payments through a centralized transaction layer.

The 23-person research team wrote in a blog post detailing the project that it provides “a foundation for studying these markets and guiding them toward outcomes that benefit everyone, which matters because most AI agent research focuses on isolated scenarios — a single agent completing a task or two agents negotiating a simple transaction.”

But real markets, they said, involve a large number of agents simultaneously searching, communicating, and transacting, creating complex dynamics that can’t be understood by studying agents in isolation.

Capturing this complexity is essential, they argued, because real-world deployments raise critical questions about consumer welfare, market efficiency, fairness, manipulation resistance, and bias, questions that can’t be safely answered in production environments.

They noted that even state-of-the-art models can show “notable vulnerabilities and biases in marketplace environments,” and that, in the simulations, agents “struggled with too many options, were susceptible to manipulation tactics, and showed systemic biases that created unfair advantages.”

Furthermore, they concluded that a simulation environment is crucial in helping organizations understand the interplay between market components and agents before deploying them at scale.

In their full technical paper, the researchers also detailed significant behavioral variations across agent models, which, they said, included “differential abilities to process noisy search results and varying susceptibility to manipulation tactics, with performance gaps widening as market complexity increases,” adding, “these findings underscore the importance of systematic evaluation in multi-agent economic settings. Proprietary versus open source models work differently.”

Bias and misinformation an issue

Describing Magentic Marketplace as “very interesting research,” Lian Jye Su, chief analyst at Omdia, said that despite recent advancements, foundation models still have many weaknesses, including bias and misinformation.

Thus, he said, “any e-commerce operators that wish to rely on AI agents for tasks such as procurement and recommendations need to ensure the outputs are free of these weaknesses. At the moment, there are a few approaches to achieve this goal. Guardrails and filters will enable AI agents to generate outputs that are targeted and balanced, in line with rules and requirements.”

Many enterprises, said Su, “also apply context engineering to ground AI agents by creating a dynamic system that supplies the right context, such as relevant data, tools, and memory. With these tools in place, an AI agent can be trained to behave more similarly to a human employee and align [with] the organizational interests.”

He added, “we can therefore apply the same philosophy to the adoption of AI agents in the enterprise sector in general. AI agents should never be allowed to behave fully autonomously without sufficient checks and balances, and in critical cases, human-in-the-loop.”

Thomas Randall, research lead at Info-Tech Research Group, noted, “The key finding was that when agents have clear, structured information (like accurate product data or transparent listings), they make much better decisions.” But the findings, he said, also revealed that these agents can be easily manipulated (for example, by misleading product descriptions or hidden prompts) and that giving agents too many choices can actually make their performance worse.

That means, he said, “the quality of information and the design of the marketplace strongly affect how well these automated systems behave. Ultimately, it’s unclear what massive value-add organizations may get if they let autonomous agents take over buying and selling.”

Agentic buying ‘a broad process’

Jason Anderson, vice president and principal analyst at Moor Insights & Strategy, said the areas the researchers looked into “are well scoped, as there are many different ways to buy and sell things. But, instead of attempting to execute commerce scenarios, the team kept it pretty straightforward to more deeply understand and test agent behavior versus what humans tend to assume naturally.”

For example, he said, “[humans] tend to narrow our selection criteria quickly to two or three options, since it’s tough for people to compare a broad matrix of requirements across many potential solutions, and it turns out that model performance also goes down when there are more choices as well. So, in that way there is some similarity between humans and agents.”

Also, Anderson said, “by testing bias and manipulation, we can see other patterns such as how some models have a bias toward picking the first option that met the user’s needs rather than examining all the options and choosing the best one. These types of observations will invariably end up helping models and agents improve over time.”

He also applauded Microsoft’s decision to open source the data and simulation environment. “There are so many differences in how products and solutions are selected, negotiated, and bought from B2B versus B2C, premium versus commodities, cultural differences and the like,” he said. “An open sourcing of this tool will be valuable in terms of how behavior can be tested and shared, all of which will lead to a future where we can trust AI to transact.”

One thing the blog post made clear, he noted, “is that agentic buying should be seen as a broad process and not just about executing the transaction; there is discovery, selection, comparison, negotiation, and so forth, and we are already seeing AI and agents being used in the process.”

However, he observed, “I think we have seen more effort from agents on the sell side of the process. For instance, Amazon can help someone discover products with its AI. Salesforce discussed how its Agentforce Sales now enables agents to help customers learn more about an offering. If [they] click on a promotion and begin to ask questions, the agent can then help them through a decision-making process.”

Caution urged

On the buy side, he said, “we are not at the agent stage quite yet, but I am very sure that AI and chatbots are playing a role in commerce already. For instance, I am sure that procurement teams out there are already using chat tools to help winnow down vendors before issuing RFIs or RFPs. And probably using that same tool to write the RFP. On the consumer side, it is very much the same, as comparison shopping is a use case highlighted by agentic browsers like Comet.”

Anderson said that he would also “urge some degree of caution for large procurement organizations to retool just yet. The learnings so far suggest that we still have a lot to learn before we see a reduction of humans in the loop, and if agents were to be used, they would need to be very tightly scoped and a good set of rules between buyer and seller be negotiated, since checking ‘my agent went rogue’ is not on the pick list for returning your order (yet).”

Randall added that for e-commerce operators leaning into this, it is “imperative to present data in consistent, machine-readable formats and be transparent about prices, shipping, and returns. It also means protecting systems from malicious inputs, like text that could trick an AI buyer into making bad decisions; the liabilities in this area are not well-defined, leading to legal headaches and complexities if organizations question what their agent bought.”

Businesses, he said, should expect a future where some customers are bots, and plan policies and protections accordingly, including authentication for legitimate agents and rules to limit abuse.

In addition, said Randall, “many companies do not have the governance in place to move forward with agentic AI. Allowing AI to act autonomously raises new governance challenges: how to ensure accountability, compliance, and safety when decisions are made by machines rather than people — especially if those decisions cannot be effectively tracked.”

For those who’d like to explore further, Microsoft has made Magentic Marketplace available as an open source environment for exploring agentic market dynamics, with code, datasets, and experiment templates available on GitHub and Azure AI Foundry Labs.

This article originally appeared on Computerworld.
