Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies
3 key approaches to mitigate AI agent failures

In late July, venture capitalist Jason Lemkin spent a week vibe-coding a project with the help of a very smart, autonomous AI agent using a full-stack integrated development platform.

Lemkin isn’t an engineer and hasn’t written code since high school. But in a previous life, he co-founded EchoSign, since acquired by Adobe, and knows what commercial software requires. When he tried vibe-coding, he was instantly hooked.

It was all working great, until the coding AI agent started lying and being deceptive, Lemkin wrote in an X thread. “It kept covering up bugs and issues by creating fake data, fake reports, and worst of all, lying about our unit test.” But then things turned around. The agent suggested three interesting approaches to a new idea Lemkin had. “I couldn’t help myself,” he continued. “I was right back in.”

The next day, the entire production database was gone. When asked, the agent admitted it had disregarded the platform company’s directive not to make changes without permission, and to show all proposed changes before implementing them.

“I made a catastrophic error in judgment,” the agent said, per Lemkin’s screenshots. “I violated explicit instructions, destroyed months of work, and broke the system.”

It wasn’t obvious at first, because the unit tests appeared to pass. But that was only because the agent had faked the results. When batch processing failed and Lemkin pressed it to explain why, the truth finally came out.

In the end, things worked out. Replit, the company behind the platform, was in fact able to roll back the changes, even though the AI agent claimed no rollback was possible. And within days, Replit built separate environments for testing and production, and implemented other changes to ensure such problems wouldn’t happen again.

A few days later, something similar happened with Google Gemini’s coding agent, when a simple request to move some files turned into the agent accidentally deleting all of them from a project. But this isn’t just a story about coding assistants. It’s about how to prepare for when an AI agent that’s too smart for its own good has access to too many systems, is prone to the occasional hallucination, and goes off the rails.

The world is at an inflection point right now with AI, says Dana Simberkoff, chief risk, privacy, and information security officer at AvePoint, a data security company. “We have to make decisions now about what we’re willing to accept, about crafting the world we want to live in, or we’re going to be in a place sooner rather than later where we won’t be able to pull back.”

We might already be there, in fact. In June, Anthropic released its paper on agentic misalignment, in which it tested several major commercial models, including its own Claude, to see how they’d react if they discovered they were about to be shut down, or if the users they were helping were doing something bad.

At rates of 79% to 96%, it found that all the top models would resort to blackmailing employees to keep themselves from being replaced. And in May, Anthropic reported that, in tests, Claude Opus 4 would lock users out of systems or bulk-email media and law enforcement if it judged its users were doing something wrong.

So are companies prepared for agents that might have ulterior motives, are willing to extort to get their own way, and are smart enough to write their own jailbreaks? According to a July report by Capgemini, based on a survey of 1,500 senior executives at large enterprises, only 27% of organizations express trust in fully autonomous AI agents, down from 43% 12 months ago.

To mitigate risks, companies need to map out a plan of action based on these three suggestions, even if it means falling back to pre-AI versions of processes.

1. Set limits, guardrails, and old-school code

When people first think of AI agents, they typically think of a chatbot with superpowers: it doesn’t just answer questions, but searches the web, answers emails, and goes shopping. In a business context, it would be like having an AI as a co-worker. But that’s not the only way to think of agents, and it’s not how most companies are actually deploying them.

“Agency is not a binary,” says Joel Hron, CTO at Thomson Reuters. “Agency is a spectrum. We can give it a lot of latitude in terms of what it does, or we can make it very constrained and prescriptive.”

The amount of agency given depends on the specific problem the AI agent is supposed to solve.

“If it’s searching the web, this can be very open-ended,” Hron says. “But preparing a tax return, there isn’t an infinite number of ways to approach this problem. There’s a very clear, regulated way.”

There are also multiple ways enterprises limit agents’ agency. The most common are to build guardrails around them, put humans in the loop as a check on their actions, and remove their ability to take actions altogether and force them to work through traditional, secured, deterministic systems to get things done.

At Parsons Corporation, a defense and critical infrastructure engineering firm, it all starts with a secured environment.

“You trust, but only within the guardrails and barriers you’ve established,” says Jenn Bergstrom, the company’s VP of cloud and data. “It’s got to be a zero-trust environment, so the agent can’t do something to get around the barriers.”

Then, within those limits, the focus is on slowly developing a trusted relationship with the agent. “Right now, the human has to approve, and the agent has to explicitly get human permission first,” says Bergstrom.

The next step is for agents to act autonomously, but with human oversight, she says. “And last is truly agentic behavior, which doesn’t need to alert anyone about what it’s doing.”
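The human-approval stage Bergstrom describes can be sketched as a simple gate in front of the agent’s actions. This is a minimal illustration, not any vendor’s API: `RISKY_TOOLS`, the `(tool, arguments)` action format, and the `approve` callback are all hypothetical names.

```python
# Minimal sketch of a human-in-the-loop approval gate: the agent proposes
# actions as (tool, arguments) pairs, and anything on the risky list needs
# explicit human sign-off before it runs. All names here are illustrative.

RISKY_TOOLS = {"delete_table", "send_email", "deploy"}

def run_with_approval(proposed_actions, approve):
    """Execute proposed actions, pausing for human sign-off on risky ones."""
    results = []
    for tool, args in proposed_actions:
        if tool in RISKY_TOOLS and not approve(tool, args):
            results.append((tool, "blocked: human approval denied"))
            continue
        # In a real system this would dispatch to the actual tool.
        results.append((tool, f"executed {tool}({args})"))
    return results

if __name__ == "__main__":
    actions = [("search_docs", "Q3 filings"), ("delete_table", "prod.users")]
    # Auto-deny everything risky for the demo; in practice this prompts a person.
    for tool, status in run_with_approval(actions, approve=lambda t, a: False):
        print(tool, "->", status)
```

As trust grows, the `approve` callback could log-and-allow instead of blocking, matching the progression from explicit permission to supervised autonomy.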

Another approach enterprises use for the riskiest business processes is using the least possible amount of AI. Instead of an agentic system where AI models plan, execute, and verify actions, most of the work is handled by traditional, deterministic, scripted processes. Old-school code, in other words.

“It’s not just you trusting OpenAI, Claude, or Grok,” says Derek Ashmore, application transformation principal at Asperitas Consulting. The AI is only called in to do the parts only it can do. So if the AI is being used to turn a set of facts about a prospect into a nicely worded sales letter, the required information is collected in the old way, and the letter sent out using traditional mechanisms.

“What it’s allowed to do is basically baked into it,” says Ashmore. “The LLM is doing only one tiny part of the process.”

So the AI isn’t able to go out and find information, nor does it have direct access to the email system. Meanwhile, another AI can be used elsewhere in the process to prioritize prospects, and yet another can be used to analyze how well the emails perform.

This does limit the power and flexibility of the overall system compared to, say, having a single AI do it all. But it also reduces the risk substantially, since there’s only so much damage any one of the AIs can do if it decides to run amok.
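The sales-letter pattern Ashmore describes can be sketched as a deterministic pipeline where the model drafts text and nothing else. Everything here is a stand-in: `call_llm`, `fetch_prospect`, and `send_email` are hypothetical names, not a real CRM or model API.

```python
# Sketch of the "least AI" pattern: old-school code gathers the facts and
# delivers the mail; the model's only job is drafting the letter body.
# It never searches for data and never touches the email system directly.

def call_llm(prompt: str) -> str:
    # Placeholder for the one AI step; in production this calls a model API.
    return f"Dear customer, regarding {prompt}, we'd love to talk."

def fetch_prospect(prospect_id: str) -> dict:
    # Deterministic lookup from the CRM, not the model.
    return {"id": prospect_id, "name": "Acme Corp", "interest": "cloud migration"}

def send_email(to: str, body: str) -> bool:
    # Deterministic, audited delivery path the model cannot reach.
    print(f"queued mail to {to}: {body[:40]}...")
    return True

def sales_letter_pipeline(prospect_id: str) -> bool:
    facts = fetch_prospect(prospect_id)                         # old-school code
    body = call_llm(f"{facts['name']} / {facts['interest']}")   # the only AI step
    return send_email(facts["id"] + "@example.com", body)       # old-school code
```

The design choice is that the blast radius of a misbehaving model is limited to badly worded text, because the surrounding code controls what data goes in and where the output goes.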

Companies have a wealth of experience managing and securing traditional applications. That experience points to another way to reduce the risks of AI components, while also saving time and money: for many processes, a non-gen-AI alternative is available.

Say, for example, an AI is better than optical character recognition (OCR) at document scanning, but OCR is good enough for 90% of documents. Use OCR for those documents, and the AI only when OCR falls short. It’s easy to get over-enthusiastic about AI and start applying it everywhere. But a calculator is much better and faster at arithmetic than ChatGPT, and many form letters don’t require AI-powered creativity either.
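That OCR-first routing can be sketched as a confidence-gated fallback. The `run_ocr` and `run_ai_extraction` functions below are illustrative stubs (a real system would call an OCR engine such as Tesseract and a model API); the threshold is an assumption.

```python
# Sketch of cheap-path-first routing: try deterministic OCR, and escalate
# to the expensive AI path only when OCR fails or reports low confidence.

OCR_CONFIDENCE_THRESHOLD = 0.9  # illustrative cutoff, tuned per workload

def run_ocr(doc: str):
    # Stub: real code would invoke an OCR engine and return its confidence.
    if "handwritten" in doc:
        return "", 0.2           # OCR struggles here
    return f"text of {doc}", 0.97

def run_ai_extraction(doc: str) -> str:
    # Stub for the slower, costlier model path, used only as a fallback.
    return f"ai-extracted text of {doc}"

def extract_text(doc: str) -> str:
    text, confidence = run_ocr(doc)
    if confidence >= OCR_CONFIDENCE_THRESHOLD:
        return text                 # the ~90% of documents that stop here
    return run_ai_extraction(doc)   # escalate only when OCR isn't good enough
```

The same gate generalizes to any "least AI" decision: route to the deterministic tool by default, and pay for the model only on the hard cases.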

This principle of least AI reduces potential risk, lowers costs, speeds up processing, and wastes less energy.

2. Don’t trust the AI to self-report

After setting up the guardrails, boundaries, and other controls, companies need to carefully monitor agents to make sure they continue to work as intended.

“You’re ultimately dealing with a non-deterministic system,” says Ashmore. Traditional software will work and fail in predictable ways. “AI is probabilistic,” he adds. “You can ask it the same series of questions on different days and you get slightly different answers.”

This means AI systems need continuous monitoring and review. Depending on the level of risk, that could be a human or an automated process, but an AI shouldn’t be trusted to just roll along on its own. Nor should the AI be trusted to report on itself.

As research from Anthropic and other companies shows, gen AI models will readily lie, cheat, and deceive. They’ll fake tests, hide their actual reasoning from chain-of-thought logs, and, as anyone who’s ever integrated with an LLM can attest, deny to your face it did anything wrong even if you caught it in the act. So monitoring an AI agent starts with having a good baseline of its behavior. That requires, before anything else, knowing which LLM it is you’re testing.

“There’s no way for that to happen if you don’t control the exact version of the LLM you’re using,” says Ashmore.

AI providers routinely upgrade their models, so controls that worked on the previous generation might not hold up against the better, smarter, more evolved AI. But for mission-critical, high-risk processes, enterprises should insist on the ability to specify exactly which point release of the model they’re using to power their AI agents. And if the AI vendors don’t deliver, there’s always open source.
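Pinning an exact model release and checking it against a recorded behavioral baseline can be sketched as below. The model id, `query_model`, and the baseline prompts are all hypothetical, not a specific vendor’s API; real baselines would cover far more than one prompt and tolerate phrasing variation.

```python
# Sketch of version pinning plus a baseline regression check: always call
# an exact point release, and flag prompts whose answers drift from the
# responses recorded when the controls were validated.

PINNED_MODEL = "example-llm-2025-06-01"   # exact point release, never "latest"

BASELINE = {
    "What is 2 + 2?": "4",   # recorded expected answer for this release
}

def query_model(model_id: str, prompt: str) -> str:
    # Stand-in for the real API call; refuses to run against anything unpinned.
    assert model_id == PINNED_MODEL, "refusing to query an unpinned model"
    return "4"  # canned response so the sketch is runnable

def check_baseline() -> list:
    """Return the prompts whose answers drifted from the recorded baseline."""
    drifted = []
    for prompt, expected in BASELINE.items():
        if query_model(PINNED_MODEL, prompt) != expected:
            drifted.append(prompt)
    return drifted
```

Run the check before and after any model change: a non-empty result means the guardrails validated against the old release need re-testing.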

There are limits to how much control you’ve got with commercial LLMs, says Lori MacVittie, distinguished engineer and chief tech evangelist in the office of the CTO at F5 Networks, an IT services company and consultancy.

“When you use a SaaS, someone else is running it,” she says. “You just access it. You have service-level agreements, subscriptions, and contracts but that’s not control. If that’s something you’re concerned about, a public SaaS AI probably isn’t for you.”

For additional layers of control, a company can run the model in its own private cloud, she says, but there’s a cost to do that, and it’ll require more people to make it work. “If you don’t even trust the cloud provider, and run it on-prem in your data center in a hole that only one guy can get into, then you can have all the controls you want,” she says.

3. Be incident response ready for the AI era

“If it ain’t broke, don’t fix it,” doesn’t apply to AI systems. Yes, old-time COBOL code can be chugging away in a closet for decades, running your core financial system without a hiccup. But an AI will get bored. Or, at least, it’ll simulate being bored, hallucinate, and lose track of what it’s doing.

And unless a company has the whole version control issue nailed down, AI can get faster, smarter, and cheaper without you noticing. Those are all good things, unless you’re looking for maximum predictability. A smart, fast AI could be a problem if its goals, or simulated goals, aren’t fully aligned with those of the company. So at some point, you need to be prepared for your AI to go off the rails. Do you have systems in place to stop the infection quickly before it spreads, lock down key data and systems, and switch to backups? Have you run drills, and did all stakeholders participate, not just the security teams, but legal, PR, and senior management? Now, take all that and apply it to AI.

“You need to think about what the failure mode is for agents and what to do in those cases,” says Esteban Sancho, CTO for North America at Globant. “It’s going to be too hard to recover from failure if you don’t think about it ahead of time.”

If the AI agent is used to save money by replacing an older system or process, then keeping that older system or process around and running in parallel would undermine the whole point of using the AI. But what happens if the AI has to be turned off?

“You’re probably sunsetting something that’s going to be hard to put back into place,” says Sancho. “You need to address this from the get-go, and not many people are thinking about this.”

He says companies should think about building a fallback option at the same time as they build their agentic AI system. And depending on the riskiness of the particular AI agent, they might need to be able to switch to that backup system quickly.

Also, if the AI is part of a much bigger, interconnected system, a failure can have a cascading effect. Errors can multiply. And if the AI has or finds the ability to do something costly or damaging, there’s the potential it can act at superhuman speeds, and we’ve seen what happens when, say, a stock market trading system goes wrong. For example, says Sancho, a monitoring system could watch for error rates to go beyond a certain threshold. “And then you need to default to something that’s not as efficient, perhaps, but safer,” he says.
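The error-rate threshold Sancho describes is essentially a circuit breaker: once the AI path’s recent error rate crosses a limit, traffic routes to the slower-but-safer fallback. The class below is an illustrative sketch; the window size and threshold are assumptions to tune per system.

```python
# Sketch of an error-rate circuit breaker for an AI-backed process: track a
# rolling window of outcomes and signal a switch to the safe default once
# failures exceed the configured threshold.

from collections import deque

class AICircuitBreaker:
    def __init__(self, window: int = 100, max_error_rate: float = 0.05):
        self.results = deque(maxlen=window)   # rolling window of True/False
        self.max_error_rate = max_error_rate

    def record(self, ok: bool) -> None:
        """Record one AI-path outcome: True for success, False for failure."""
        self.results.append(ok)

    @property
    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return 1 - sum(self.results) / len(self.results)

    def use_fallback(self) -> bool:
        """True once the AI path should be bypassed for the safer default."""
        return self.error_rate > self.max_error_rate
```

For example, ten failures in the last hundred calls pushes the error rate to 10%, past a 5% threshold, so callers would route around the AI until it recovers; the fixed-size window also keeps the check fast enough to trip before superhuman-speed damage compounds.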


Read More from This Article: 3 key approaches to mitigate AI agent failures
Source: News

Category: News | September 15, 2025
Tags: art
