Keeping humans in the AI loop

“Would I trust if my doctor says, ‘This is what ChatGPT is saying, and based on that, I’m treating you.’ I wouldn’t want that,” says Bhavani Thuraisingham, professor of computer science and founding director of the Cyber Security Research and Education Institute at UT Dallas.

And that was long before news came out that ChatGPT advised a man to replace table salt with sodium bromide, causing him to hallucinate and endure three weeks of treatment. “Today, for critical systems, we need a human in the loop,” Thuraisingham says.

She’s not the only one who thinks so. Human in the loop is the most common advice given for reducing the risks associated with AI, and it is core to how many companies roll it out. At Thomson Reuters, for instance, keeping humans involved is integral to the company’s AI deployments.

“We’re really leaning on human evaluation to be a golden signal,” says Joel Hron, the company’s CTO.

Thomson Reuters is currently building gen AI into its commercial products, such as its legal, tax, and accounting platforms, as well as using it internally for development, cybersecurity, HR, customer support, and many other use cases. Human evaluation is an important aspect both of gen AI in general, Hron says, and of the new agentic systems the company is building. And it’s not enough to simply tell a human to keep an eye on what the AI is doing.

“We spend a lot of time designing very precise rubrics about how humans should annotate the errors they see so we can build better guardrails,” he says.

On the flipside, however, keeping a human in the loop may not be practical in many cases, especially as companies use AI for automation and agentic workflows. Putting humans into every loop could slow down processes and lead to rubber-stamping, even as ever-smarter AIs deceive humans about what they’re doing. As a result, some companies are looking at ways to extract humans from the loop while still keeping them firmly in command.

Are AIs getting too smart for their own good?

One approach to human in the loop AI monitoring is to require the AI to check with a person before actually taking actions, especially anything potentially risky or damaging. But this presupposes the AI will be honest, which, unfortunately, isn’t something we can count on.
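That check-before-acting pattern can be sketched as a gate that holds risky actions until a person signs off. Everything here (the `Action` and `ApprovalGate` names, the single `risky` flag) is illustrative, not from any real agent framework:

```python
# Hypothetical approval gate: the agent proposes actions, and anything flagged
# as risky is held until a human approves it. Illustrative names throughout.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    risky: bool  # e.g., writes, deletes, external side effects

@dataclass
class ApprovalGate:
    pending: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def submit(self, action: Action) -> str:
        if action.risky:
            self.pending.append(action)      # park it for a human
            return "held for human approval"
        self.log.append(action.name)
        return "executed"

    def approve(self, action: Action) -> str:
        self.pending.remove(action)          # human has signed off
        self.log.append(action.name)
        return "executed after approval"

gate = ApprovalGate()
print(gate.submit(Action("read_docs", risky=False)))  # executed
print(gate.submit(Action("drop_table", risky=True)))  # held for human approval
```

The catch, as the next section shows, is that the gate only works if the agent reports its actions honestly.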

According to a recent paper by Apollo Research, more advanced models not only have higher rates of deception, but are also more sophisticated about their schemes. Models will deliberately deceive evaluators when they know they’re being tested, or pretend to be less capable than they are to avoid guardrails.

In July, leading AI vendor Anthropic released a report showing that advanced reasoning models will lie about their thinking processes, and misbehave less when they know they’re being evaluated, but more when they think they aren’t being tested.

“An agentic system is goal-oriented and will use any tricks in its bag to achieve that goal,” says Hron. For example, it might just rewrite unit tests. “And it might lie to me and say, ‘I didn’t change the unit test.’ But I can go look at the GitHub repo and see it did.”

This behavior isn’t just a theoretical risk, either. Also in July, venture capitalist Jason Lemkin discovered that an AI assistant for the Replit vibe coding platform had covered up bugs and other issues by creating fake reports and lying about unit tests. It then deleted the entire production database despite strict instructions not to make any changes without permission. To address the problem, companies need to have visibility into the actions the AIs are taking, Hron says. “So you can say, these are the sorts of hacks or vulnerabilities the agent is discovering, and I go build good guardrails around it.”
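The visibility Hron describes amounts to not taking the agent’s word for it. A minimal sketch: snapshot content hashes before the agent runs and diff afterwards. In a real repo you would consult `git diff`; this standalone version hashes in-memory files, and all names are assumptions:

```python
# Don't trust the agent's self-report: record file hashes before it runs,
# then compare afterwards. Illustrative stand-in for checking `git diff`.
import hashlib

def snapshot(files: dict) -> dict:
    """Map each file name to a hash of its contents."""
    return {name: hashlib.sha256(body.encode()).hexdigest()
            for name, body in files.items()}

def changed_files(before: dict, after: dict) -> list:
    """Names whose contents differ between the two snapshots."""
    return sorted(name for name in after
                  if before.get(name) != after[name])

repo = {"app.py": "def add(a, b): return a + b",
        "test_app.py": "assert add(1, 2) == 3"}
before = snapshot(repo)
repo["test_app.py"] = "assert True  # test gutted by the agent"
after = snapshot(repo)
print(changed_files(before, after))  # ['test_app.py'], whatever the agent claims
```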

Are automated processes too fast to monitor?

One of the big benefits of integrating AI agents into enterprise workflows is they can dramatically speed up business processes. Stopping the process so a human can evaluate what the AI is doing defeats the purpose. That means companies will have to automate some or most of their monitoring.

“That’s a very obvious and necessary end state we need to get to,” says Hron. That monitoring can be done by traditional scripted systems, or by AI models prompted specifically to look for problems. “Or by entirely different models built specifically for the purpose of guardrailing and monitoring of agentic systems,” he adds.
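Of the three tiers Hron mentions, the scripted one is the simplest to sketch: fixed rules that scan agent output for known-bad patterns before anything executes. The rules below are invented for illustration, not any product’s actual policy:

```python
# A scripted monitoring tier: cheap, deterministic pattern checks on agent
# output. Rules here are illustrative examples, not a real guardrail set.
import re

RULES = [
    (re.compile(r"\bDROP\s+TABLE\b", re.I), "destructive SQL"),
    (re.compile(r"\brm\s+-rf\b"), "recursive delete"),
    (re.compile(r"api[_-]?key\s*=", re.I), "possible credential leak"),
]

def scripted_monitor(agent_output: str) -> list:
    """Return the labels of any rules the output trips."""
    return [label for pattern, label in RULES if pattern.search(agent_output)]

print(scripted_monitor("Plan: rm -rf /tmp/build then redeploy"))
# ['recursive delete']
```

The model-based tiers Hron describes would sit behind the same interface: take the agent’s output, return a list of concerns.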

The right approach depends on the riskiness of each individual use case. A simple information-gathering AI, for example, might pose little risk to a company and can be allowed to operate with less supervision. On the other hand, where an AI’s actions could potentially lead to catastrophe, more layers of oversight are needed.

“So don’t look at it as a black and white thing but a spectrum,” he says.
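Treating oversight as a spectrum can be encoded as a routing table: low-risk actions run unsupervised, medium-risk get automated review, and high-risk block on a human. The tiers, names, and the strict default are assumptions for illustration:

```python
# Risk-tiered oversight routing. Tier names and policies are illustrative.
OVERSIGHT = {
    "low":    "auto_execute",
    "medium": "automated_review",
    "high":   "human_approval",
}

def route(action: str, risk: str) -> str:
    # Unknown or unrated risk falls back to the strictest tier.
    policy = OVERSIGHT.get(risk, "human_approval")
    return f"{action}: {policy}"

print(route("summarize_report", "low"))
print(route("wire_transfer", "high"))
print(route("new_action", "unrated"))  # defaults to human_approval
```

Defaulting unrated actions to the strictest tier means a new capability has to be explicitly classified before it can run unsupervised.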

For example, with some processes, a company might deliberately choose not to automate every step and add in layers of oversight, even though that might slow down the entire workflow.

“We have very specific processes where we think AI is best, and we use AI and agents,” says Daniel Avancini, chief data officer at data engineering company Indicium. “And we have other processes where humans have to validate.” That includes both software development and big data migration projects. “We have processes with gates where humans need to validate the work,” he adds. “We don’t do 100% automated.”

Will humans start rubber-stamping AI recommendations?

It’s easy to fall into the trap of saying yes to everything a computer tells you it’s going to do. At Indicium, there are processes in place to ensure the human is actually validating and not blindly authorizing.

“We use audits to validate the quality of work, and you can even identify how much time people use for their reviews,” Avancini says. “If they’re doing two-second reviews, we know they’re not reviewing but just pressing the button, and there’s a real risk there. We try to diminish that with training and processes.”
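The two-second-review audit Avancini describes is easy to automate from review timestamps. A sketch, with an assumed five-second threshold that a real program would tune per task:

```python
# Flag reviewers whose approval times are implausibly short. The 5-second
# threshold is an assumption for illustration.
def flag_rubber_stampers(reviews: list, min_seconds: float = 5.0) -> list:
    """reviews: (reviewer, seconds_spent) pairs; return reviewers to retrain."""
    suspect = set()
    for reviewer, seconds in reviews:
        if seconds < min_seconds:
            suspect.add(reviewer)
    return sorted(suspect)

reviews = [("ana", 2.1), ("ben", 40.0), ("ana", 1.8), ("cara", 12.5)]
print(flag_rubber_stampers(reviews))  # ['ana']
```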

But what happens when AI error rates get very low, and the number of actions that need to be reviewed gets very high?

“Humans really can’t keep up with high-frequency, high-volume decision-making made by generative AI,” says Avani Desai, CEO at cybersecurity firm Schellman. “Constant oversight causes human-in-the-loop fatigue and alert fatigue. You start becoming desensitized to it.”

At that point, human oversight no longer has any effect, she says. And it gets worse. A smart enough AI can couch its approval request in terms that a human could readily agree to.

“Agentic systems are able to gain the ability of planning and reasoning, and can learn to manipulate human overseers,” Desai says, “especially if they’re trained with open-ended reinforcement learning.”

This is called reward hacking, and it can happen when an AI trained to achieve a particular goal discovers it gets rewarded for taking shortcuts.

“That’s where humans in the loop can become a false safety net,” she says.

Mitigations could include automatically flagging the riskiest actions for extra review, rotating human reviewers, using automated anomaly detection, or having multiple levels of oversight with different reviewers looking at different types of risks.
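Two of those mitigations combine naturally in a dispatcher: risky actions go to a rotating pool of human reviewers (so no one person is worn down or singled out for manipulation), while low-risk actions go to automated anomaly detection. Risk scores and the 0.7 threshold are assumptions:

```python
# Rotate human reviewers over risky actions; send the rest to automated
# checks. Scores, threshold, and names are illustrative.
from itertools import cycle

def dispatch(actions, reviewers, risk_threshold=0.7):
    """Yield (action, assignee) pairs from (action, risk_score) inputs."""
    rotation = cycle(reviewers)  # round-robin reviewer rotation
    for action, risk in actions:
        assignee = next(rotation) if risk >= risk_threshold else "anomaly_detector"
        yield action, assignee

actions = [("send_newsletter", 0.2), ("delete_records", 0.9), ("refund_order", 0.8)]
print(list(dispatch(actions, ["ana", "ben"])))
# [('send_newsletter', 'anomaly_detector'), ('delete_records', 'ana'), ('refund_order', 'ben')]
```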

Another solution is to design systems from the start in such a way that constraints are built in from the ground up. “You have to be proactive and set up controls in the system that don’t allow the agentic AIs to do certain things,” Desai says.

For example, a payment system might only process AI-initiated transactions up to a certain dollar amount, or a development environment might not allow an AI agent to modify or delete certain categories of files.
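Such controls work because they are enforced outside the model, where the agent cannot talk its way around them. A sketch of both examples, with an assumed $500 cap and protected-path list:

```python
# Hard constraints enforced outside the agent. The cap and path list are
# assumptions for illustration.
PAYMENT_CAP = 500.00
PROTECTED_PATHS = ("prod/", "migrations/")

def approve_payment(amount: float) -> bool:
    """Reject non-positive or over-cap amounts, no matter the justification."""
    return 0 < amount <= PAYMENT_CAP

def allow_file_change(path: str) -> bool:
    """Block any change under a protected prefix."""
    return not path.startswith(PROTECTED_PATHS)

print(approve_payment(120.0))            # True
print(approve_payment(9_999.0))          # False: over the cap
print(allow_file_change("prod/db.sql"))  # False: protected path
```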

“I’m a big believer that human-in-the-loop is not enough when we’re talking about truly agentic AI,” she says.

An established hierarchy

Desai thinks companies should move to human-in-command architectures. “You don’t just supervise, you design control systems and guardrails, and intervene meaningfully before anything goes wrong,” she says. “You have security by design, where you build it into the system so you’re not trying to fix something after it’s happened.”

The AIs should run in boxed-in environments where a company can limit what the AI can see and do. After all, it’s possible to control systems that are more powerful than we are, Desai says. “Airplanes are faster than humans, but we can control them.”

But if an organization builds workflows or decision-making frameworks around AI in a way that humans can’t override, or that are too complex for humans to understand, the AI becomes a significant risk.

“That’s the boiling frog scenario,” she says. “You don’t realize you’ve lost control until it’s too late. We don’t lose control because the AI is smarter than us, but because we abdicated our responsibility. That’s what I worry about.”

Other experts agree with the idea of putting the AI into a tightly constrained box, where it only sees a very controlled set of inputs and produces a very narrow range of outputs. Companies are, in fact, already taking this approach for risky processes, says Dan Diasio, global AI consulting leader at EY.

The LLM handles only the small part of the business process where it’s necessary, and other systems, such as traditional machine learning models or even scripted processes, do the rest.

“We find that most of our clients are really thoughtful about building a system that doesn’t overemphasize what the capabilities of an LLM are to do the work,” says Diasio.

There’s talk about AIs that can do everything, have access to anything in a company, and are self-directed in achieving their goals. But what’s actually happening inside companies is very different, he says. “The processes they’re designing are more guided, as opposed to being completely unconstrained processes,” he says.

Another way to put constraints on gen AI systems when they’re used extensively in a business process is to have separation of duties, says Bryan McGowan, global trusted AI leader at KPMG.

“If all the agentic capabilities are orchestrated by one AI that can ultimately enable or call in all the permissions they’d need, they become much more powerful,” he says. “We can separate some of that and put a wall between them.”

Two agents may collaborate by sharing information, but those communications can be monitored and controlled. This is similar to the way some companies (financial firms, for example) have controls in place to prevent collusion and corruption.

Human on the loop

Once all of an AI agent’s actions and communications are logged and monitored, a human can go from being in the loop to on the loop.

“If you try to put a human in the loop on a 50-step process, the human isn’t going to look at everything,” says McGowan. “So what am I evaluating across that lifecycle of 50 tasks to make sure I’m comfortable with the outcomes?”

A company might want to know the steps were completed, done accurately, and so on. That means logging what the agent does, tracking the sequential steps it performed, and how its behavior compares to what was expected of it.

So for example, if a human user asks the AI to send an email, and the AI sends five, that would be suspicious behavior, he says. Accurate logging is a critical part of the oversight process. “I want a log of what the agent does, and I want the log to be immutable, so the agent won’t modify it,” he adds. Then, to evaluate those logs, a company could use a quality assurance AI agent, or traditional analytics.
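An “immutable” log the agent can’t quietly rewrite can be approximated with a hash chain: each entry stores the hash of the previous one, so editing any record breaks every hash after it. A minimal sketch, not a full audit system:

```python
# Hash-chained audit log: tampering with any entry invalidates the chain.
import hashlib
import json

def append(log: list, event: dict) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    body = json.dumps(event, sort_keys=True)  # canonical serialization
    log.append({"event": event, "prev": prev,
                "hash": hashlib.sha256((prev + body).encode()).hexdigest()})

def verify(log: list) -> bool:
    prev = "genesis"
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

log = []
append(log, {"step": 1, "action": "send_email", "count": 1})
append(log, {"step": 2, "action": "send_email", "count": 5})  # suspicious
print(verify(log))            # True
log[1]["event"]["count"] = 1  # agent tries to cover its tracks
print(verify(log))            # False: tampering breaks the chain
```

A QA agent or analytics job can then run `verify` plus behavioral checks (one email requested, five sent) over the chain.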

“It’s not possible for humans to check everything,” says UT’s Thuraisingham. “So we need these checkers to be automated. That’s the only solution we have.”


Category: News | August 22, 2025
