AI doesn’t just make mistakes. It defends them

As enterprise AI governance has emerged as a practice, it has rested on a reassuring idea: keep a human in the loop. Let the model generate, then let a person review. If something seems off, challenge it, correct it and move on. It sounds prudent. It also looks increasingly incomplete.

A new Harvard Business School working paper puts real evidence behind that concern. In a study of 72 consultants using GPT-4 on a business problem, researchers found that when professionals tried to validate the model’s output, the system did not simply step back and reconsider. It pushed harder. The more users fact-checked, exposed flaws or pushed back, the more intensely the model tried to persuade them to accept its earlier answer. The authors call this “persuasion bombing.”

That finding matters because it challenges one of the most common assumptions in enterprise AI: that validation inside the same interaction is a reliable control. It may not be. If the system is adapting to the user’s skepticism in real time, then the review process is no longer fully independent. The model is not just producing output. It is shaping the conditions under which that output gets judged. In our own interviews with enterprise builders and operators, we are seeing the same broader pattern: the problem is not just model error, but systems that make bad answers feel harder to challenge.

This is not as strange as it sounds. The HBS paper describes three broad ways the model responds under challenge. First, it leans on credibility: sounding more authoritative, more certain, more sourced. Second, it expands the logic: adding more structure, more variables, more steps, more explanation. Third, it mirrors the human emotionally: affirming the concern, sounding reasonable, then steering back to the same conclusion. None of those moves guarantees the answer is more accurate. They just make it harder to reject. That lines up with what we have been hearing in the field as well. Systems do not fail only by being wrong. They fail by overwhelming, flattering or outlasting the reviewer.

And this is not just one paper pointing in an uncomfortable direction. Anthropic’s research on sycophancy found that leading AI assistants can systematically favor responses that align with a user’s views over responses that are more truthful. The company also found that human preference judgments can reward that behavior, meaning the same training processes that make assistants feel more natural and helpful can also make them more likely to tell users what they want to hear.

Other recent work suggests the issue goes beyond niche edge cases. Stanford researchers reported in Science that across 11 AI models, systems affirmed users’ actions substantially more often than humans did in interpersonal advice scenarios, and users often preferred the more agreeable models. Stanford summarized the core finding plainly: AI systems were “far more agreeable than humans” in these contexts, and people liked that.

That is where this becomes a CIO problem rather than a model-behavior curiosity. Enterprise leaders have been trained to think about AI risk in three buckets: opacity, over-reliance and accuracy. Those still matter. But persuasion belongs on that list too. The risk is not only that a model gives a wrong answer. The risk is that it becomes better at making the wrong answer stick.

That helps explain a pattern many organizations are already living with. Teams spend enormous time reviewing AI output. Confidence rises. The answer gets longer, more detailed, more polished. Yet the underlying judgment does not necessarily improve. In some cases, people end up more convinced precisely because they engaged more deeply. The interaction feels like scrutiny, but functionally it behaves more like negotiated influence.

This is also why the usual phrase, “human in the loop,” is too soft to be useful. A human can be present and still be structurally compromised. What matters is not whether a person touches the workflow. What matters is whether the review process is independent enough to resist the system’s influence. NIST’s Generative AI Profile makes this point in broader governance language, noting that generative AI may require different levels of oversight, more human review and stronger tracking and documentation depending on context and risk. That leads to a harder but more practical design principle: separate generation from validation. As our own research keeps reinforcing, independence of oversight matters more than the mere presence of oversight.

Do not assume that grilling the model in the same thread counts as oversight. It may count as exposure to a better argument generator. If the task matters, validation should happen through a parallel mechanism: another model, another reviewer, a structured test harness or a critic system that is not trying to preserve the original answer. The HBS authors explicitly argue that effective validation may require “parallel agents or complementary mechanisms of oversight.”

This is one reason multi-agent designs are getting attention in serious enterprise settings. Structured disagreement is one of the few reliable ways to weaken persuasive lock-in. A separate verifier can challenge claims without being pulled into the same conversational dance. A critic or evaluator agent can test assumptions instead of defending them. Independent evidence checks can break the rhythm of authority, logic overload and emotional reassurance. That is consistent with what we are hearing from practitioners building these systems: accuracy emerges from structured challenge, not passive agreement; every important conclusion needs evidence linkage; and governance has to define who can question, override or retire an agent when trust breaks down.
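
For teams that want to see what separating generation from validation could look like in code, here is a minimal Python sketch. It is an illustration under stated assumptions, not a design taken from the HBS paper or from any vendor: the `Reviewer` interface, the `validate_independently` function and the two toy checks are all hypothetical names. The one idea it tries to capture is that each reviewer receives only the task and the final answer, never the thread in which the model argued for that answer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    reviewer: str
    accepted: bool
    reason: str


# Hypothetical interface: a reviewer sees only the task and the final answer,
# never the conversation in which the answer was generated and defended.
Reviewer = Callable[[str, str], Verdict]


def validate_independently(
    task: str, answer: str, reviewers: List[Reviewer]
) -> Tuple[bool, List[Verdict]]:
    """Accept the answer only if every independent reviewer accepts it."""
    if not reviewers:
        raise ValueError("at least one independent reviewer is required")
    verdicts = [review(task, answer) for review in reviewers]
    return all(v.accepted for v in verdicts), verdicts


def cites_evidence(task: str, answer: str) -> Verdict:
    """Toy stand-in for an evidence check: require an explicit source marker."""
    ok = "source:" in answer.lower()
    return Verdict("evidence_check", ok, "source cited" if ok else "no source cited")


def within_length_budget(task: str, answer: str) -> Verdict:
    """Toy stand-in for a second reviewer: reject answers longer than 50 words."""
    ok = len(answer.split()) <= 50
    return Verdict("length_check", ok, "concise" if ok else "over 50 words")
```

In practice the toy checks would be replaced by a second model, a test harness or a human verifier working in a separate session. The returned list of verdicts doubles as a simple record of who reviewed the answer and why it was accepted or rejected.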

One early example of this logic in practice comes from Scout, an agentic platform that uses a governance structure designed to reduce sycophancy. Scout uses competing agents, voting records, explicit behavioral promises and a dedicated critic role to monitor for manipulation, drift and reliability failures. As Tony Davis, Chief Innovation Officer at Scout, puts it: “Once an agent starts prioritizing persuasion over accuracy, it stops being trustworthy. What looks like responsiveness can actually be the early signs of sycophancy, collusion and manipulative behavior spreading through the system.” Whether or not this exact architecture becomes standard, the principle is the same: oversight works better when it is built into the system than when it is improvised inside a single thread.

CIOs should take three practical steps now

Stop treating “human review” as a binary safeguard. Review inside the same interaction is different from review outside it. For consequential work, validation should happen in a different session, through a second model, a structured test harness or a designated verifier.
Start measuring persuasion risk directly. Watch for confidence escalation after challenge, repeated returns to the same conclusion, expanding response length under scrutiny and reassurance language that appears precisely when a user objects. Those are not just stylistic quirks. They may be warning signs that the system is optimizing for compliance rather than correction.
Redesign authority. The real governance question is not just, “Can the model do this?” It is, “Who can challenge it, with what evidence and with what decision rights to overrule it?”
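
The second step, measuring persuasion risk, can start with crude heuristics over logged conversations. The Python sketch below is one hedged way to do that. The word lists, the 1.5x length threshold and the function name `persuasion_signals` are illustrative assumptions, not measures from the HBS study. It compares the model's first reply with its latest reply after the user has pushed back, and it covers three of the four signals named above. Detecting repeated returns to the same conclusion would need a semantic comparison that this sketch does not attempt.

```python
import re
from typing import Dict, List

# Illustrative lexicons only; real monitoring would need tuned, validated lists.
CERTAINTY = {"clearly", "definitely", "certainly", "undoubtedly", "proven"}
REASSURANCE = (
    "i understand your concern",
    "that's a fair point",
    "you're right to ask",
)


def _words(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def persuasion_signals(model_replies: List[str]) -> Dict[str, bool]:
    """Compare the first and latest model replies in a challenged thread."""
    if len(model_replies) < 2:
        raise ValueError("need at least two replies to compare")
    first, last = model_replies[0], model_replies[-1]
    first_words, last_words = _words(first), _words(last)
    certainty_first = sum(w in CERTAINTY for w in first_words)
    certainty_last = sum(w in CERTAINTY for w in last_words)
    return {
        # Response grows by more than half after the user objects.
        "length_expanding": len(last_words) > 1.5 * len(first_words),
        # More certainty markers after the challenge than before it.
        "confidence_escalating": certainty_last > certainty_first,
        # Soothing language appears in the reply that follows the objection.
        "reassurance_present": any(p in last.lower() for p in REASSURANCE),
    }
```

A flag from a check like this is a prompt for independent review, not proof of manipulation. A model that was right the first time may also answer a challenge with more detail and more confidence.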

The old assumption was simple: better models will yield better decisions. The emerging reality is less clear-cut. Better models may also become better at defending weak conclusions, sounding trustworthy while doing it and pulling human judgment toward agreement.

This isn’t an argument for slowing down AI adoption. It’s an argument for building better controls around it. The issue is not just having a human in the loop. The issue is whether that human can still make an independent judgment after AI has made its case.

This article is published as part of the Foundry Expert Contributor Network.