I say this not as a spectator to the AI tooling wave, but as an engineer who has spent the last four-plus years building and scaling production systems across payments, multi-tenant platforms and reliability-sensitive environments. In that time I have had to make architectural decisions where failure would not have been theoretical. In one recent role, I led a live payment infrastructure migration from Stripe to PayPal while keeping transaction processing uninterrupted, an experience that sharpened my view of what AI can accelerate and what it still cannot replace. Those systems-level responsibilities shape how I think about software, and they are the reason I have become more skeptical of output as a proxy for engineering quality.
The clearest lesson I have learned from using AI coding tools did not come from a demo. It came from real engineering work, under real pressure, where getting the code mostly right was not enough.
One of the sharpest examples for me came during a live payment infrastructure migration. On paper, that kind of work sounds like exactly the sort of thing AI should make easier. There were endpoints to draft, webhook handlers to write, payload mappings to compare, status flows to normalize and repetitive integration code that any modern coding assistant could generate in seconds. And to be fair, the tools were useful. They accelerated the obvious parts of the work. They helped me move faster through the mechanical layer of implementation. But the longer I sat with that migration, the more obvious something became to me: AI was speeding up output far more than it was improving judgment. That distinction matters more than many teams want to admit.
This was not a toy migration or an internal proof of concept; it was work tied to a live production payment flow where interruption, reconciliation errors or inconsistent state handling would have had immediate operational and financial consequences. Right now, the industry is flooded with stories about AI making developers faster, and many of those stories are true. GitHub’s research found significant perceived productivity gains among Copilot users and, in a controlled experiment, developers using Copilot completed a coding task 55% faster on average.
Google’s internal engineering teams have also reported broad uptake of AI assistance in daily software development, with accepted AI completions now accounting for a large share of new code characters. So yes, output is rising. The machine can help us write more code, draft more tests, scaffold more services and move through repetitive engineering work with less friction. But judgment is not a repetitive task. Judgment is not typing speed. Judgment is not code volume. Judgment is deciding what should be built, what should not be built, what tradeoff is acceptable, what shortcut is dangerous, what edge case is non-negotiable and what kind of failure the system absolutely cannot afford.
That is why I think we are entering a strange phase in software engineering. We are becoming dramatically better at producing code, without becoming proportionally better at evaluating it. The imbalance is subtle at first, but dangerous over time. More code is shipping. More drafts are being accepted. More implementation is being completed earlier in the cycle. Yet the hard part of engineering has not disappeared. It has simply become easier to postpone.
Fast output doesn’t mean sound engineering
When I think back to that payment migration, this is exactly where the gap between output and judgment showed up. AI could help produce webhook boilerplate. It could suggest handler structures. It could draft retry logic, sketch signature-verification flows and offer neat abstractions for mapping provider-specific statuses into our internal states. On a good day, it felt like working with an extremely fast junior engineer who never got tired. But the dangerous part was that the code often looked cleaner than the thinking behind it.
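To make the status-mapping work concrete: a sketch of what "mapping provider-specific statuses into our internal states" can look like. The provider names and status strings here are hypothetical, not any real API's vocabulary; the point is that the dictionary is trivial to generate, while the decision to fail loudly on an unmapped status is the judgment call.

```python
# Hypothetical sketch: normalizing provider-specific payment statuses
# into a small set of internal states. Provider names and raw statuses
# are illustrative only.
from enum import Enum


class InternalState(Enum):
    PENDING = "pending"
    SETTLED = "settled"
    FAILED = "failed"


# Explicit, exhaustive mapping. Anything unmapped is rejected rather than
# silently coerced -- that "fail closed" choice is the engineering decision,
# not the dictionary itself.
PROVIDER_STATUS_MAP = {
    ("provider_a", "created"): InternalState.PENDING,
    ("provider_a", "succeeded"): InternalState.SETTLED,
    ("provider_a", "failed"): InternalState.FAILED,
    ("provider_b", "PENDING"): InternalState.PENDING,
    ("provider_b", "COMPLETED"): InternalState.SETTLED,
    ("provider_b", "DENIED"): InternalState.FAILED,
}


def normalize_status(provider: str, raw_status: str) -> InternalState:
    try:
        return PROVIDER_STATUS_MAP[(provider, raw_status)]
    except KeyError:
        # Surface unknown statuses loudly instead of guessing a default.
        raise ValueError(f"unmapped status {raw_status!r} from {provider!r}")
```

An assistant will happily draft the mapping table; deciding that an unknown status should halt processing rather than default to "pending" is exactly the kind of call it cannot own.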
The migration was not really about code generation. It was about money movement, reliability, trust and sequencing. It was about what happens when one provider says a payment is pending, another says it is completed, the client callback arrives late, a webhook is retried out of order and the user interface still expects a clean answer. It was about idempotency. It was about reconciliation. It was about deciding which system was authoritative at which point in the flow. It was about how to roll out gradually without interrupting live transaction processing. None of that was solved by how quickly the first draft appeared on screen.
That was the week I became even more skeptical of the industry’s favorite lazy metric: output. Output is visible, so executives love it. Output is measurable, so dashboards love it. Output flatters teams because it makes everyone feel accelerated. But output, by itself, is not the same thing as engineering quality. In fact, when output becomes cheaper, weak judgment can spread through a system much faster than before.
The broader research is starting to reflect this same tension. DORA’s 2024 report found that AI adoption correlated with improvements in areas like documentation quality, code quality and code review speed, but it also found an estimated decline in delivery stability as adoption increased. DORA’s 2025 research then refined that picture: AI use had become near universal among surveyed professionals, with more than 80% reporting productivity gains, but the central finding was that AI acts as an amplifier. Strong teams tend to benefit more; weak teams tend to magnify their existing problems faster.
That “amplifier” framing rings true to me. AI does not magically replace engineering maturity. It reveals it. If a team already has clean interfaces, disciplined review habits, solid testing, healthy skepticism and good architectural instincts, AI can make that team meaningfully faster. But if a team is sloppy about requirements, casual about quality gates, weak on system design and addicted to shortcuts, AI can make them faster in exactly the wrong direction, which is why I resist the shallow claim that AI is making everyone a better engineer. It is making many people a faster producer. That is not the same thing.
The most dangerous part is the illusion of competence
The difference becomes even clearer when you look at trust. Stack Overflow’s 2025 Developer Survey shows that developers are using AI heavily, but trust has not kept pace: only a small minority reported high trust in AI outputs, while far more reported some degree of distrust. In that same survey, the top reason developers said they would still ask a person for help in an advanced-AI future was simple: “When I don’t trust AI’s answers.” Respondents also showed the strongest resistance to using AI for higher-responsibility areas like deployment, monitoring and project planning.
That does not surprise me at all. In my own work, AI is most helpful where the cost of being directionally correct is high and the cost of being slightly wrong is still recoverable. It is excellent at breaking the blank-page problem. It is excellent at speeding up repetitive implementation. It is useful for exploring options quickly. But in high-consequence work, especially around payments, reliability, security, data boundaries or production migrations, I still find myself doing what experienced engineers have always done: slowing down, tracing assumptions, pressure-testing flows and asking the unfashionable questions.
- What happens if events arrive twice?
- What happens if they arrive out of order?
- What happens if the user sees success, but the ledger does not?
- What happens if the provider retries after timeout?
- What happens if our internal model is cleaner than reality and reality wins?
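The first few of those questions have a concrete defensive shape. A minimal sketch, assuming a hypothetical in-memory store (a real system would use a durable store enforcing the same invariants):

```python
# Defensive handling for two of the failure modes above: duplicate
# delivery and out-of-order delivery of provider events. The in-memory
# storage is purely illustrative.

# Rank states so a stale event can never move a payment "backwards"
# after a newer event has already landed.
STATE_RANK = {"pending": 0, "completed": 1, "failed": 1}


class PaymentStore:
    def __init__(self):
        self.seen_event_ids = set()  # dedupe retried webhook deliveries
        self.payment_state = {}      # payment_id -> current internal state

    def apply_event(self, event_id: str, payment_id: str, new_state: str) -> bool:
        """Apply an event idempotently; return True only if state changed."""
        # Duplicate delivery: processing the same event twice is a no-op.
        if event_id in self.seen_event_ids:
            return False
        self.seen_event_ids.add(event_id)

        current = self.payment_state.get(payment_id, "pending")
        # Out-of-order delivery: refuse transitions to an earlier state.
        if STATE_RANK[new_state] < STATE_RANK[current]:
            return False
        self.payment_state[payment_id] = new_state
        return True
```

Nothing here is hard to type, and an assistant will produce something like it on request. What it will not decide for you is whether "failed after completed" is a backwards transition or a legitimate reversal in your domain, and that decision is where the money is.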
These are judgment questions. AI can sometimes remind me to ask them. It cannot own them for me. And this is where I think a lot of teams are confusing fluency with understanding. Generated code is often persuasive. It is formatted nicely. It names things well. It uses familiar patterns. It explains itself with confidence. That surface polish creates a dangerous illusion of competence. I have seen code that looked production-ready but carried quiet flaws in state handling, edge-case behavior or system fit. I have also seen AI-generated suggestions that were locally reasonable and globally wrong, which is a brutal combination because local correctness is exactly what slips through rushed reviews.
Vendors themselves are implicitly acknowledging this limit. GitHub’s own positioning for Copilot code review is telling: it describes the tool as a way to offload basic reviews while developers wait for human review. That is useful, and I think it is the right framing. Basic review can be automated. Judgment review still cannot be casually delegated.
Judgment is becoming the scarcer skill
I suspect this is why senior engineers will become more valuable in the AI era, not less. Not because they type faster. Not because they prompt better in some theatrical sense. But because they know how to evaluate. They know when generated code is harmless, when it is elegant, when it is premature and when it is a liability dressed up as efficiency. They know how a system fails, not just how a snippet runs. They know that the most expensive bug is often not the one that crashes immediately, but the one that quietly corrupts trust.
If I sound strong on this point, it is because I have lived the seduction of AI-assisted speed. I know the dopamine rush of watching a tool compress two hours of grunt work into fifteen minutes. I know the relief of getting a strong first draft instead of staring at an empty editor. I know how tempting it is to believe that because the output came fast and looked polished, the hard part is done.
But in the most important work I have touched, the hard part was never the first draft.
- The hard part was deciding where the abstraction should live.
- The hard part was understanding operational risk.
- The hard part was knowing what to verify independently.
- The hard part was protecting the system from confident mistakes.
- The hard part was judgment.
That is the part I do not think the industry should cheapen. None of this means teams should reject AI coding tools. That would be ridiculous. I use them. I benefit from them. I think mature engineering organizations should absolutely integrate them into their workflows. But they should do so with a clearer philosophy than simple acceleration.
- Use AI to compress execution, not to outsource discernment.
- Use it to draft, not to absolve.
- Use it to reduce toil, not to weaken review culture.
- Use it to widen exploration, not to narrow thinking.
- Use it aggressively for repetitive work, but keep humans firmly responsible for architecture, tradeoffs, policy, risk and production consequences.
That is the balance that feels honest to me. The teams that will win in this era will not be the ones generating the most code. They will be the ones building the strongest judgment around generated code. They will know when speed is helpful and when speed is bait. They will teach younger engineers not just how to use AI, but how to question it. They will understand that as software output becomes abundant, discernment becomes the scarcer asset. I do not think software engineering is becoming less human. I think the opposite is happening. As machines get better at producing plausible code, the deeply human parts of the craft become more valuable: restraint, skepticism, context, taste, accountability and the ability to make sound decisions under uncertainty.
In conclusion, AI coding tools are changing output faster than they are changing judgment. That is not a temporary glitch. It is the central tension of this moment. And from where I stand, that means the best engineers in the AI era will not simply be the fastest builders. They will be the ones who still know what is worth building, what is safe to ship, what must be reviewed twice and what polished output should never have made it past a first draft.
This article is published as part of the Foundry Expert Contributor Network.