The next time someone on your team says, ‘hire an AI engineer,’ stop the conversation.
That title is too vague because it fails to account for critical differences in engineering strengths. Instead, companies need to decide specifically what they need. Is it someone to rapidly prototype AI solutions? Or someone to build the solution and make it ready for production? Or someone to design the supporting capabilities and infrastructure to scale it? These are different skills, and each requires a different assessment during the hiring process.
But here’s where companies also fall short. Assessing skills is hard and assessments, as we know them, are broken when it comes to AI. They’re misaligned with what AI roles actually demand. That misalignment is what I call the AI assessment gap.
Where the gap lives
Most technical assessments were built for a pre-AI world: Coding proficiency, algorithms, deterministic system design. These are skills tests. They confirm that an engineer can do the work. But they don’t tell you whether that engineer has the technical taste to make the right decisions when building, scaling or deploying AI systems in production.
In conversations with enterprise engineering leaders, we’re hearing that candidates are now running AI agents during live interviews, getting textbook-perfect answers fed to them in real time. If your assessment can be passed by an AI whispering in someone’s ear, it was never testing for the right thing. Skills can be faked or augmented. Taste can’t.
To see what this looks like, consider this scenario: An enterprise needs someone with deep experience in a specific data platform. A candidate passes the data engineering assessment. They get to the client interview, and the hiring manager says: ‘Tell me about a time you had to make a hard tradeoff in designing a streaming architecture.’ The candidate defines every concept involved. They don’t have the taste to explain why one approach would be dramatically better than another for a specific context. They’re out.
This happens because most assessment pipelines only test for skills: Can they code, and do they understand the fundamentals? Nobody is systematically testing for technical taste: Can this person make better-than-default decisions about architecture, tooling and approach? That question only surfaces once someone with real experience asks it. By then, everyone has wasted time and the role is still open.
Traditional job postings compound the problem by filtering for ‘5+ years of AI experience,’ which screens out strong candidates because the category itself is only a few years old. What matters at the AI layer isn’t tenure. It’s the depth and specificity of what someone has built, deployed or scaled in production. Meanwhile, seniority at the foundational role level still matters in the ways it always has: A senior engineer brings architectural judgment that can’t be shortcut. The mistake is applying years-of-experience filters to the AI layers, where the work hasn’t existed long enough for tenure to be a meaningful signal.
One of the most telling signals of a broken assessment process: When stakeholders simultaneously complain that assessments are too hard and too easy. That’s not a calibration problem. It means the assessments aren’t measuring the right things in the first place. They’re testing for skills when they should be testing for taste.
Start with the work, not the title
To close the AI assessment gap, decompose the need before you assess, across the dimensions that actually determine whether someone can do the job. For example:
| Dimension | The Question | What It Determines | How You Evaluate |
| --- | --- | --- | --- |
| Role | What technical domain does the work live in? | Foundational skills and stack (e.g., backend engineer, Python, PostgreSQL) | Skills assessment: Project-based or simulation-based filter that confirms core engineering competency |
| Seniority | What level of judgment and autonomy does this work require? | Engineering maturity, depth of technical taste, ability to operate under ambiguity | Experience depth at the role level: Years of practice in the domain, complexity of systems designed and shipped |
| AI Engagement Pattern | How will this person engage with AI systems? | The specific technical taste required (e.g., Prototyper needs taste for sensing value; Builder needs taste for architecture and integration decisions; Scaler needs taste for performance, governance and risk tradeoffs) | Applied assessments: Not ‘define RAG’ but ‘given this use case, which approach would you choose and why?’ Testing for tradeoff reasoning, not terminology |
This decomposition replaces the single job description with a structured picture of what you actually need. It also immediately reveals whether you’re looking for one person or a team. If the project requires rapid prototyping to find value and then a production build, you probably need engineers with different profiles, not one ‘AI engineer’ who’s supposed to do both.
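For teams that track open roles in a spreadsheet or internal tool, the decomposition can be sketched as a small data structure. This is an illustrative sketch only, not a prescribed schema: the names `EngagementPattern`, `Profile` and `decompose` are hypothetical, and the rule that each engagement pattern gets its own profile is a simplifying assumption.

```python
from dataclasses import dataclass
from enum import Enum


class EngagementPattern(Enum):
    """The three AI engagement patterns named in the table (hypothetical encoding)."""
    PROTOTYPER = "prototyper"
    BUILDER = "builder"
    SCALER = "scaler"


@dataclass(frozen=True)
class Profile:
    """One hiring profile: a role, a seniority level and a single engagement pattern."""
    role: str
    seniority: str
    pattern: EngagementPattern


def decompose(role: str, seniority: str, patterns: list[EngagementPattern]) -> list[Profile]:
    """Return one profile per distinct engagement pattern the work requires.

    Assumption: each pattern is assessed separately, so a project that needs
    two patterns yields two profiles rather than one combined 'AI engineer'.
    """
    unique = list(dict.fromkeys(patterns))  # drop duplicates, keep order
    return [Profile(role, seniority, p) for p in unique]


# Example: a project that needs rapid prototyping and then a production build.
profiles = decompose(
    "backend engineer",
    "senior",
    [EngagementPattern.PROTOTYPER, EngagementPattern.BUILDER],
)
needs_team = len(profiles) > 1  # more than one profile suggests more than one hire
```

In this sketch, a request that lists two engagement patterns produces two profiles, which is the signal that you may be staffing a team rather than filling a single seat.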
Three things most enterprises get wrong
- They test for skills when they should test for taste. Most assessments confirm that an engineer can write code and define concepts. They don’t test whether that engineer can make the architectural and tooling decisions that actually determine project success. An engineer who knows what agentic search is and an engineer who knows when agentic search is the right choice for a specific problem are two completely different hires. The first passes your skills test. The second delivers in production.
- They conflate skills with experience. A skills assessment tells you if someone can do the work. An experience validation tells you if someone has done the work in the specific context the job demands. These require completely different evaluation methods. When companies try to test both with a single instrument, they get the ‘too hard and too easy’ paradox: The assessment is simultaneously screening out competent people and letting through candidates who can’t perform. Seniority and years of experience are meaningful at the role level, where 10 years of backend engineering builds real architectural judgment and compounds technical taste. They’re much less meaningful at the AI engagement layer, where the work itself is only a few years old and depth of hands-on exposure matters more than calendar time.
- They treat assessment as a snapshot. The traditional model is a one-time gate: Pass or fail, in or out. In an AI world where skills are evolving monthly, that approach breaks down fast. Six months ago, almost nobody was shipping production code with agentic tools like Claude Code. Model Context Protocol, which lets AI systems plug into enterprise tools and data sources, was barely on anyone’s radar. Now enterprises are hiring for these skills specifically. Six months from now, the list will change again.
That means an assessment built in January is already partially stale by June. Companies that treat assessment as a living system, continuously updated by performance signals from real engagements, will consistently field better talent than those running the same tests they built a year ago.
The reskilling imperative
The reality is, there is no way to close this gap through hiring alone. The supply of engineers who already have the technical taste for AI work is a tiny fraction of what the market demands. For example, since the launch of ChatGPT in 2022, demand for roles that require more analytical, technical or creative work has increased by 20%.
That means enterprises have to reskill and upskill their existing workforces. Without a targeted approach mapped to actual needs, AI upskilling efforts often fail, leaving employees unsupported and initiatives stalled.
This is where the multi-dimensional model pays off beyond hiring. The same framework that powers your talent acquisition also powers your training strategy. Assessment results don’t just filter candidates in or out. They generate a heat map of where your workforce is strong and where it’s thin, across every dimension: Role competency, seniority depth and the specific technical taste required for prototyping, building or scaling AI systems. That heat map becomes your training roadmap.
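As a rough illustration of how assessment results could roll up into that heat map, the sketch below averages scores per team and dimension and flags the thin spots. Everything in it is assumed for the example: the sample rows, the 0.0 to 1.0 score scale, the dimension names and the 0.6 threshold are hypothetical, not values from any real assessment.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical assessment results: (team, dimension, score from 0.0 to 1.0).
results = [
    ("platform", "role_competency", 0.9),
    ("platform", "scaler_taste", 0.4),
    ("data", "role_competency", 0.8),
    ("data", "scaler_taste", 0.5),
    ("data", "builder_taste", 0.7),
]


def heat_map(rows):
    """Average the scores for each (team, dimension) cell."""
    cells = defaultdict(list)
    for team, dimension, score in rows:
        cells[(team, dimension)].append(score)
    return {key: mean(scores) for key, scores in cells.items()}


def thin_spots(heat, threshold=0.6):
    """Return the cells whose average falls below the (assumed) threshold."""
    return sorted(key for key, score in heat.items() if score < threshold)


heat = heat_map(results)
training_priorities = thin_spots(heat)
```

The cells returned by `thin_spots` are the candidates for targeted training. In practice the threshold and the dimensions would come from your own decomposition of the work.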
Most companies skip this entirely and jump straight to ‘let’s buy an AI training program.’ Without that foundation, even the best training program is solving the wrong problem.
Ever ready
In the world of AI, the most critical skill might be knowing that you don’t and can’t possibly know everything. Or even what’s coming next. The roles we need today will look different in six months. The skills taxonomies we build now will need constant revision. The assessments we deploy this quarter will need recalibration by next quarter.
Companies that accept this reality and build nimble, multi-dimensional approaches to talent assessment will find something valuable: The technical taste they need already exists in their workforce, hiding behind outdated job descriptions and misaligned tests. CIOs must actively audit these descriptions to eliminate the traditional experience filters that mask the latent talent already sitting on their teams. The others will keep posting for ‘AI engineers’ and wondering why nobody who gets hired can actually do the job.
This article is published as part of the Foundry Expert Contributor Network.

