Tracking AI adoption in the enterprise presents IT leaders with a metrics dilemma. While ROI should be the arbiter of AI initiative success, ensuring employees actually use the AI tools you roll out is a key step in the journey toward that ROI.
So, what’s the best way to measure AI uptake without losing sight of the ultimate goal?
Some enterprises have adopted token usage as a metric to track adoption, going so far as to gamify AI interactions to encourage use. Some AI experts say that’s a dangerous approach.
Companies such as Amazon, JPMorgan, Meta, and Disney have reportedly deployed AI usage leaderboards to encourage adoption, in some cases prompting workers to rack up huge bills as they burn through token budgets. One Disney employee interacted with the Claude AI 460,000 times in a nine-day span, Business Insider reports.
Such leaderboards have led to a phenomenon known as tokenmaxxing, in which employees ramp up their use of AI tools to climb the rankings. Tracking employee token usage alone, without pairing it with output or productivity metrics, is a recipe for disaster, especially for IT leaders responsible for AI budgets, several AI experts say.
In some cases, top token users at companies have reportedly spent millions of dollars.
Token use leaderboards come from good intentions: a genuine desire to track how employees are interacting with AI tools, says Trevor Stuart, senior vice president at software development support vendor Harness.
“They’re just trying to understand how people are using these tools, how many people are using these tools,” he says, adding that by encouraging adoption the leaderboards presumably will result in “downstream productivity.”
Token leaderboards, however, create incentives for employees to use AI tools without thinking about costs, Stuart says, with some employees using frontier AI models for simple tasks.
“It’s like using the wrong tool when you can use a simpler tool to get the job done,” he says. “That’s where tokenmaxxing really incentivizes the wrong behavior.”
Quick metrics
Measuring tokens used has become popular because it’s a relatively easy metric to collect, notes Todd Olson, CEO at AI analytics vendor Pendo.
“If someone’s spending zero tokens, they’re not using AI at all, and they’re not getting any value out of it,” he says. “But then, the lines become much more complex and much grayer once everyone actually starts using it.”
Once organizations get employees to take the first step toward using AI tools, they need to start thinking about other metrics, Olson says. “There’s the initial inertia of getting people to try something and change their habits,” he says. “That’s kind of a zero to one problem. But then, the question is, Are people using it just for the sake of using it?”
The big issue is that token use doesn’t necessarily lead to productivity, says Logan Wolfe, partner in the global enterprise transformation, AI, and sovereign tech strategy practice at Kyndryl.
“Companies are using the number of tokens consumed as a proxy for how productively employees are using AI,” he explains. “The employees are de facto incentivized for using tokens or, in some cases, punished for not using enough tokens, and obviously, it’s a metric that’s very easy to game.”
Wolfe compares token usage metrics to rewarding software developers for writing the most lines of code, a practice that leads to bloated applications.
“When token usage becomes the KPI, you incentivize output volume over outcomes like efficiency, quality, and risk reduction,” he adds.
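Wolfe's point about volume versus outcomes can be illustrated with a small sketch. The Python below ranks the same employees two ways: by raw tokens consumed, and by a hypothetical outcome measure (changes shipped to production) per million tokens. The names, figures, and the choice of "shipped changes" as the outcome are illustrative assumptions, not data or methods from any company mentioned in this article.

```python
from dataclasses import dataclass


@dataclass
class EmployeeUsage:
    name: str             # hypothetical employee
    tokens: int           # total tokens consumed in the period
    shipped_changes: int  # assumed outcome measure: changes that reached production


def rank_by_tokens(rows):
    """Leaderboard ordered by raw token consumption (the gameable metric)."""
    return [r.name for r in sorted(rows, key=lambda r: r.tokens, reverse=True)]


def rank_by_outcome_per_million_tokens(rows):
    """Leaderboard ordered by shipped changes per million tokens consumed."""
    def efficiency(r):
        # Employees with zero tokens get zero efficiency rather than a division error.
        return r.shipped_changes / (r.tokens / 1_000_000) if r.tokens else 0.0
    return [r.name for r in sorted(rows, key=efficiency, reverse=True)]


# Made-up figures for illustration only.
usage = [
    EmployeeUsage("alice", 40_000_000, 4),
    EmployeeUsage("bob", 5_000_000, 10),
    EmployeeUsage("carol", 12_000_000, 12),
]

print(rank_by_tokens(usage))                      # ['alice', 'carol', 'bob']
print(rank_by_outcome_per_million_tokens(usage))  # ['bob', 'carol', 'alice']
```

In this made-up data the two rankings are exact opposites: the heaviest token consumer delivers the least per token. An efficiency ratio can be gamed too, so it is best read alongside other measures rather than as a replacement KPI.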
One of the major traps for IT leaders is that token use incentives can break the budget, Wolfe says.
“Considering the fact that reductions on price per token and price per inference seem to be nowhere on the horizon these days, in no small part thanks to the rising energy cost crisis, this actually leads to an inverse curve of unit economics and ROI on AI initiatives,” he says.
Counting the wrong thing
Measuring token usage alone is like tracking the miles you walk each day to improve your health without also counting the calories you consume or regularly checking basic medical metrics, says Itamar Friedman, CEO of AI code review provider Qodo. If you walk two miles a day but consume 5,000 calories, you're unlikely to improve your health.
Keeping track of employee token usage isn’t a bad practice, but using it as a solitary metric gives companies an incomplete picture of the benefits of their AI deployments, he says.
“I do think there is a correlation between token maximizing and being more productive,” he says. “But the problem is that if you treat that alone as your single most important, and maybe even single, metric for productivity, you might actually create a vanity [metric].”
In some cases, companies appear to be tracking the token usage of their programmers, he says. When developers are incentivized to spit out huge amounts of AI-generated code without quality and security reviews in place, the code can contain major bugs and security holes, he suggests.
More metrics needed
To avoid the pitfalls of tracking token usage, Harness’ Stuart recommends companies also establish metrics for productivity or output.
“You need to set it up in a way that you gamify for the behaviors and incentives that you care about,” he says. “Maybe an incentive for us here at Harness is, it’s not about the amount of tokens you consume, it’s about the output that we’re able to deliver and moving from inputs to outputs.”
Productivity metrics will vary from company to company, he notes. For developers using AI assistants, for example, the primary metric may not be the number of lines of code written, but instead the number of lines of code that made it into production.
“Did you spend money writing lines of code that were either rejected or moved back or not shipped to production?” he says. “I see this need to understand the wasted dollars. If you are going to have leaderboards, you also need to counteract that with some of the potential waste and bring that measure in.”
Companies can also track how employees are optimizing their AI use, he adds. “There’s optimizable dollars, and there’s wasted dollars, and then there’s the tokens consumed,” he says. “Beginning to think about all three of those together is really important. And the fourth vector is: What was the output? Did we get the code to production?”
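As a rough sketch of how the four measures Stuart describes might sit side by side, the Python below summarizes a list of AI-assisted tasks. The `Task` fields, the definition of wasted dollars as spend on work that never shipped, and the definition of optimizable dollars as the gap between actual spend and an estimated cheaper-model cost are all assumptions made for illustration; the article does not specify how Harness or anyone else calculates them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    tokens: int        # tokens consumed by this task
    cost_usd: float    # what the task actually cost
    shipped: bool      # did the resulting code reach production?
    # Hypothetical estimate of the cost had a simpler model been used;
    # None when no cheaper option would have done the job.
    cheaper_cost_usd: Optional[float] = None


def summarize(tasks):
    """Return tokens consumed, wasted dollars, optimizable dollars, and output."""
    tokens_consumed = sum(t.tokens for t in tasks)
    # Assumption: spend on work that never shipped counts as wasted.
    wasted = sum(t.cost_usd for t in tasks if not t.shipped)
    # Assumption: for shipped work, the saving a cheaper model would have
    # offered counts as optimizable.
    optimizable = sum(
        t.cost_usd - t.cheaper_cost_usd
        for t in tasks
        if t.shipped
        and t.cheaper_cost_usd is not None
        and t.cheaper_cost_usd < t.cost_usd
    )
    shipped_tasks = sum(1 for t in tasks if t.shipped)
    return {
        "tokens_consumed": tokens_consumed,
        "wasted_usd": wasted,
        "optimizable_usd": optimizable,
        "shipped_tasks": shipped_tasks,
    }


# Made-up figures for illustration only.
tasks = [
    Task(tokens=200_000, cost_usd=6.0, shipped=True, cheaper_cost_usd=1.5),
    Task(tokens=500_000, cost_usd=15.0, shipped=False),
    Task(tokens=100_000, cost_usd=3.0, shipped=True),
]

summary = summarize(tasks)
print(summary)
```

With these sample figures, 800,000 tokens were consumed, $15.00 went to work that never shipped, $4.50 could have been saved with a cheaper model, and two tasks reached production. A leaderboard built on the first number alone would hide the other three.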

