Coding assistants have been an obvious early use case in the generative AI gold rush, but promised productivity improvements are falling short of the mark — if they exist at all.
Many developers say AI coding assistants make them more productive, but a recent study that set out to measure their output found no significant gains. Use of GitHub Copilot also introduced 41% more bugs, according to the study from Uplevel, a company providing insights from coding and collaboration data.
The study measured pull request (PR) cycle time, or the time to merge code into a repository, and PR throughput, the number of pull requests merged. It found no significant improvements for developers using Copilot.
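To make those two metrics concrete, here is a minimal sketch of how they could be computed from repository data. The field names and records are hypothetical, and this is not Uplevel's published methodology:

    from datetime import datetime

    # Hypothetical PR records; a study like Uplevel's would derive these
    # from customers' actual repository and collaboration data.
    prs = [
        {"opened": datetime(2024, 6, 3, 9, 0), "merged": datetime(2024, 6, 4, 15, 30)},
        {"opened": datetime(2024, 6, 5, 10, 0), "merged": datetime(2024, 6, 5, 16, 0)},
    ]

    # PR cycle time: average time from opening a pull request to merging it.
    cycle_hours = [(pr["merged"] - pr["opened"]).total_seconds() / 3600 for pr in prs]
    avg_cycle_time = sum(cycle_hours) / len(cycle_hours)

    # PR throughput: number of pull requests merged in the measurement window.
    throughput = len(prs)

    print(f"avg cycle time: {avg_cycle_time:.1f}h, throughput: {throughput}")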
Uplevel, using data generated by its customers, compared the output of about 800 developers using GitHub Copilot over a three-month period to their output in a three-month period before adoption.
Measuring burnout
In addition to measuring productivity, the Uplevel study looked at factors in developer burnout, and it found that GitHub Copilot hasn't helped there, either. Time spent working outside standard hours decreased for both the control group and the test group during the study period, but it decreased more for the developers who weren't using Copilot.
Uplevel’s study was driven by curiosity over claims of major productivity gains as AI coding assistants become ubiquitous, says Matt Hoffman, product manager and data analyst at the company. A GitHub survey published in August found that 97% of software engineers, developers, and programmers reported using AI coding assistants.
“We’ve seen different studies of people saying, ‘This is really helpful for our productivity,’” he says. “We’ve also seen some people saying, ‘You know what? I’m kind of having to be more of a [code] reviewer.’”
A GitHub representative didn't comment on the study itself but pointed to a recent study finding that developers were able to write code 55% faster using the coding assistant.
The Uplevel team also went into its study expecting to see some productivity gains, Hoffman says.
“Our team’s hypothesis was that we thought that PR cycle time would decrease,” Hoffman says. “We thought that they would be able to write more code, and we actually thought that defect rate might go down because you’re using these gen AI tools to help you review your code before you even get it out there.”
Hoffman acknowledges there may be more ways to measure developer productivity than PR cycle time and PR throughput, but Uplevel sees those metrics as a solid measure of developer output.
Check back later
Still, Uplevel isn't suggesting that organizations stop using coding assistants, because the tools are advancing rapidly.
“We heard that people are ending up being more reviewers for this code than in the past, and you might have some false faith that the code is doing what you expect it to,” Hoffman adds. “You just have to keep a close eye on what is being generated; does it do the thing that you’re expecting it to do?”
In the trenches, development teams are reporting mixed results.
Developers at Gehtsoft USA, a custom software development firm, haven't seen major productivity gains from coding assistants based on large language models (LLMs), says Ivan Gekht, CEO of the company. Gehtsoft has been testing coding assistants in sandbox environments but has not yet used them on customer projects.
“Using LLMs to improve your productivity requires both the LLM to be competitive with an actual human in its abilities and the actual user to know how to use the LLM most efficiently,” he says. “The LLM does not possess critical thinking, self-awareness, or the ability to think.”
There’s a difference between writing a few lines of code and full-fledged software development, Gekht adds. Coding is like writing a sentence, while development is like writing a novel, he suggests.
“Software development is 90% brain function — understanding the requirements, designing the system, and considering limitations and restrictions,” he adds. “Converting all this knowledge and understanding into actual code is a simpler part of the job.”
Like the Uplevel researchers, Gekht sees AI assistants introducing errors into code. Each new iteration of the AI-generated code ends up less consistent when different parts of it are developed using different prompts.
“It becomes increasingly more challenging to understand and debug the AI-generated code, and troubleshooting becomes so resource-intensive that it is easier to rewrite the code from scratch than fix it,” he says.
Seeing gains
The coding assistant experience at Innovative Solutions, a cloud services provider, is much different. The company is seeing significant productivity gains using coding assistants like Claude Dev and GitHub Copilot, says Travis Rehl, the CTO there. The company also uses a homegrown Anthropic integration to monitor pull requests and validate code quality.
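Innovative Solutions hasn't published details of that integration. As a rough sketch, a PR-review hook built on the Anthropic Python SDK might look something like the following; the git-diff plumbing, prompt, and function name are assumptions for illustration, not the company's actual tooling:

    import subprocess
    from anthropic import Anthropic  # official Anthropic Python SDK

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def review_pr(base_branch: str = "main") -> str:
        """Send the current branch's diff to Claude and return its review notes."""
        # Hypothetical: pull the diff with git; a production integration would
        # more likely consume the repository host's webhook payload instead.
        diff = subprocess.run(
            ["git", "diff", base_branch],
            capture_output=True, text=True, check=True,
        ).stdout

        message = client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": "Review this pull request diff for bugs and "
                           "code-quality issues, and flag anything that needs "
                           "a human look:\n\n" + diff,
            }],
        )
        return message.content[0].text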
Rehl has seen developer productivity increase by two to three times, measured by how quickly developers complete tickets, the turnaround time on customer deliverables, and ticket quality, gauged by the number of bugs in code.
Rehl’s team recently completed a customer project in 24 hours by using coding assistants, when the same project would have taken them about 30 days in the past, he says.
Still, some of the hype about coding assistants — such as suggestions they will replace entire dev teams rather than simply supplement or reshape them — is unrealistic, Rehl says. Coding assistants can be used to quickly sub out code or optimize code paths by reworking segments of code, he adds.
“Expectations around coding assistants should be tempered because they won’t write all the code or even all the correct code on the first attempt,” he says. “It is an iterative process that, when used correctly, enables a developer to increase the speed of their coding by two or three times.”