AI coding agents are poised to take over a large chunk of software development in the coming years, but the change will come with intellectual property legal risk, some lawyers say.
AI-powered coding agents will be a step forward from the AI-based coding assistants, or copilots, used now by many programmers to write snippets of code. But as coding agents potentially write more software and take work away from junior developers, organizations will need to monitor the output of their robot coders, according to tech-savvy lawyers.
Media outlets and entertainers have already filed several AI copyright cases in US courts, with plaintiffs accusing AI vendors of using their material to train AI models or copying their material in outputs, notes Jeffrey Gluck, a lawyer at IP-focused law firm Panitch Schwarze.
The same thing could happen with software code, even though companies don’t typically share their source code, he says.
“Does the output infringe something that someone else has done?” he posits. “The more likely the AI was trained using an author’s work as training data, the more likely it is that the output is going to look like that data.”
How was the AI trained?
Beyond the possibility of AI coding agents copying lines of code, courts will have to decide whether AI vendors can use material protected by copyright — including some software code — to train their AI models, Gluck says.
“At the level of the large language model, you already have a copyright issue that has not yet been resolved,” he says.
The legal issues aren’t likely to go away anytime soon, adds Michael Word, an IP and IT-focused lawyer at the Dykema Gossett law firm.
“We’re already seeing the ability to use AI in the background, essentially, to draft significant portions of code,” he says. “You have user interfaces that say, ‘I want my application to do this,’ you hit the button, and the code gets generated in the background.”
Without some review of the AI-generated code, organizations may be exposed to lawsuits, he adds. “There’s a lot of work that’s going on behind the scenes there, getting beyond maybe just those individual snippets of code that may be borrowed,” he says. “Is that getting all borrowed from one source; are there multiple sources? You can maybe sense that there’s something going on there.”
While human-written code can also infringe on copyright or violate open-source licenses, the risk with AI-generated code is related to the data the AI is trained on, says Ilia Badeev, head of data science at Trevolution Group, a travel technology company. There’s a good chance many AI agents are trained on code protected by IP rights.
“This means the AI might spit out code that’s identical to proprietary code from its training data, which is a huge risk,” Badeev adds. “The same goes for open-source stuff. A lot of open-source programs are meant for non-commercial use only. When an AI generates code, it doesn’t know how that code will be used, so you might end up accidentally violating license terms.”
Vendors take action
GitHub Copilot, the popular AI coding assistant owned by Microsoft, acknowledges that it could, in rare cases, “match examples of code used to train GitHub’s AI model.” The coding assistant features an optional code referencing filter to detect and suppress suggestions that match public code, and it is previewing a code-referencing feature to help users find and review potentially relevant open-source licenses.
GitHub also has legal protections in place. “When users have the filter enabled that blocks matches to existing public code, they are covered by GitHub’s indemnification policy,” a spokeswoman says.
Tabnine, another vendor of an AI coding assistant, announced its own code review agent in late October. But GitHub Copilot and Tabnine are not the only coding assistants available, and GitHub notes that users are responsible for their own open-source licensing policies.
To protect themselves, organizations using AI coding agents will need to check AI-produced code for copyright infringement and open-source license violations, either by human programmers or by services that check software for intellectual property infringement, Dykema Gossett’s Word says.
Some AI code-generating platforms will “help protect you and shield you from some liability, or at least give you some comfort,” says the lawyer.
Other services are likely to emerge, Word says. “You can imagine that as these types of systems get used more and more, you have this type of service provider where you can upload your source code, and they’ll check it for open-source violations,” he says. “They will train on all the public data sets that are out there and will audit your code and see if there’s any potential copyright infringement complaints.”
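A toy version of the snippet audit Word describes might work by fingerprinting short runs of normalized lines and checking them against a corpus of known public code. The sketch below is purely illustrative: the `fingerprint` and `audit` helpers and the three-line window are invented for this example, and real code-provenance services use far more sophisticated similarity techniques than exact hashing.

```python
import hashlib


def normalize(line: str) -> str:
    """Strip all whitespace and lowercase, so trivial reformatting doesn't hide a match."""
    return "".join(line.split()).lower()


def fingerprint(code: str, window: int = 3) -> set[str]:
    """Hash every `window`-line run of normalized, non-empty lines."""
    lines = [normalize(l) for l in code.splitlines() if l.strip()]
    return {
        hashlib.sha256("".join(lines[i:i + window]).encode()).hexdigest()
        for i in range(len(lines) - window + 1)
    }


def audit(generated: str, corpus: dict[str, str], window: int = 3) -> list[str]:
    """Return the names of corpus entries that share any fingerprinted run with the generated code."""
    gen = fingerprint(generated, window)
    return [name for name, src in corpus.items() if gen & fingerprint(src, window)]


# Example: AI-generated code that reproduces a known public snippet gets flagged
# even though its whitespace differs from the original.
public_code = "def add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
generated = "def add(a,b):\n  return a+b\nprint(add(1, 2))\n"
print(audit(generated, {"toy-lib": public_code}))
```

An exact-hash approach like this only catches verbatim (modulo whitespace) reuse; the commercial services the lawyers anticipate would also need to handle renamed variables, reordered statements, and partial matches.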
Trevolution’s Badeev recommends that companies using coding agents check outputs the same way they audit human-generated code for IP violations. “You still need to apply all the same best practices you would with human-written code; things like code reviews are still super important,” he says. “You can’t just trust the AI to get everything right on its own.”
The risk is unclear
It’s unclear how much of a problem this will be for organizations that deploy coding agents. Panitch Schwarze’s Gluck suggests that large AI vendors may be bigger targets for copyright and other IP infringement lawsuits, but Word says there will be some risk for user organizations, especially when they use coding agents to make commercially available and successful software packages.
“You need to be aware of what your coders are doing,” Word says. “Your coders are going to be doing this because it is such a useful tool, and it’s hard to prevent this from happening.”
Companies need to take reasonable steps to prevent IP violations, he adds. “So the question is, do you turn a blind eye or do not put guardrails in place to try to address it, or do you actually be proactive in trying to address it?” he says.