Generative AI and US copyright law are on a collision course

How US copyright law applies to the burgeoning field of generative AI remains unclear, and the resulting problems for generative AI training are likely to come to a head in court.

Senior Writer, Computerworld


US copyright law has yet to catch up to the numerous challenges posed by the wildfire adoption of generative AI, and the first real movement on major legal questions is likely to come from upcoming trials.

One of those key questions is the copyright status of works generated by artificial intelligence — for example, whether a Midjourney image that the system generated in response to a user prompt is protected by copyright, and, if so, who owns that copyright.

No one owns copyrights on AI-generated work

The answer, for the moment, is that nobody owns the copyright on AI-generated works, because nobody can. The US Copyright Office has taken the position that human authorship is required for copyright to exist on a work, and that no such authorship exists in the case of AI-created writing or images — neither the creator of the AI nor the provider of the prompt used to generate a particular work “owns” that output.

That’s unlikely to change due to legislation or administrative action in the near future, according to Ron Lazebnik, clinical associate professor at Fordham University School of Law. Litigation, however, could challenge that standard, he noted.

“Either someone tries to register something with the copyright office, who says no, and then they sue challenging that denial, or alternatively, someone who may have used an AI but never let the copyright office know that, they become the plaintiff,” Lazebnik said. “Beyond that, it’s not clear on how we would be able to get a court to say whether or not an AI generated work is attributed to the person who prompted the AI.”

It's within the Copyright Office’s purview to change that rule, according to Philippa Loengard, executive director of the Kernochan Center for Law, Media and the Arts at Columbia Law School — but she warned that such action remains unlikely.

“My guess is that AI regulation won’t start with copyright,” she said. “I think, and I could be wrong, that there are so many issues being discussed in the realm of AI that I’m not sure that changing the requirements of human authorship is going to be of paramount importance.”

The larger issue for AI in terms of copyright law is likely to be the concept of fair use, particularly as it applies to the training data used to create the large language models (LLMs) underpinning generative AI.

Fair use, in brief, is a defense to copyright claims written into federal law. The four factors that courts have to consider when deciding whether a particular use of copyrighted material without permission is “fair use” are the character and purpose of the use (educational or other not-for-profit use is much more likely to be deemed fair than commercial use), the nature of the original work, the amount of the original work used, and the market effect on the original work.

Copyright a stumbling block for AI model training

Given those factors, it’s perhaps unsurprising that the lawsuits against companies like OpenAI have already begun. Most notably, a group of authors that includes comedian and writer Sarah Silverman sued OpenAI and Meta in July over the use of their books to train the companies’ AI models, including ChatGPT.

The core issue in that lawsuit is the use of a data set called “BookCorpus,” which, the plaintiffs say, contained their copyrighted material. OpenAI and Meta are likely to argue that the market effect on Silverman’s and the other authors’ works is negligible, and that the “character and purpose” of the use is different from that which prompted the writing of the books in the first place. The plaintiffs, for their part, are likely to highlight the for-profit nature of Meta’s and OpenAI’s use, as well as the use of entire works in training data.

Precedent, however, may be on the AI companies’ side, in the form of the Google Books case, a fair use action brought by the Authors Guild against Google’s mass book digitization project in 2005. The case’s history is complicated, with appeals that went on for a decade, and it was ultimately resolved in Google’s favor.

Whether that’s likely to be predictive, however, is debatable, according to Loengard, and much could depend on a judge’s willingness to challenge a large, profitable industry.

“By the time it ended, Google Books had become a tool of many researchers,” she said. “So there’s this idea that the cat’s out of the bag — and of course, the court wouldn’t say this out loud, and I’m not saying this is what they did, but they could look at it and say that once something has entered mainstream commerce that it’s harder to reel it back in and regulate it.”

Obviously derivative works could be another copyright battleground for the AI industry, given that the technology has already been used to produce convincing imitations of popular singers and songwriters. The right of publicity, a different legal concept covering the rights to a person’s name, image and likeness, could become a cause of action for the performance itself (for example, the sound of Taylor Swift’s voice). But copyright could still become an issue if the underlying song is sufficiently similar to one written by Swift.

“If it’s a song close to what those artists might be expecting to produce, [copyright law could be implicated], in theory, depending on how close AI comes to writing a song that resembles one that that artist could write,” said Lazebnik.

The unsettled state of play around AI and copyright law is not simply a US phenomenon, although most countries have yet to pass detailed legislation around it. The EU’s AI Act, as well as frameworks for more general AI regulation passed by the US and China, do not change the confused state of play around copyright issues. One country that has addressed the issue directly is Japan, which clarified in June that the use of copyrighted works for AI training is permitted, even for commercial purposes.

But active regulation of these issues may still be far off in the US, according to the experts.

“The legislature hasn’t created a new carve-out in copyright in quite some time,” said Lazebnik. “If they think fair use isn’t sufficient and that it’s something they want to encourage, they might amend our laws, but at the moment, they would have to see an appellate court saying ‘no, this isn’t fair use.’”

“[Congress] has no pending legislation at all,” noted Loengard. “Everyone is studying this, nobody has come to any conclusions about it.”

Jon Gold covers IoT and wireless networking for Network World.
