The Evitability of Copyright Changes Due to Generative ‘A.I.’

Will Oremus and Elahe Izadi, Washington Post:

AI systems are typically “trained” on gargantuan data sets that include vast amounts of published material, much of it copyrighted. Through this training, they come to recognize patterns in the arrangement of words and pixels, which they can then draw on to assemble plausible prose and images in response to just about any prompt.

Some AI enthusiasts view this process as a form of learning, not unlike an art student devouring books on Monet or a news junkie reading the Times cover-to-cover to develop their own expertise. But plaintiffs see a more quotidian process at work beneath these models’ hood: It’s a form of copying, and unauthorized copying at that.

David Karpf, in an opinion piece for Foreign Policy:

The story that I often hear from AI evangelists is that technologies such as ChatGPT are here, and they are inevitable. You can’t put this genie back in the bottle. If outdated copyright laws are at odds with the scraping behavior of large language models, then our copyright law will surely need to bend as a result.

And to them I can only say: Remember the Ghost of Napster. We do not live in the future that seemed certain during the Napster era. We need not live in the future that seems certain to AI evangelists today. […]

Karpf’s comparison to Napster is apt, but the lesson he draws from it seems wrong or, at least, incomplete. To avoid quoting paragraph after paragraph, here is my short summary: after Napster’s launch, many pundits assumed it would effectively end copyright. They were wrong; after the RIAA made itself look like a bunch of jerks by filing a series of lawsuits, iTunes and Spotify gave people a copyright-respecting way to get music online, while artists got a little bit of income. Karpf writes correctly that “[c]opyright law did not bend to accommodate new technologies”, but laments, in the same paragraph, how this “new status quo hasn’t been great for musicians or artists”, and compares this to the liberal use of copyrighted works as generative training data. The inability of music piracy to establish a post-copyright legal business model, Karpf says, suggests this iteration of generative “A.I.” products could fail commercially by running into the same copyright roadblock.

This feels so close, but it is not the full story. Streaming music services like Spotify show that compliance with the law is not necessarily lucrative for the creators of a work; what matters most is the effect forces like Napster had on changing our perception of copyright. Karpf quotes John Perry Barlow, from a 2000 Wired article, to support his thesis that many viewed copyright as good as dead in the wake of Napster. But I do not think Karpf fully acknowledges what that means; Barlow:

No law can be successfully imposed on a huge population that does not morally support it and possesses easy means for its invisible evasion.

Copyright itself may not have disappeared (quite the opposite, thanks to term extensions passed in Canada and the E.U.) but one cannot deny its effects have been blunted. Piracy was popular before the internet, so of course it did not stop when Spotify launched; downloading of movies and television shows has kept increasing. Copyright violations are so common among ordinary people that YouTube identifies background music in uploads and allows rights-holders to monetize it.

All of this is to say there are options for large language models and machine learning which do not require a wholesale rejection of copyright. That is what the legacy of Napster suggests. There may be, as Karpf intimates, a licensing scheme which would compensate those who own the intellectual property, though it is worth pointing out that A.I. companies are negotiating with publishers, not writers. Such a scheme would not necessarily be the death of ChatGPT. One significant difference between Napster and generative “A.I.” is that the latter is not an underground effort: machine learning is a deep-pocketed push by powerful technology companies, and they are determined to make it work.

A grey market for models trained on copyrighted works is a plausible parallel. Such models may not be monetizable in the same way as a commercial product, but it would make sense for people to build them anyway.