Anthropic Proposes Paying Authors $1.5 Billion Over Pirated Books Used in Training

Ashley Belanger, Ars Technica:

Authors revealed today that Anthropic agreed to pay $1.5 billion and destroy all copies of the books the AI company pirated to train its artificial intelligence models.

In a press release provided to Ars, the authors confirmed that the settlement is “believed to be the largest publicly reported recovery in the history of US copyright litigation.” Covering 500,000 works that Anthropic pirated for AI training, if a court approves the settlement, each author will receive $3,000 per work that Anthropic stole. “Depending on the number of claims submitted, the final figure per work could be higher,” the press release noted.

Foster Kamer, Futurism:

Kyle Chayka, a staff writer at The New Yorker whose work zeroes in on the intersection between technology, art, and culture, is the author of not one but two books that popped up in LibGen: 2024’s “Filterworld: How Algorithms Flattened Culture” and 2020’s “The Longing For Less: Living With Minimalism.” Also found in LibGen was the Italian translation of Filterworld. All in, he could stand to make upwards of $12K!

We asked Kyle: How does the sum of “$3,000 per class work” feel as a number given that his intellectual property was used to train an AI? Low, high, not worth it on principle, or about right?

“It should be a license, really,” he replied. “Because the training never goes away. So it could be $5,000 every 5 years, or $1,000 / year as long as they exist. But the price seems about right, honestly — a decent percentage of most book advances, and about the price of an institutional speaking gig.”

Yet another complication for the fair use arguments of generative AI companies, though here that argument was obviously undermined by the use of pirated data in the first place. It makes sense to focus on this settlement for now, but the question looming over the case is what precedent it sets. It does not singlehandedly eliminate the fair use argument for training on publicly available information, but what about other illicitly reproduced material ingested into training data sets?

Meta and now Apple are being sued in similar cases. If you are a published author, visit the case settlement website to register for your share of the pot.