Publisher Opt-Outs Affected Half of Google’s A.I. Training bloomberg.com

Davey Alba, Bloomberg:

Google can train its search-specific AI products, like AI Overviews, on content across the web even when the publishers have chosen to opt out of training Google’s AI products, a vice-president of product at the company testified in court on Friday.

That’s because Google’s controls for publishers to opt out of AI training only cover work by Google DeepMind, the company’s AI lab, and not any other organization at the company, said Eli Collins, a Google DeepMind vice-president.

This confirms reporting by Alba and Julia Love last year: if publishers want to appear in Google Search, they have to be okay with some amount of A.I. training. Given a real choice, however, it seems unlikely to me that they would agree to it. In court, the Department of Justice showed Collins a document regarding Gemini training:

According to that document, Google removed 80 billion of 160 billion “tokens” — snippets of content — after filtering out the material that publishers had opted out of allowing Google to use for training its AI. The document also listed search “sessions data,” or data collected during a period of time in which a user interacted with Google Search, as well as YouTube videos, as data that could augment Google’s AI models.

Half. Half of the tokens counted in that document were removed once Google filtered out material from publishers who had opted out. That does not mean the publishers behind the other half have affirmatively opted in, of course, but it means the publishers responsible for at least half of that data do not approve of Google’s desire to absorb their corpus of information without payment and with scant credit.