Separate Lawsuits Claim OpenAI and Perplexity Are Sharing User Data With Third Parties for Targeted Advertising ⇥ techpolicy.press
Madeline Batt, Tech Policy Press:
The recent lawsuit Noel v. Perplexity brought the question of AI monetization onto a courthouse docket. Though the plaintiff has since voluntarily dismissed it, the details of the class action provide a window into how adtech in AI is likely to be challenged in the courts.
The lawsuit targeted generative AI company Perplexity, along with Meta and Google, alleging they disclosed transcripts of users’ conversations with chatbots for targeted advertising. […]
It is not clear to me why the anonymous plaintiff gave up on this case. Abandoning the suit does not necessarily mean its claims are unfounded.
Maggie Harrison Dupré, Futurism:
A new class action lawsuit accuses OpenAI of sharing data including user chat queries and personal identifying information like emails and user IDs with the tech giants — and targeted advertising behemoths — Meta and Google, without obtaining proper user consent.
Interestingly, the Office of the Privacy Commissioner of Canada recently concluded an investigation into OpenAI’s training on personal information and whether its models can reliably reproduce that information. It seems to me like questions about third-party ad targeting were out of scope. This is notable, however:
OpenAI represented that ‘untraining’ or ‘reverse-training’ LLMs, so that they no longer use or generate specific personal information for which a deletion request has been submitted, is not currently feasible. OpenAI explained that this is because its models are trained through repeated adjustments of billions of weights (parameters) over successive runs of training datasets and do not contain or store copies of information that they ‘learned’ from.
I think we all knew this was the case, but it underscores the questionable effectiveness of robots.txt rules for website owners wishing to opt out of being a source for LLM training. It is not even clear that OpenAI, for example, ensures data already in its collection remains in compliance with opt-out requests when training new models.
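For context, the opt-out mechanism in question is just a plain-text robots.txt file, and nothing about it is technically enforced. A minimal sketch using Python’s standard `urllib.robotparser` shows how the rules are read; GPTBot is OpenAI’s documented crawler user agent, and `example.com` is a placeholder:

```python
from urllib import robotparser

# A robots.txt that asks OpenAI's training crawler to stay out
# while leaving other crawlers unrestricted.
rules = """
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The file only expresses a request; whether it has any effect
# depends entirely on the crawler choosing to honor it.
print(rp.can_fetch("GPTBot", "https://example.com/article"))        # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

A well-behaved crawler runs exactly this kind of check and skips disallowed paths; a crawler that ignores the file faces no technical barrier, which is why robots.txt functions as a polite request rather than a control.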