Kate Knibbs, Wired:
As artists are quick to point out, Meta’s insistence that people provide evidence that its models have trained on their work or other personal data puts them in a bind. Meta has not disclosed the specifics about which data it has trained its models on, so this set-up requires people who want to remove their information to first figure out which prompts might elicit responses that include details about themselves or their work.
Even if they do submit evidence, it may not matter. When asked about mounting frustration with this process, Meta responded that the data deletion request form is not an opt-out tool, emphasizing that it has no intention of deleting information found within its own platforms. “I think there is some confusion about what that form is and the controls we offer,” Meta spokesperson Thomas Richards told WIRED via email. “We don’t currently offer a feature for people to opt-out of their information from our products and services being used to train our AI models.”
Meta also has limited exclusion options for data scraped from the open web.
I mostly agree with arguments that using public web data for machine learning purposes is a fair use of available data. That is, not necessarily “fair use” in the legal sense — though that is being tested in court — but in the more common understanding that it seems okay to absorb and interpret websites and images for these models. That is not necessarily a comfortable position. These are commercial models scooping up information at vast scale, transforming it, and reselling products created from it without credit.
But intellectual property is a weak argument against technology that combines sources and synthesizes something new. Most anyone who makes a new creative product has probably relied on things protected by copyright at some point in that process, whether as inspiration or direct source material. The problem, as I see it, is taking a legal and moral flexibility that protects something created by people like you or me, and extending those courtesies to giant, well-funded commercial entities.
One simple way to counteract this power imbalance is to give people an opt-in or opt-out choice. This should be entirely uncontroversial. Anyone should be able to say that they do not want their material used to train models, and the use of a product or service should not be contingent on whether users agree to provide training data. It should not be in the same contract. Meta is famously disrespectful to its users. It is not surprising that this has continued in its latest efforts.