How Crawlers Impact the Operations of the Wikimedia Projects diff.wikimedia.org

Birgit Mueller, Chris Danis, and Giuseppe Lavagetto, of the Wikimedia Foundation:

Since January 2024, we have seen the bandwidth used for downloading multimedia content grow by 50%. This increase is not coming from human readers, but largely from automated programs that scrape the Wikimedia Commons image catalog of openly licensed images to feed images to AI models. Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs.

Given the sheer volume of material scraped by A.I. companies, it is hard to say how much any single source contributes to the output generated for an arbitrary request. Wikimedia might be the exception, however. It is so central and its contents so expansive that it is hard to imagine many of these products would be nearly so successful without it.

I do not see the names of any of the most well-known A.I. companies among the foundation's largest donors. Perhaps they are among the seven anonymous donors in the $50,000-and-up group. I suggest they, at the very least, give more generously and more openly.