Dave Lee, Financial Times:
Today, the archive’s founder Brewster Kahle tells me, the project is on the brink of surpassing 100 petabytes – approximately 50,000 times larger than in 1997. It contains more than 700bn web pages.
The work isn’t getting any easier. Websites today are highly dynamic, changing with every refresh. Walled gardens like Facebook are a source of great frustration to Kahle, who worries that much of the political activity that has taken place on the platform could be lost to history if not properly captured. In the name of privacy and security, Facebook (and others) make scraping difficult.
An increasing amount of the internet’s activity is happening in video and audio formats, too, which require vastly greater amounts of space to preserve. The Internet Archive is no stranger to hosting that kind of material, but expecting it to duplicate YouTube alone is a tall order.