Jonathan Zittrain, the Atlantic:
This absence of central control, or even easy central monitoring, has long been celebrated as an instrument of grassroots democracy and freedom. It’s not trivial to censor a network as organic and decentralized as the internet. But more recently, these features have been understood to facilitate vectors for individual harassment and societal destabilization, with no easy gating points through which to remove or label malicious work not under the umbrellas of the major social-media platforms, or to quickly identify their sources. While both assessments have power to them, they each gloss over a key feature of the distributed web and internet: Their designs naturally create gaps of responsibility for maintaining valuable content that others rely on. Links work seamlessly until they don’t. And as tangible counterparts to online work fade, these gaps represent actual holes in humanity’s knowledge.
This article is not solely about link rot, though that is a significant component; it is about the unique qualities of electronic resources that make them poorly suited to long-term reference and archiving. I wanted to highlight one example Zittrain cites.
Philip Howard bought a copy of “War and Peace” on his Nook in 2012:
As I was reading, I came across this sentence: “It was as if a light had been Nookd in a carved and painted lantern…” Thinking this was simply a glitch in the software, I ignored the intrusive word and continued reading. Some pages later I encountered the rogue word again. With my third encounter I decided to retrieve my hard cover book and find the original (well, the translated) text.
For the sentence above I discovered this genuine translation: “It was as if a light had been kindled in a carved and painted lantern…”
Imagine if, in a hundred years’ time, the version of “War and Peace” that was being read in schools was the former, and then someone discovered that “Nookd” was a mistranslation because someone had lazily done a find-and-replace to substitute trademarked product names.
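How the substitution was actually made is not documented here, but a blind substring replacement would produce exactly this result. The following is a minimal Python sketch under that assumption; the function names and the "kindle" to "Nook" mapping are hypothetical illustrations, not the publisher's actual process.

```python
import re

def naive_replace(text: str, old: str = "kindle", new: str = "Nook") -> str:
    # Blind substring substitution: also rewrites "kindle" inside longer words.
    return text.replace(old, new)

def whole_word_replace(text: str, old: str = "kindle", new: str = "Nook") -> str:
    # Only replaces "kindle" when it stands alone as a word.
    pattern = re.compile(rf"\b{re.escape(old)}\b", flags=re.IGNORECASE)
    return pattern.sub(new, text)

sentence = "It was as if a light had been kindled in a carved and painted lantern"

print(naive_replace(sentence))       # "kindled" becomes "Nookd"
print(whole_word_replace(sentence))  # sentence is left untouched
```

Even the whole-word version would be a questionable edit to a novel, but it shows how little care it takes to avoid corrupting unrelated words.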
John Bowers, Clare Stanton, and Jonathan Zittrain, writing in May for Columbia Journalism Review:
We found that of the 553,693 articles within the purview of our study — meaning they included URLs on nytimes.com — there were a total of 2,283,445 hyperlinks pointing to content outside of nytimes.com. Seventy-two percent of those were “deep links” with a path to a specific page, such as example.com/article, which is where we focused our analysis (as opposed to simply example.com, which composed the rest of the data set).
Of these deep links, 25 percent of all links were completely inaccessible. Linkrot became more common over time: 6 percent of links from 2018 had rotted, as compared to 43 percent of links from 2008 and 72 percent of links from 1998. Fifty-three percent of all articles that contained deep links had at least one rotted link.
This was a sample data set from 1996 through mid-2019, but maybe the most shocking number is the 2018 one: after just a year, roughly one in every sixteen links from the Times’ website to an external source had stopped working. The Times already has an attribution problem; this just makes it worse. The researchers point out that URLs within U.S. Supreme Court opinions fare even worse, with about half of links not working as originally intended.
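The distinction the researchers draw between a "deep link" and a bare domain can be sketched in a few lines of Python. The study's exact criteria are not given in the excerpt, so the rules below are assumptions: a link counts as deep if it has a non-empty path, and as external if its host is not nytimes.com or a subdomain of it. The function names are hypothetical.

```python
from urllib.parse import urlparse

def _parse(url: str):
    # Accept scheme-less inputs such as "example.com/article".
    return urlparse(url if "://" in url else "//" + url)

def is_deep_link(url: str) -> bool:
    # Assumption: "deep" means the URL has a path beyond the bare domain.
    return _parse(url).path.strip("/") != ""

def is_external(url: str, home: str = "nytimes.com") -> bool:
    # Assumption: subdomains of the home site count as internal.
    host = _parse(url).hostname or ""
    return not (host == home or host.endswith("." + home))

for link in ["example.com/article", "example.com", "https://www.nytimes.com/section"]:
    print(link, is_deep_link(link), is_external(link))
```

Under these rules a URL with only a query string, such as example.com/?id=3, is not treated as deep; a real analysis would need to decide how to handle cases like that.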
Zittrain and colleagues created Perma.cc to try to solve this problem, particularly for legal and scholarly users. It is a good, necessary effort that uses the Internet Archive’s engine to build permanent links to pages that Perma.cc promises are, indeed, permanent.
You would think all permalinks on the web would be permanent, just as you might think permafrost would never thaw (the “perma” is a pretty big clue), but it turns out that language is funny like that.
In my testing, Perma.cc worked fine for text-based pages, but failed to capture video files on YouTube. That is broadly the case with other archival methods; there simply is not a large-scale effective YouTube mirror. If something is removed from the world’s most popular general-purpose video hosting site, it may be lost forever.