How Much of the Internet Is Fake? ⇥ nymag.com

Max Read, writing for New York in one of my favourite pieces I’ve read all year:

How much of the internet is fake? Studies generally suggest that, year after year, less than 60 percent of web traffic is human; some years, according to some researchers, a healthy majority of it is bot. For a period of time in 2013, the Times reported this year, a full half of YouTube traffic was “bots masquerading as people,” a portion so high that employees feared an inflection point after which YouTube’s systems for detecting fraudulent traffic would begin to regard bot traffic as real and human traffic as fake. They called this hypothetical event “the Inversion.”

In the future, when I look back from the high-tech gamer jail in which President PewDiePie will have imprisoned me, I will remember 2018 as the year the internet passed the Inversion, not in some strict numerical sense, since bots already outnumber humans online more years than not, but in the perceptual sense. The internet has always played host in its dark corners to schools of catfish and embassies of Nigerian princes, but that darkness now pervades its every aspect: Everything that once seemed definitively and unquestionably real now seems slightly fake; everything that once seemed slightly fake now has the power and presence of the real. The “fakeness” of the post-Inversion internet is less a calculable falsehood and more a particular quality of experience — the uncanny sense that what you encounter online is not “real” but is also undeniably not “fake,” and indeed may be both at once, or in succession, as you turn it over in your head.

Aram Zucker-Scharff started a Twitter thread listing more numbers on the web that you cannot rely on: advertising, social media trends, readers, viewers, and more. If a number matters, you can bet someone is manipulating it for a price.

Update: Ellen K. Pao on Twitter:

It’s all true: Everything is fake. Also mobile user counts are fake. No one has figured out how to count logged-out mobile users, as I learned at reddit. Every time someone switches cell towers, it looks like another user and inflates company user metrics.
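To make Pao's point concrete, here is a toy sketch. It assumes, purely for illustration, that logged-out visitors are counted by distinct IP address and that a phone gets a new address each time it hands off to another cell tower. The log data and function names are hypothetical; this is not how reddit or anyone else actually counts users.

```python
# Hypothetical request log for logged-out mobile visitors: (device, ip_address).
# Only two physical phones appear, but each picks up a new IP address
# as it moves between cell towers.
requests = [
    ("phone-a", "10.0.0.1"),
    ("phone-a", "10.0.4.7"),  # phone A handed off to another tower
    ("phone-a", "10.0.9.2"),  # ...and again
    ("phone-b", "10.1.0.3"),
    ("phone-b", "10.1.2.8"),  # phone B handed off once
]

def naive_user_count(log):
    """Assumed counting method for logged-out traffic: one 'user' per distinct IP."""
    return len({ip for _device, ip in log})

def true_user_count(log):
    """Ground truth, which an analytics system would not actually have for logged-out users."""
    return len({device for device, _ip in log})

print(naive_user_count(requests))  # 5
print(true_user_count(requests))   # 2
```

Two people, five "users": every handoff inflates the metric, and nothing in the naive count can tell the difference.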

The most alarming aspect of statistical fakery is not necessarily that it exists, but what will likely be done to combat it. Rather than prompting an admission that these stats are likely to be manipulated and are, at best, wildly inaccurate estimates (and, therefore, that decisions should not be based on what is reported), the fakery will far more likely lead to calls for more data collection. There will be attempts to make user identification more precise and more pervasive, particularly across devices.

This already happened with reCAPTCHA. Several years ago, the system required users to type words from distorted scans of books. By 2014, however, computers were better at the test than humans. Because Google owns reCAPTCHA, the company took advantage of the extent to which it spies on web users to create a usually invisible CAPTCHA. The more you use the web, and Google properties in particular, the less often you'll be asked to manually verify your humanity. That may be more convenient, but it's hard to deny how much creepier it is.

Of course, not even that has stopped people from trying to bypass it. Researchers have demonstrated loopholes in audio CAPTCHAs and figured out technical workarounds, and others have simply thrown people at the problem.