The Conclusions of That CAPTCHA Paper Seem Iffy to Me theregister.com

Thomas Claburn, the Register:

Google promotes its reCAPTCHA service as a security mechanism for websites, but researchers affiliated with the University of California, Irvine, argue it’s harvesting information while extracting human labor worth billions.

[…]

“Traffic resulting from reCAPTCHA consumed 134 petabytes of bandwidth, which translates into about 7.5 million kWhs of energy, corresponding to 7.5 million pounds of CO2. In addition, Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per each sale of their total labeled data set.”

I have seen this paper (PDF) being passed around and, while I find its participant-reported data believable — people are much less satisfied with image-based CAPTCHA puzzles than checkboxes — these calculations are unbelievable.

To reiterate, the researchers are estimating reCAPTCHA sessions have, over the past thirteen years, been responsible for $888 billion of Google’s income. In that time, Google has made $1.8 trillion in revenue. These researchers are suggesting about 49% of that can be directly tied to reCAPTCHA cookies.

Here is the explanation they give in the paper for how they arrived at that conclusion:

[…] According to Forbes [3], digital ad spending reached over $491 billion globally in 2021, and more than half of the market (51%) heavily relied on third-party cookies for advertisement strategies [1]. The expenditure on third-party audience data (collected using tracking cookies) in the United States reached from $15.9 billion in 2017 to $22 billion in 2021 [2]. More concretely, the current average value life-time of a cookie is €2.52 or $2.7 [58]. Given that there have been at least 329 billion reCAPTCHAv2 sessions, which created tracking cookies, that would put the estimated value of those cookies at $888 billion dollars.

It seems the researchers simply multiplied the total estimated number of reCAPTCHA sessions by a current average value to arrive at this number. I am probably missing some obvious flaws, but there are three I noticed. First, this calculation assumes cookies created thirteen years ago still exist today and have the same value, on average, as any other cookie. Second, it assumes every session materializes in a unique, individually valuable cookie. Lastly, it is unclear that a cookie’s value can be directly tied to Google’s income, as the researchers claim: “Google has potentially profited $888 billion from [reCAPTCHA] cookies”. None of these assumptions makes sense to me.
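For anyone who wants to check the arithmetic, here is a rough sketch in Python of what the multiplication appears to be. The session count, per-cookie value, and revenue total are the figures quoted above; the function and variable names are mine, and the `unique_fraction` parameter is a hypothetical knob I added only to make the second assumption visible. The paper, as far as I can tell, effectively treats it as 1.

```python
# Figures as quoted above; names are illustrative, not from the paper.
SESSIONS = 329e9              # reCAPTCHAv2 sessions, per the paper
VALUE_PER_COOKIE_USD = 2.7    # average lifetime value of a cookie, per the paper
GOOGLE_REVENUE_USD = 1.8e12   # Google's revenue over the same period, as cited above


def cookie_value_estimate(sessions, value_per_cookie, unique_fraction=1.0):
    """Multiply sessions by per-cookie value.

    unique_fraction is a hypothetical parameter: the share of sessions
    assumed to produce a distinct, still-valuable cookie.
    """
    return sessions * unique_fraction * value_per_cookie


estimate = cookie_value_estimate(SESSIONS, VALUE_PER_COOKIE_USD)
share = estimate / GOOGLE_REVENUE_USD

print(f"Estimated cookie value: ${estimate / 1e9:.1f} billion")
print(f"Share of Google's revenue: {share:.0%}")
```

With the default of one valuable cookie per session, this reproduces the $888 billion figure and the roughly 49% share. Passing any smaller `unique_fraction` shrinks the estimate proportionally, which is why the one-cookie-per-session assumption carries so much weight.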