The Reddit and Google Pairing Is One of a Kind

Since owners of web properties became aware of the traffic-sending power of search engines (which, in most places, means Google), they have been in an increasingly uncomfortable relationship with them as search moves beyond ten relevant links on a page. Google does not need websites, per se; it needs the information they provide. Its business recommendations are powered in part by reviews on other websites. Answers to questions appear in snippets, sourced to other websites, without the user needing to click away.

Publishers and other website owners might consider this a bad deal. They feed Google all this information hoping someone will visit their website, but Google is adding features that make it less likely anyone will do so. Unless they were willing to risk losing all their Google search traffic, there was little publishers could do. Individually, they needed Google more than Google needed them.

But that has not been quite as true for Reddit. Its discussions hold a uniquely large corpus of suggestions and information on specific topics and in hyper-local contexts, as well as a whole lot of trash. While the quality of Google’s results has been sliding, searchers discovered they could append “Reddit” to a query to find what they were looking for.

Google realized this and, earlier this year, signed a $60 million deal with Reddit allowing it to scrape the site to train its A.I. features. Part of that deal apparently involved indexing pages in search as, last month, Reddit restricted that capability to Google. That is: if you want to search Reddit, you can either use the site’s internal search engine, or you can use Google. Other search engines still display results from before mid-July, according to 404 Media, but only Google is permitted to crawl anything newer.

It is unclear to me whether this is a deal only available to Google, or if it is open to any search engine that wants to pay. Even if it was intended to be exclusive, I have a feeling it might not be for much longer. But it seems like something Reddit would only care about doing with Google because other search engines basically do not matter in the United States or worldwide.[1] What amount of money do you think Microsoft would need to pay for Bing to be the sole permitted crawler of Reddit in exchange for traffic from its measly market share? I bet it is a lot more than $60 million.

Maybe that is one reason this agreement feels uncomfortable to me. Search engines are marketed as finding results across the entire web but, of course, that is not true: they most often obey rules declared in robots.txt files, and they do not necessarily index everything they are able to, either. These are not explicit limitations. Yet it feels like a violation of the premise of a search engine for a website to say which one will be allowed to crawl and link to its pages. The whole thing about the web is that the links are free. There is no guarantee the actual page will be freely accessible, but the link itself is not restricted. It is the central problem with link tax laws, and this pay-to-index scheme is similarly restrictive.
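
For anyone who has not looked inside a robots.txt file, here is a minimal sketch of how a set of rules can favour a single crawler. Everything in it is an assumption for illustration: the crawler names `FavoredBot` and `OtherBot`, the `example.com` address, and the rules themselves are hypothetical, and this is not Reddit's actual file or a description of how its arrangement with Google is enforced. It uses the `urllib.robotparser` module from Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, NOT any real site's robots.txt: one named crawler
# may fetch everything, and every other crawler is asked to stay away.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: FavoredBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str,
              robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/some-community/some-thread"
    for agent in ("FavoredBot", "OtherBot"):
        verdict = "allowed" if can_crawl(agent, page) else "disallowed"
        print(f"{agent}: {verdict}")
```

Worth remembering is that these rules are only a request. A well-behaved crawler checks them and complies, but nothing in the file itself prevents a crawler from ignoring it.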

This is, of course, not the first time there has been tension in how a site balances search engine visibility and its own goals. Publishers have, for years, weighed their desire to be found by readers against login requirements and paywalls — guided by the overwhelming influence of Google.

Google used to require that publishers provide free articles to be indexed by the search engine but, in 2017, it replaced that with a model that is more flexible for publishers. Instead of being forced to offer a certain number of free page views, publishers are now able to provide Google with indexable data.

Then there are partnerships struck by search engines and third parties to obtain specific kinds of data. These were summarized well in the recent United States v. Google decision (PDF), and they are probably closest in spirit to this Reddit deal:

GSEs enter into data-sharing agreements with partners (usually specialized vertical providers) to obtain structured data for use in verticals. Tr. at 9148:2-5 (Holden) (“[W]e started to gather what we would call structured data, where you need to enter into relationships with partners to gather this data that’s not generally available on the web. It can’t be crawled.”). These agreements can take various forms. The GSE might offer traffic to the provider in exchange for information (i.e., data-for-traffic agreements), pay the provider revenue share, or simply compensate the provider for the information. Id. at 6181:7-18 (Barrett-Bowen).

As of 2020, Microsoft has partnered with more than 100 providers to obtain structured data, and those partners include information sources like Fandango, Glassdoor, IMDb, Pinterest, Spotify, and more. DX1305 at .004, 018–.028; accord Tr. at 6212:23–6215:10 (Barrett-Bowen) (agreeing that Microsoft partners with over 70 providers of travel and local information, including the biggest players in the space).

The government attorneys said Bing is required to pay for structured data owing to its smaller size, while Google is able to obtain structured data for free because it sends partners so much traffic. The judge ultimately rejected their argument that Microsoft struggled to sign these agreements or was impeded in doing so, but did not dispute the difference in negotiating power between the two companies.

Once more, for emphasis: Google usually gets structured data for free but, in this case, it agreed to pay $60 million; imagine how much it would cost Bing.

This agreement does feel pretty unique, though. It is hard for me to imagine many other websites with the kind of specific knowledge found aplenty on Reddit. It is a centralized version of the bulletin boards of the early 2000s for such a wide variety of interests and topics. It has such a vast user base that, while it cannot ignore Google referrals, it is not necessarily reliant on them in the same way as many other websites are.

Most other popular websites are insular social networks; Instagram and TikTok are not relying on Google referrals. Wikipedia would probably be the best comparison to Reddit in terms of the contribution it makes to the web (even greater, I think), but every page I tried except the homepage is overwhelmingly dependent on external search engine traffic.

Meanwhile, pretty much everyone else still has to pay Google for visitors. They have to buy the ads sitting atop organic search results. They have to buy ads on maps, on shopping carousels, on videos. People who operate websites hope they will get free clicks, but many of them know they will have to pay for some of them, even though Google will happily lift and summarize their work without compensation.

I cannot think of any other web property which has this kind of leverage over Google. While this feels like a violation of the ideals and principles that have built the open web on which Google has built its empire, I wonder if Google will make many similar agreements, if any. I doubt it — at least for now. This feels funny; maybe that is why it is so unique, and why it is not worth being too troubled by it.


  1. The uptick of Bing in the worldwide chart appears to be, in part, thanks to a growing share in China. Its market share has also grown a little in Africa and South America, but only by tiny amounts. However, Reddit is blocked in China, so a deal does not seem particularly attractive to either party.