What We Talk About When We Talk About Privacy

I would hate to begin any post here in the way that some first-year college student would start an essay: with a definition. But the meaning of “privacy” so variable that I invite you to see how different entities explain it. NIST has a few different explanations, while the NTIA has a much longer exploration. PC Magazine’s glossary entry is pretty good, too, and closely mimics Steve Jobs’ thesis.

So, with so much understanding of what privacy represents — at least in a kind of abstract sense — parts of this article by Benedict Evans come across as hollow even as it makes several great arguments. I’m going to start by quoting the second paragraph, because it begins “first”:

First, can we achieve the underlying economic aims of online advertising in a private way? Advertisers don’t necessarily want (or at least need) to know who you are as an individual. As Tim O’Reilly put it, data is sand, not oil – all this personal data actually only has value in the aggregate of millions. Advertisers don’t really want to know who you are – they want to show diaper ads to people who have babies, not to show them to people who don’t, and to have some sense of which ads drove half a million sales and which ads drove a million sales. […]

Already, I find myself wondering if Evans is being honest with himself. The argument that advertisers want to work in bulk more often than at the individual level is an outdated one in an era of ads that can be generated to uncanny specificity. Even conceding that Facebook’s influence on the 2016 election was overstated, the Trump campaign was “running 40,000 to 50,000 variants of its ads” every day. This ain’t the world of high-quality, thoughtful advertising — not any more. This is a numbers game: scaled individualization driven by constant feedback and iteration. If advertisers believe more personal information will make ads more effective, they will pursue that theory as far as they can take it.

Evans acknowledges that consumer demands and Apple’s industry influence have pushed the technology industry to try improving user privacy. On-device tracking systems are seen, he says, as a more private way of targeting advertising without exposing user data to third parties.

But:

This takes me to a second question – what counts as ‘private’, and how can you build ‘private’ systems if we don’t know?

Apple has pursued a very clear theory that analysis and tracking is private if it happens on your device and is not private if [it] leaves your device or happens in the cloud. Hence, it’s built a complex system of tracking and analysis on your iPhone, but is adamant that this is private because the data stays on the device. People have seemed to accept this (so far), but acting on the same theory Apple also created a CSAM scanning system that it thought was entirely private – ‘it only happens your device!’ – that created a huge privacy backlash, because a bunch of other people think that if your phone is scanning your photos, that isn’t ‘private’ at all. […]

I will get back to the first part of this quoted section at the end of this response because I think it is the most important thing in Evans’ entire piece.

For clarity, the backlash over CSAM scanning seems less about privacy than it does about device ownership and agency. This is, to some extent, perhaps a distinction without a difference. Many of the definitions I cited in the first paragraph describe privacy as a function of control. But I think there is a subtle point of clarity here: Apple’s solution probably is more private than checking those photos server-side, but it means that a user’s device is more than a mere client connected to cloud services — it is acting as a local agent of those services.

Continued from above:

[…] So is ‘on device’ private or not? […]

This feels like a trick question or a false premise, to which the only acceptable answer is “it depends”. In general, probably, but there are reasonable concerns about Google’s on-device FLoC initiative.

On / off device is one test, but another and much broader one is the first party / third party test: that it’s OK for a website to track what you do on that website but not OK for adtech companies to track you across many different websites. This is the core of the cookie question, and sounds sensible, and indeed one might think that we do have a pretty good consensus on ‘third party cookies’ – after all, Google and Apple are getting rid of them. However, I’m puzzled by some of the implications. “1p good / 3p bad” means that it’s OK for the New York Times to know that you read ten New York Times travel pieces and show you a travel ad, but not OK for the New Yorker to know that and show you the same ad. […]

This is where this piece starts to go off the rails. I have read the last sentence of this quoted paragraph several times and I cannot figure out if this is a legitimate question Evans is asking.

If we engage with it on its premise, of course it is not okay for the New Yorker to show an ad based on my Times browsing history. It is none of their business what I read elsewhere. It would be like if I went to a clothing store and then, later at a restaurant, a waiter told me that I should have bought the other shirt I tried on because they think it looked better. That would be creepy! And if any website could show me ads based on what I viewed somewhere else, that means that my web browsing history is public knowledge. It violates both the first- and third-party definition and the on- and off-device definition.

But the premise is wrong — or, at least, incomplete. The New Yorker contains empty frames that can be filled by whatever a series of unknown adtech companies decide is the best fit for me based on the slice of my browsing history they collect, like little spies with snippets of information. If it were a direct partnership to share advertising slots, at least we could imply that a reader of both may see them as similarly trustworthy organizations, given that they read both. But this is not a decision between the New Yorker and the Times. There may be a dozen other companies involved in selecting the ad, most of which a typical user has never heard of. How much do you, reader, trust Adara, Dataxu, GumGum, MadHive, Operative, SRAX, Strossle, TelMar, or Vertoz? I do not know if any of them have ever been involved in ad spots in the New Yorker or the Times, but they are all real companies that are really involved in placing ads across the web — and they are only a few names in a sea of thousands.

At this point one answer is to cut across all these questions and say that what really matters is whether you disclose whatever you’re doing and get consent. Steve Jobs liked this argument. But in practice, as we’ve discovered, ‘get consent’ means endless cookie pop-ups full of endless incomprehensible questions that no normal consumer should be expected to be understand, and that just train people to click ‘stop bothering me’. Meanwhile, Apple’s on-device tracking doesn’t ask for permission, and opts you in by default, because, of course, Apple thinks that if it’s on the device it’s private. Perhaps ‘consent’ is not a complete solution after all.

Evans references Jobs’ consent-based explanation of privacy that I cited at the top of this piece — a definition which, unsurprisingly, Apple continues to favour. But an over-dependency on a consent model offloads the responsibility for privacy onto individual users. At best, this allows the technology and advertising industries to distance themselves from their key role in protecting user privacy; at worst, it allows them to exploit whatever they are permitted to gather by whatever technical or legal means possible.

The Jobs definition of privacy and consent is right, but it becomes even more right if you expand its scope beyond the individual. As important as it is for users to confirm who is collecting their data and for what purpose, it is more important that there are limits on the use and distribution of collected information. This sea of data is simply too much to keep track of. Had you heard of any of the ad tech companies mentioned above? What about data brokers that trade and “enrich” personal information? Even if users affirm that they are okay with an app or a website tracking them, they may not be okay with how a service that app relies on ends up reselling or sharing user data.

Good legislation can restrict these industries. I am sure Canada’s is imperfect, but there has to be a reason why the data broker industry here is, thankfully, almost nonexistent compared to the industry in the United States.

But the bigger issue with consent is that it’s a walled garden, which takes me to a third question – competition. Most of the privacy proposals on the table are in absolute, direct conflict with most of the competition proposals on the table. If you can only analyse behaviour within one site but not across many sites, or make it much harder to do that, companies that have a big site where people spend lots of time have better targeting information and make more money from advertising. If you can only track behaviour across lots of different sites if you do it ‘privately’ on the device or in the browser, then the companies that control the device or the browser have much more control over that advertising (which is why the UK CMA is investigating FLoC).

With GDPR, we have seen the product of similarly well-intentioned privacy legislation that restricts the abilities of smaller companies while further entrenching the established positions of giants. I think regulators were well aware of that consequence, and it is a valid compromise position between where the law existed several years ago and where it ought to be going.

As regulations evolve, these competition problems deserve greater focus. It is no good if the biggest companies on the planet or those that are higher up the technology stack — like internet service providers — are able to use their position to abuse user privacy. To make sure smaller companies ever have a chance of competing, it would be a mistake loosen policies on privacy and data collection. Regulations must go in the other direction.

And, as an aside, if you can only target on context, not the user, then Hodinkee is fine but the Guardian’s next landmark piece on Kabul has no ad revenue. Is that what we want? What else might happen?

This is not a new problem for newspapers. Advertisers have always been worried that their ads will be placed alongside “hard news” stories. You can find endless listicles of examples — here’s one from Bored Panda. In order to avoid embarrassing associations, it is commonplace for print advertisers to ask for exceptions: a car company, for example, may request their ad not be placed alongside stories about collisions.

This has been replicated online at both ends of the ad buying market. The New York Times has special tags to limit or remove ads on some stories, while advertisers can construct lists of words and domains they want to avoid placement alongside. But what is new about online news compared to its print counterpart is that someone will go from the Guardian story about Kabul to Hodinkee without “buying” the rest of the Guardian, or even looking at it. This is a media-wide problem that has little to do with privacy-sensitive ad technologies. If serving individualized ads tailored based on a user’s browsing history were so incredible, you would imagine the news business would be doing far better than it is.

All of this leads to the final paragraph in Evans’ piece, which I think raises worthwhile questions:

These are all unresolved questions, and the more questions you ask the less clear things can become. I’ve barely touched on a whole other line of enquiry – of where all the world’s $600bn of annual ad spending would be reallocated when all of this has happened (no, not to newspapers, sadly). Apple clearly thinks that scanning for CSAM on the device is more private than the cloud, but a lot of other people think the opposite. You can see the same confusion in terms like ‘Facebook sells your data’ (which, of course, it doesn’t) or ‘surveillance capitalism’ – these are really just attempts to avoid the discussion by reframing it, and moving it to a place where we do know what we think, rather than engaging with the challenge and trying to work out an answer. I don’t have an answer either, of course, but that’s rather my point – I don’t think we even agree on the questions.

Regardless of whether we disagree on the questions or if you — as I — think that Evans is misstating concerns without fully engaging, I think he’s entirely right here. Questions about user privacy on the web are often flawed because of the expansive and technical nature of the discussion. We should start with simpler questions about what we hope to achieve, and fundamental statements what “privacy” really looks like. There should be at least some ground level agreement about what information is considered personal and confidential. At the very least, I would argue that this applies to data points like non-public email addresses, personal phone numbers, dates of birth, government identification numbers, and advertiser identifiers that are a proxy for an individual or a device.

But judging by the popularity of data enrichment companies, it does not appear that there is broad agreement that anything is private any more — certainly not among those in advertising technologies. The public is disillusioned and overwhelmed, and it is irresponsible to leave it to individuals to unpack this industry. There is no such thing as informed consent in marketing technologies when there is no corresponding legislation requiring the protection of collected data. These kinds of fundamental concerns must be addressed before moving on to more abstract questions about how the industry will cope.