Pixel Envy

Written by Nick Heer.

Differential Privacy

During the keynote today, Craig Federighi explained how Apple is trying to reconcile collecting as little personal information as possible with building software that takes advantage of ongoing trends and larger data sets. Federighi called it “differential privacy”, which isn’t a made-up thing at all: it’s a specific and defined field of study concerning the collection and fuzzing of data in larger sets. The intent is that it can provide a sense of larger trends (a newly-coined word or phrase, for instance, or traffic patterns in an area, or potentially voice patterns for regional accents) while revealing no useful data about any one person.
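
To make the “fuzzing” idea a little more concrete, here is a minimal sketch of randomized response, a classic textbook technique that satisfies differential privacy. To be clear, this is not Apple’s implementation, which wasn’t detailed in the keynote; the function names, the 30% figure, and the population size are all made up for illustration. Each person’s report is noisy enough that it says little about them, yet the aggregate trend can still be recovered.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the truth half the time; otherwise report a fair coin flip.

    Any single report is deniable: a "yes" may just be the coin.
    (Hypothetical helper, not an Apple API.)
    """
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: list) -> float:
    """Undo the noise in aggregate.

    P(report is yes) = 0.5 * p + 0.25, so p = 2 * observed - 0.5.
    """
    observed = sum(reports) / len(reports)
    return 2 * observed - 0.5


# Assumed scenario: 30% of 100,000 users really have some trait.
rng = random.Random(0)
true_rate = 0.30
population = [rng.random() < true_rate for _ in range(100_000)]

# Only the noisy reports ever leave each "device".
reports = [randomized_response(answer, rng) for answer in population]
estimate = estimate_true_rate(reports)
```

With a large enough population, `estimate` lands close to the true rate even though no individual report can be trusted. With a small one, the noise swamps the signal, which is roughly the trade-off the technique is built around.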

Andy Greenberg, Wired:

Differential privacy, [Aaron] Roth explains, seeks to mathematically prove that a certain form of data analysis can’t reveal anything about an individual — that the output of an algorithm remains identical with and without the input containing any given person’s private data. “You might do something more clever than the people before to anonymize your data set, but someone more clever than you might come around tomorrow and de-anonymize it,” says Roth. “Differential privacy, because it has a provable guarantee, breaks that loop. It’s future proof.”

Roth is the professor that Federighi gave a shout-out to in the presentation.

Ideally, what this means is that Apple can mine users’ devices for larger amounts of information to analyze and help improve their services, without it being invasive. It’s less about strip-mining data, and more about carefully retrieving it, and it’s the kind of thing that Apple can do because they have an established record of looking out for users’ privacy. More importantly, it’s the kind of thing that Apple is doing because they continue to respect users’ privacy.

But, from what I’m reading, this doesn’t necessarily allow them to provide better individualized services on its own. Data collected en masse must still be married to individual user data. For iOS, it sounds like this occurs on the device itself. Any company can collect a bunch of data; the hard part is making sense of it. I hope that’s successful here.