Researchers Find ‘Anonymized’ Data Can Easily Be Combined With Leaked Personal Information From Major Data Breaches

Karl Bode, Vice:

Dasha Metropolitansky and Kian Attari, two students at the Harvard John A. Paulson School of Engineering and Applied Sciences, recently built a tool that combs through vast troves of consumer datasets exposed from breaches for a class paper they’ve yet to publish.

“The program takes in a list of personally identifiable information, such as a list of emails or usernames, and searches across the leaks for all the credential data it can find for each person,” Attari said in a press release.

They told Motherboard their tool analyzed thousands of datasets from data scandals ranging from the 2015 hack of Experian, to the hacks and breaches that have plagued services from MyHeritage to porn websites. Despite many of these datasets containing “anonymized” data, the students say that identifying actual users wasn’t all that difficult.
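The students' tool is unpublished, so the following is only a toy sketch of the kind of linkage the excerpt describes: take a list of identifiers and gather whatever each dataset holds for them. The function name `link_records`, the assumption that every record carries an `email` field, and all of the sample data are hypothetical and not drawn from the actual tool.

```python
from collections import defaultdict


def link_records(identifiers, datasets):
    """Collect every field found for each identifier across all datasets.

    identifiers: iterable of email addresses to look up.
    datasets: mapping of dataset name -> list of record dicts; each record
              is assumed to carry an "email" key (hypothetical schema).
    """
    # Normalize identifiers so matching ignores case and stray whitespace.
    wanted = {i.strip().lower() for i in identifiers}
    profiles = defaultdict(dict)
    for name, records in datasets.items():
        for record in records:
            key = str(record.get("email", "")).strip().lower()
            if key not in wanted:
                continue
            for field, value in record.items():
                if field == "email":
                    continue
                # Keep the source dataset alongside each value found.
                profiles[key].setdefault(field, []).append((name, value))
    return dict(profiles)


# Entirely made-up sample data for illustration.
datasets = {
    "forum_leak": [{"email": "pat@example.com", "username": "pat77"}],
    "fitness_app": [
        {"email": "Pat@example.com", "zip": "02138", "birth_year": 1990}
    ],
}
profiles = link_records(["pat@example.com"], datasets)
```

Neither made-up dataset says much on its own, but joining them on one shared identifier yields a username, a ZIP code, and a birth year for the same person. That is the basic reason stripping names from a dataset does not make it anonymous.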

A massive company probably has no reason to single out one particular person, but it has every reason to identify many people individually. Advertising, marketing, and analytics companies routinely blur the difference between targeting users in very small groups and targeting them individually. That’s gross, and there are far more dangerous possibilities for anyone who holds vast troves of individually linked personal details.

The studies compiled by Bode show why:

  • it is vital for the least possible amount of information to be given to any company;

  • there should be limits on how long data that users did not volunteer may be stored;

  • anything retained must be encrypted;

  • there must be serious repercussions for failing to adequately secure data, even if no breach has occurred; and,

  • individualized data must generally be devalued.