Personal Records for a Billion Individuals, Collected by People Data Labs and OxyData, Found Freely Available on Web Server (troyhunt.com)

Vinny Troia of Data Viper, a security firm:

On October 16, 2019 Bob Diachenko and Vinny Troia discovered a wide-open Elasticsearch server containing an unprecedented 4 billion user accounts spanning more than 4 terabytes of data.

A total count of unique people across all data sets reached more than 1.2 billion people, making this one of the largest data leaks from a single source organization in history. The leaked data contained names, email addresses, phone numbers, LinkedIn and Facebook profile information.

What makes this data leak unique is that it contains data sets that appear to originate from 2 different data enrichment companies.

Troy Hunt:

It’s entirely possible that this data came from a PDL subscriber and not PDL themselves. Someone left an Elasticsearch instance wide open and by definition, that’s a breach on their behalf and not PDL’s. Yet it doesn’t change the fact that PDL is indicated as the source in the data itself and it definitely doesn’t change the fact that my data (and probably your data too), is available freely to anyone who wishes to query their API. I signed up for a free API key just to see how much they have on me (they’ll give you 1k free API calls a month) and the result was rather staggering.

[…]

And this is the real problem: regardless of how well these data enrichment companies secure their own system, once they pass the data downstream to customers it’s completely out of their control. My data — almost certainly your data too — is replicated, mishandled and exposed and there’s absolutely nothing we can do about it. Well, almost nothing…

I also signed up for an API key and found records associated with my name and one of my email addresses. Everything in it appears to be scraped from public sources — my name matched outdated LinkedIn data from the time that I thought it was an excellent idea to have a LinkedIn profile, while my email address surfaced a mixed data set.

I am, of course, responsible for putting my information out into the world — if someone can see it, they can copy it. But should they be allowed to store it as long as they like? I deleted my LinkedIn profile years ago, but People Data Labs still has my employment history from there. Furthermore, my email address was not public or visible on any of my social media profiles, but PDL still managed to connect all of them because they used each social media company’s API to scrape user details. I have little recourse in getting rid of PDL’s copy of this information short of contacting them and all other “data enrichment” companies individually to request deletion. That seems entirely wrong.