The origin of all this data is still not clear. The initial set I was given adhered to a very consistent format; the set in broader circulation is more varied, suggesting the data possibly comes from multiple sources. Some people have suggested WhatsApp or Instagram as potential additional sources, but I’ve seen nothing to substantiate those claims.
Facebook are yet to put out a clear position on this. They’ve alluded to a 2019 incident being the root cause, but that doesn’t go far enough to explain the data in circulation. There’s a vacuum of information right now, and that vacuum is being filled by a lot of speculation.
Facebook published a short press release from Mike Clark regarding this breach:
We believe the data in question was scraped from people’s Facebook profiles by malicious actors using our contact importer prior to September 2019. This feature was designed to help people easily find their friends to connect with on our services using their contact lists.
According to BBC reporter Joe Tidy, there were two large leaks of Facebook data in 2019. Tidy points to a September 2019 article in City A.M. as an example of one, while the other was in April 2019. According to Facebook, this weekend’s release consists of data from neither.
Also, for what it is worth, this was about the same time period during which “hundreds of millions” of Facebook and Instagram users’ passwords were stored in plain text in internal logs for years. These incidents are not connected by anything other than the company’s sloppiness, but it indicates a unique level of deviance. If there is one thing that Facebook is most notable for, it is arguably that its size and ubiquity have granted it a license to be shameless.
According to Anja Karadeglija of the National Post, Facebook never reported this breach to Canadian privacy officials as required. Facebook also said that the unauthorized scraping of user data stopped a month before the GDPR took effect, so it did not report this to European authorities either. Natasha Lomas at TechCrunch reports that Irish regulators are investigating whether that is true.
Update: Lily Hay Newman, Wired:
Facebook says it did not notify users about the 2019 contact importer exploitation precisely because there are so many troves of semipublic user data — taken from Facebook itself and other companies — out in the world. Additionally, attackers needed to supply phone numbers and manipulate the feature to spit out the corresponding name and other data associated with it for the exploit to work, which Facebook argues means that it did not expose the phone numbers itself. “It is important to understand that malicious actors obtained this data not through hacking our systems but by scraping it from our platform prior to September 2019,” Clark wrote Tuesday. The company aims to draw a distinction between exploiting a weakness in a legitimate feature for mass scraping and finding a flaw in its systems to grab data from its backend. Still, the former is a vulnerability exploitation.
Automatic contact matching remains a glaring privacy and security vulnerability. Facebook built a database of about a quarter of people on Earth and then, for a long time, allowed anyone to associate phone numbers — something that has a predictable format — with names, photos, email addresses, and more.
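To put rough numbers on why a predictable format matters, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the seven free digits stand in for a single North American-style area code, the lookup rate is hypothetical, and the function names are mine. None of this describes how Facebook's contact importer actually behaved or what limits it enforced.

```python
SECONDS_PER_DAY = 86_400


def candidate_count(free_digits: int) -> int:
    """Number of possible phone numbers when `free_digits` digits can vary."""
    return 10 ** free_digits


def days_to_enumerate(candidates: int, lookups_per_second: float) -> float:
    """Days needed to try every candidate at a constant lookup rate."""
    return candidates / lookups_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumption: one area code with 7 free digits.
    space = candidate_count(7)
    # Assumption: a hypothetical 1,000 lookups per second, not a measured rate.
    days = days_to_enumerate(space, 1_000)
    print(f"{space:,} candidates, about {days:.2f} days to try them all")
```

The point is only that the search space is small and completely predictable. Unlike a password, a phone number cannot be made longer or more random, so a matching feature has to be throttled by the service itself.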