CrowdStrike Fallout crowdstrike.com

CrowdStrike:

On July 19, 2024 at 04:09 UTC, as part of ongoing operations, CrowdStrike released a sensor configuration update to Windows systems. Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform. This configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.

Pradeep Viswanathan, Neowin:

It turns out that similar problems have been occurring for months without much awareness, despite the fact that many may view this as an isolated incident. Users of Debian and Rocky Linux also experienced significant disruptions as a result of CrowdStrike updates, raising serious concerns about the company’s software update and testing procedures. These occurrences highlight potential risks for customers who rely on their products daily.

Rajesh Kumar Singh and David Shepardson, Reuters:

Delta Air Lines CEO Ed Bastian on Monday said it will take the U.S. carrier another couple of days before its operations recover from a global cyber outage that snarled flights around the world.

The Atlanta-based carrier has been hit hard by the outage. It has canceled over 4,000 flights since Friday, stranding thousands of customers across the country. By contrast, disruptions at other major U.S. carriers had largely subsided.

Whatever one's general worldview for technology today, it can be found in some analysis of this CrowdStrike failure. This saga has everything. For those who think this reinforces the safety of restrictive software policies, that is one possible explanation. Those who may be permanent assholes and think diversity initiatives and “woke” programmers are to blame are both insufferable and wrong. For those who think marketplace concentration has a role to play (I am one), there is someone who agrees. And for those who want to blame the E.U., the Wall Street Journal has that angle covered.

One comment I found particularly insulting, however, was a line in Microsoft’s response: “We currently estimate that CrowdStrike’s update affected 8.5 million Windows devices, or less than one percent of all Windows machines”. I get why Microsoft would want to reframe this issue; the words “Microsoft outage” are in headlines instead of “CrowdStrike bug”.[1] But this does not minimize the impact of this bug, which Microsoft’s statement acknowledges in the very next sentence, and it does not disprove claims about concentrated market share. CrowdStrike is used by nearly 60% of Fortune 500 companies including, it says, 80% of the top automotive, financial, food and beverage, and technology companies. CrowdStrike may not have an influential position compared to the Windows install base at large, but who cares? That is not an appropriate metric for this specific software.

Blaming Microsoft’s agreement with the E.U. also seems unnecessarily reductive, though it had a role to play. If Microsoft could have restricted kernel access in the way Apple does, it is much less likely this precise catastrophic failure by a third-party company would have occurred. But that is beside the point. It seems Microsoft could have restricted kernel access at any time, so long as it did not unfairly give its own security products elevated access. Left unexplored is why it has not done so. Also, CrowdStrike was singing the praises of Apple’s approach when kernel extensions were deprecated in MacOS. CrowdStrike loved the idea of “[r]educing the need for privileged access” which “is always a more secure approach”. Why did it fail to reduce its own privileged access on Windows? Even without the same kind of mandate as MacOS, it seems there has been ample opportunity all around to increase protections.

Though I was oversimplifying when I wrote “just a handful of vendors are entrusted with the digital infrastructure on which our society runs” — it is more than a handful, but it is a relative handful in most any category — I maintain at least part of my original argument as written:

Even if there are serious financial and reputational consequences for these failures, the world is still no closer to being freed of its dependence on Windows or Ticketmaster or Snowflake or CrowdStrike. These seem to be incredibly fragile systems on which society rests with little accountability for their makers.

CrowdStrike’s stock is down 23%, which is far more than AT&T suffered. But I am not comfortable with investors’ bad vibes as an accountability mechanism. There need to be legal structures so that our world’s vast interconnected technological infrastructure has resilience as a rule.

Update: Marcus Hutchins:

Everyone keeps citing that Dave’s Garage video where he claims Microsoft had some super secret fix everything security API that they were going to release but the EU stopped them.

[…]

Microsoft has released multiple APIs that match said description and the EU hasn’t stopped them.

Hutchins also says the 2006 case cited by Thompson about E.U.-related concessions is not the one Microsoft is referring to in this circumstance.


  1. In that particular article, CrowdStrike is not mentioned until the final paragraph.