Understanding Your IT Dependencies: Unpacking the CrowdStrike Windows Outage

Happy almost weekend, everybody…or not, if you’re in IT…or trying to travel…or get medical attention…or just get your work done and start the weekend off with a bang… Many of us have woken up to the news of a massive global outage caused by a CrowdStrike Falcon endpoint sensor update for Windows hosts. From airlines and banking systems to emergency services and media outlets, businesses around the world are dealing with the dreaded Blue Screen of Death (BSOD) to kick their weekend into high gear.

NOTABLY…this is not a cyber attack. As far as we know, malicious intent is not a factor.

According to the company’s website, the outage was caused by “a defect in a single content update for Windows hosts. Mac and Linux hosts are not affected.” Further, the company says that the issue was “identified, isolated and a fix has been deployed.”

 

Good news! Except, according to sources, this isn’t the simple fix it’s being positioned as.

Source: Reddit: https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_error_in_latest_crowdstrike_update/ 


While attempting to apply the fix, many customers report that they’re stuck in a boot loop and forced to manually reset impacted servers, which could result in hours, or possibly days, of downtime and untold amounts of lost productivity and revenue.

Understanding IT dependencies and the impact on cybersecurity

If this is not a cybersecurity issue, and it does not seem to be, why is a company like OX commenting? Quite simply, because it highlights how critical it is to understand the long tail of dependencies within IT infrastructures.
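To make the "long tail" idea concrete, here is a minimal sketch of how a team might ask "what breaks if this one thing fails?" against a dependency inventory. The inventory, component names, and the `impacted_by` helper are all hypothetical and invented for illustration; a real environment would pull this data from an asset or configuration database.

```python
from collections import deque

# Hypothetical inventory: each key depends on the items in its list.
DEPENDS_ON = {
    "check-in kiosk": ["windows host"],
    "payment terminal": ["windows host"],
    "windows host": ["endpoint sensor"],
    "endpoint sensor": ["vendor content update"],
    "linux build server": [],
}


def impacted_by(component, depends_on):
    """Return everything that directly or transitively depends on `component`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for item, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(item)

    # Breadth-first walk outward from the failed component.
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


print(impacted_by("vendor content update", DEPENDS_ON))
```

Even in this toy inventory, a single upstream update reaches systems that never mention the vendor directly, which is the kind of indirect exposure that is easy to miss.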

Neatsun Ziv, OX Security’s CEO and Co-founder, has said, “Incidents like the one we are seeing cause global chaos today, where an error in an update provided by a provider causes widespread outages, are not uncommon. What is unique about this incident is the scale at which it has taken place, likely wiping billions of dollars from the global economy due to global, widespread downtime.”

What’s become clear in the aftermath is that IT and operations teams have to boot individual endpoints manually, which will take a great deal of time, especially for understaffed businesses. If the machine is encrypted with BitLocker, response teams will also have to enter a 48-digit recovery key, delete the faulty file, and then restart. Remote-first companies will have to walk employees through these steps.
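The "delete the file" step can be sketched as follows. The directory and file pattern below are the ones given in CrowdStrike's public workaround guidance at the time of the incident; treat them as an assumption and check the current advisory before relying on them. The function names are hypothetical, and Python is used only to make the logic concrete: in practice this step was typically carried out by hand or with a short script from Safe Mode or the Windows Recovery Environment, where Python is usually not available.

```python
from pathlib import Path

# Assumption: location and pattern taken from CrowdStrike's public workaround
# guidance for this incident. Verify against the current advisory.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_faulty_files(driver_dir=DRIVER_DIR, pattern=PATTERN):
    """List files in `driver_dir` that match the faulty content-file pattern."""
    driver_dir = Path(driver_dir)
    if not driver_dir.is_dir():
        return []
    return sorted(driver_dir.glob(pattern))


def remove_faulty_files(driver_dir=DRIVER_DIR, pattern=PATTERN, dry_run=True):
    """Delete matching files and return their names.

    With dry_run=True (the default) nothing is deleted; the names are only reported.
    """
    removed = []
    for path in find_faulty_files(driver_dir, pattern):
        if not dry_run:
            path.unlink()
        removed.append(path.name)
    return removed


# Report what would be removed without touching anything.
print(remove_faulty_files(dry_run=True))
```

Defaulting to a dry run is a deliberate choice here: when a fix has to be repeated across thousands of machines, being able to preview the effect first reduces the chance of compounding the problem.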

Agent-based systems versus agentless

While the world is recovering, we don’t want to cast stones. It’s easy to say, “An engineer messed up!!” But in reality, sometimes things happen. What we will say is that agent-based tools have a history of causing problems, starting with performance overhead and network bandwidth consumption.

As this incident illustrates, deploying and managing agents is problematic at scale. Furthermore, ensuring consistent agent configurations and updates across the entire ecosystem, especially one with hundreds of thousands of endpoints, is extremely challenging.
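As a rough illustration of the consistency problem, the sketch below groups hosts whose reported agent version differs from the intended one. The host names, version strings, and the `find_drift` helper are all made up for this example and do not correspond to any real product's versions or API.

```python
from collections import defaultdict

# Hypothetical inventory of host -> agent version reported at last check-in.
REPORTED = {
    "laptop-001": "1.4.2",
    "laptop-002": "1.4.2",
    "kiosk-014": "1.4.1",
    "server-203": "1.4.2",
    "laptop-377": None,  # host has not checked in
}


def find_drift(reported, target):
    """Group hosts whose reported version differs from `target`."""
    drift = defaultdict(list)
    for host, version in sorted(reported.items()):
        if version != target:
            # Hosts with no reported version are grouped under "unknown".
            drift[version or "unknown"].append(host)
    return dict(drift)


print(find_drift(REPORTED, target="1.4.2"))
```

With five hosts the answer is obvious at a glance; with hundreds of thousands, the "unknown" bucket alone can represent days of follow-up work.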

With the CrowdStrike issue, the remediation requires hands on keyboards to fix. In today’s hybrid and highly mobile work environment, getting that fix onto every affected machine is nearly impossible.

In contrast, agentless deployments offer numerous advantages, especially when it comes to updates. Ziv notes that automated agentless updates facilitate:

  • Centralized Control: Without the need for agents on individual devices, updates can be managed centrally, ensuring consistency and efficiency.
  • Rapid Deployment: New patches or software versions can be pushed out to all endpoints simultaneously, accelerating the update process.
  • Reduced Error Rate: Centralized control minimizes the risk of human error during the update process.
  • Improved Security: By eliminating the need for agents, which can be potential attack vectors, agentless technology enhances security.
  • Scalability: Handles large-scale deployments with ease, as there’s no need to manage agents on countless devices.

 

This is an extremely unfortunate incident and we wish every IT team impacted good luck! What’s important here is to remember that incidents will happen — whether they’re cyber incidents or IT incidents. The best way to mitigate both the likelihood and severity of incidents is careful planning, including threat modeling, testing, backups, and practicing rapid response — and perhaps a future agentless approach.
