CrowdStrike Outage: Faulty Update Exposes Quality Control Gaps, Cripples Global Systems

A routine update to CrowdStrike’s widely used cybersecurity software went spectacularly wrong on Friday, causing a global IT outage that paralyzed businesses, airlines and government agencies. Security experts are now pointing to inadequate quality control as a possible cause for the widespread disruption.

The faulty update to CrowdStrike’s Falcon Sensor software, intended to bolster defenses against emerging threats, contained flawed code that caused widespread system crashes on Windows computers. The impact of the outage was felt globally, with banks, airlines, hospitals and even government offices experiencing significant disruptions.

While CrowdStrike quickly published instructions for fixing affected systems, experts warned that full recovery would be time-consuming and would require manual removal of the flawed code.

“It appears that this file may not have been included in the checking or sandboxing that they do when looking at code, or it may have slipped through the cracks,” said Steve Cobb, Chief Security Officer at Security Scorecard, whose own systems were affected.

The problem became apparent soon after the update was rolled out, with users flooding social media with images of the dreaded ‘blue screen of death’ (BSOD) and associated error messages.

Security researcher Patrick Wardle traced the flaw to a file in the update containing configuration information or signatures used to detect malicious code. He speculated that the frequency of such updates may have led to insufficient testing. “It’s common for security products to update their signatures, say once a day… because they’re constantly checking for new malware and they want to make sure their customers are protected from the latest threats,” he said. The frequency of updates “is probably why (CrowdStrike) didn’t test it that much,” he added.

Experts criticized the lack of a phased rollout of the update. “Ideally, this would have been rolled out to a limited group first,” John Hammond, principal security researcher at Huntress Labs, told Reuters. “That’s a safer approach to avoid a big mess like this.”
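The phased rollout Hammond describes can be illustrated with a minimal sketch. It assumes a hypothetical scheme in which each host is hashed into a stable bucket and the update is widened in stages only while crash reports stay below a threshold. The function names, stage sizes and threshold are illustrative assumptions, not CrowdStrike’s actual deployment process.

```python
import hashlib

# Hypothetical rollout stages, as percentages of the fleet.
STAGES = (1, 10, 50, 100)


def rollout_bucket(host_id: str) -> int:
    """Map a host ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def should_receive_update(host_id: str, rollout_percent: int) -> bool:
    """A host gets the update once its bucket falls inside the current stage."""
    return rollout_bucket(host_id) < rollout_percent


def next_stage(current_percent: int, crash_rate: float,
               max_crash_rate: float = 0.001) -> int:
    """Return the next rollout percentage, or 0 to halt the rollout.

    crash_rate is the fraction of updated hosts that reported a crash;
    the 0.1% default threshold is an arbitrary illustrative value.
    """
    if crash_rate > max_crash_rate:
        return 0  # stop and investigate before any further hosts are updated
    for stage in STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out


if __name__ == "__main__":
    percent = STAGES[0]
    print(should_receive_update("host-0001", percent))
    print(next_stage(percent, crash_rate=0.0))
    print(next_stage(percent, crash_rate=0.5))
```

Because the bucket is derived from the host ID, a host that received the update at an early stage stays in every later stage, and a halt leaves the rest of the fleet untouched.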

This incident underscores the potential for catastrophic consequences when security updates intended to protect systems contain undiscovered flaws. It also highlights the need for robust quality control measures and prudent deployment strategies to prevent similar widespread outages in the future.

Srirang Srikantha, Founder and CEO of Yethi Consulting, said: “These outages show how fragile and interconnected our systems are. Companies like MSFT have great practices and the fact that a bug makes it through the process is unfortunate. It highlights the need for good testing practices before new software is released to production systems.”

Sundareshwar K, Partner & Leader – Cybersecurity, PwC India, commented: “This is a black swan event that not only impacts businesses but also the nation’s machinery as a whole, and underscores how protecting entities from risk involves much more than just technology… This development highlights how it is a misconception that enhanced technology implementation alone will help organizations become more secure and ensure business continuity. As organizations work to remedy the current situation, the focus should be on rethinking risks and moving beyond the layers, patches, products and tools to build an inherently strong cyber architecture with complementary interventions that ensure resilience in the face of such unforeseen technological setbacks or disruptions.”

Athenian Tech said in a statement: “The recent CrowdStrike Falcon sensor incident highlights significant vulnerabilities and operational risks with automatic security updates, leading to widespread system outages, particularly in enterprise environments. This underscores the need for rigorous testing and controlled deployment strategies for software updates. While CrowdStrike is addressing the issue, this incident highlights the importance of balancing robust security with system stability and adopting best practices for software updates to prevent similar incidents in the future.”

Piyush Goel, Founder and CEO of Beyond Key, said: “The complex interactions between CrowdStrike’s update and Microsoft’s infrastructure were likely unforeseen. CrowdStrike quickly identified the bug and rolled back the update, while CERT-In provided guidance to users to remove the problematic file. This incident underscores the need for diverse and well-tested cybersecurity solutions to prevent similar large-scale outages in the future.”

The global impact of this outage is a testament to CrowdStrike’s widespread adoption, with its software used by more than half of the Fortune 500 and numerous government agencies, including the U.S. Cybersecurity and Infrastructure Security Agency (CISA).

Response from CrowdStrike

Below is a summary of what happened, according to CrowdStrike’s official statements:

The timeline:

July 19, 2024, 04:09 UTC: CrowdStrike released a sensor configuration update for Windows systems as part of its ongoing security efforts.

July 19, 2024, 05:27 UTC: The faulty update was fixed.

The consequences:

Windows systems running Falcon Sensor version 7.11 and later that were online and downloaded the update between 04:09 UTC and 05:27 UTC were susceptible to a system crash.

Systems running Linux or macOS were not affected.
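The exposure conditions above can be expressed as a short check. This is a sketch based only on the criteria quoted in this article (Windows, Falcon Sensor 7.11 or later, update downloaded between 04:09 and 05:27 UTC on July 19, 2024). The function and parameter names are hypothetical, the window boundaries are assumed to be inclusive, and it is not an official CrowdStrike tool.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Window and minimum version as reported in the article.
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_AFFECTED_VERSION = (7, 11)


def parse_version(text: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '7.11.18110'."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def was_susceptible(os_name: str, sensor_version: str,
                    update_downloaded_at: Optional[datetime]) -> bool:
    """Apply the three reported conditions to one host.

    update_downloaded_at must be a timezone-aware datetime, or None if the
    host was offline and never downloaded the update.
    """
    if os_name.lower() != "windows":
        return False  # Linux and macOS hosts were not affected
    if parse_version(sensor_version) < MIN_AFFECTED_VERSION:
        return False
    if update_downloaded_at is None:
        return False
    # Assumption: both ends of the reported window are treated as inclusive.
    return WINDOW_START <= update_downloaded_at <= WINDOW_END


if __name__ == "__main__":
    downloaded = datetime(2024, 7, 19, 4, 30, tzinfo=timezone.utc)
    print(was_susceptible("Windows", "7.11.18110", downloaded))
    print(was_susceptible("Linux", "7.11.18110", downloaded))
```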

The technical details:

The issue was caused by a faulty update to ‘Channel File 291’, a configuration file that dictates how Falcon Sensor evaluates the execution of named pipes on Windows systems.

The update targeted malicious named pipes used in cyberattacks, but it triggered a logic error that caused the operating system to crash.

CrowdStrike has since corrected the logic error and updated Channel File 291.
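One general safeguard against a faulty configuration file is to validate it before use and keep the last known good version as a fallback. The sketch below illustrates that idea only. It assumes a hypothetical JSON rule format with "pattern" and "action" fields. The real format of Channel File 291 is not described in this article, and nothing here reflects how Falcon Sensor actually loads it.

```python
import json
from typing import Dict, List

# Hypothetical fields that every rule must carry as non-empty strings.
REQUIRED_FIELDS = ("pattern", "action")


def validate_rules(raw: str) -> List[Dict[str, str]]:
    """Parse and validate a rule file; raise ValueError if anything is malformed."""
    try:
        rules = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(rules, list) or not rules:
        raise ValueError("expected a non-empty list of rules")
    for index, rule in enumerate(rules):
        if not isinstance(rule, dict):
            raise ValueError(f"rule {index} is not an object")
        for field in REQUIRED_FIELDS:
            value = rule.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"rule {index} has no usable '{field}'")
    return rules


def load_rules(candidate: str,
               last_known_good: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Use the new rules only if they validate; otherwise keep the previous set."""
    try:
        return validate_rules(candidate)
    except ValueError:
        return last_known_good


if __name__ == "__main__":
    previous = [{"pattern": "old_pipe", "action": "block"}]
    print(load_rules('[{"pattern": "evil_pipe", "action": "block"}]', previous))
    print(load_rules('[{"pattern": "evil_pipe"}]', previous))
```

Rejecting a bad file keeps the host running on older detections, a trade-off that favors stability over immediate coverage of the newest threat.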

The response:

“This issue is not a result of or related to a cyberattack,” CrowdStrike emphasized.

The company quickly implemented a fix and provided remediation recommendations through its blog and Support Portal.

CrowdStrike is conducting a thorough root cause analysis to prevent similar incidents in the future.

“We understand how this issue arose and we are conducting a thorough root cause analysis to determine how this logic error occurred. This effort will continue. We are committed to identifying any fundamental or workflow improvements we can make to strengthen our process. We will update our findings in the root cause analysis as the investigation progresses,” the company said.