A faulty software update from cybersecurity provider CrowdStrike bricked millions of computers Friday, plunging global businesses using Microsoft systems into chaos.
Though Microsoft and CrowdStrike issued remediation strategies, recovery was still ongoing Monday for many businesses, including major airlines and hospital systems. For CIOs, the scenes of business disruption highlighted the importance of having plans in place for when an outage hits.
The CrowdStrike outage shone a bright light on the limitations in IT organizations, according to Jon Amato, senior director analyst at Gartner.
"This may be the biggest stress test that I've ever seen for direct, first-line IT support teams," Amato said Friday.
CIOs can take the CrowdStrike outage as an opportunity to reassess their company's preparedness against major outages. Testing out crisis scenarios and developing business continuity plans are some of the tools available to tech chiefs before the next major IT outage strikes.
"This has happened before, and we can expect something like this to happen again in the future," said Frank Trovato, principal advisor at Info-Tech Research Group.
Software monoliths
The scale of Friday's outage could be attributed to the number of enterprise machines running Windows and CrowdStrike's endpoint protection platform, Cybersecurity Dive reported. Customers using the Linux or Mac versions of the update were unaffected.
The disruption also put the focus on potential single points of failure in the IT ecosystem, according to Spencer Kimball, CEO at Cockroach Labs.
"The fact that something like this could happen should open people's minds to the real risks of technical monocultures," Kimball told CIO Dive.
However, reliance on single software providers is part of the reality in modern IT estate management. "If you commit to using Azure as your primary cloud environment and the various services they provide, you are vulnerable to an Azure outage," Trovato said.
Plan and prepare
Backup plans can give businesses a way to hedge against the cost of unforeseen outages.
Organizations lose an estimated $400 billion per year due to IT failures and unplanned downtime, a Splunk report published in June found. Though cybersecurity issues were the most common factor, infrastructure and software issues were the second-most common driver of outages.
CIOs hoping to avoid a crisis during the next major software outage will look more closely at software quality assurances from their main software providers, according to Amato. To improve their standing, business leaders must resist falling back into business as usual, analysts told CIO Dive.
"That really has to be part of the culture, and should be ingrained into purchasing and operational processes," Amato said.
Though CrowdStrike said it is taking steps to prevent a similar issue from happening again, competitors in the endpoint detection and response space stand to benefit from the reputational impact of the outage on the vendor, analysts said.
The vast majority of IT leaders say outages or disruptions degraded customer trust in their organizations, according to PagerDuty research.
As businesses recover and the urgency from the initial outage subsides, CIOs can increase preparedness by running simulation exercises on what the next IT crisis could look like and developing an effective response.
"What you want to identify first is: when we imagine all the different things that can go wrong, which one is going to be the most disastrous for us," said Kimball. "Then you want to cross it with which is most likely."
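One way to picture the exercise Kimball describes is a simple risk ranking that multiplies how damaging a scenario would be by how likely it is. The sketch below is illustrative only: the scenario names, the 1-to-5 scales and the scores are hypothetical assumptions, not figures from Kimball, CrowdStrike or any analyst quoted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A hypothetical outage scenario scored on two assumed 1-5 scales."""
    name: str
    impact: int      # 1 = minor inconvenience, 5 = disastrous (assumed scale)
    likelihood: int  # 1 = rare, 5 = frequent (assumed scale)

    @property
    def score(self) -> int:
        # "Cross" impact with likelihood by multiplying the two ratings.
        return self.impact * self.likelihood


def rank_scenarios(scenarios):
    """Return scenarios ordered from highest to lowest combined score."""
    return sorted(scenarios, key=lambda s: s.score, reverse=True)


# Illustrative inputs only; a real exercise would use the team's own estimates.
scenarios = [
    Scenario("Endpoint agent update crashes Windows fleet", impact=5, likelihood=2),
    Scenario("Primary cloud region outage", impact=4, likelihood=3),
    Scenario("Single SaaS tool unavailable", impact=2, likelihood=4),
]

for s in rank_scenarios(scenarios):
    print(f"{s.score:>2}  {s.name}")
```

In this made-up example, the cloud region outage ranks first because it combines high impact with moderate likelihood, even though the fleet-wide crash would be more damaging on its own. Multiplying the two ratings is one common convention; teams may prefer to weight impact more heavily.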
Correction: An earlier version of this story misattributed a quote to Frank Trovato instead of Spencer Kimball. That has been updated.