When a faulty CrowdStrike update snaked its way through global IT systems on July 19, companies with global footprints were quickly alerted. But small- and mid-sized businesses found it harder to react to an event that happened just after midnight Eastern time.
“It’s important for companies of any size to periodically think through the significant risks that they run with their IT systems,” said Charles Betz, Forrester VP and research director. For a small company, especially a new one, the wrong kind of IT disruption could “cancel your startup dreams.”
That doesn’t mean CIOs should prepare narrowly for another CrowdStrike-like outage. Instead, technology leaders should build resiliency into their IT systems, from planning how to mitigate an event that could happen at any time of day to making sure backups exist in case the entire system goes down.
“The next big thing to make the headline is going to be something completely different,” said Betz. “You can’t predict but you can prepare.”
For businesses based in just one time zone, or one country, having 24/7 coverage in case of a serious outage can pose a challenge, but it is one that can be solved with the right protocols.
World Insurance Associates, a U.S.-based company with about 40 full-time IT employees, outsourced its Level 1 help desk, or the more basic support requests. “It’s a cost-effective way to have someone respond immediately if there’s something submitted by an end-user,” said Michael Corrigan, CIO of World Insurance Associates.
That issue then follows an escalation path. More serious problems go to Level 2 support, staffed by employees available after hours and on weekends. If the issue continues, it goes to Level 3, which brings in a help desk management group that includes Corrigan. Employees also have the ability to “hit the big red button where they can escalate to level 3,” he said.
This management team keeps in touch via the Microsoft Teams app on their phones. They also have a third-party SMS tool that allows them to send out mass technical alerts. This could include everyone at the company or just the IT team “if we all need to come together and hop on a call to get our arms around how we want to attack an issue and ultimately resolve it,” Corrigan said.
During the CrowdStrike outage, only about 5% of the company's endpoints were impacted, but the system still alerted Corrigan and other team members via text messages and their team chat, enabling a rapid response.
This structure has been in place for about a year and a half. “We try to make it as fool-proof as possible,” Corrigan said.
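The escalation path Corrigan describes can be sketched in a few lines of Python. This is a minimal illustration only, not World Insurance Associates' actual tooling: the `Ticket` class, the `Level` names and the channel labels returned by `alert_channels` are all hypothetical, chosen to mirror the three support levels, the "big red button" and the chat-plus-SMS alerts mentioned above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical names for the three support levels in the article."""
    L1_OUTSOURCED_HELP_DESK = 1
    L2_AFTER_HOURS_STAFF = 2
    L3_MANAGEMENT_GROUP = 3


@dataclass
class Ticket:
    summary: str
    level: Level = Level.L1_OUTSOURCED_HELP_DESK
    history: list = field(default_factory=list)  # (previous level, reason) pairs

    def escalate(self, reason: str) -> Level:
        """Move the ticket up one level; a Level 3 ticket stays at Level 3."""
        if self.level < Level.L3_MANAGEMENT_GROUP:
            self.history.append((self.level, reason))
            self.level = Level(self.level + 1)
        return self.level

    def big_red_button(self, reason: str) -> Level:
        """Jump straight to Level 3, skipping any intermediate level."""
        if self.level != Level.L3_MANAGEMENT_GROUP:
            self.history.append((self.level, reason))
            self.level = Level.L3_MANAGEMENT_GROUP
        return self.level


def alert_channels(ticket: Ticket) -> list:
    """Assumed policy: only Level 3 issues trigger team chat and mass SMS."""
    if ticket.level == Level.L3_MANAGEMENT_GROUP:
        return ["team_chat", "sms"]
    return []


if __name__ == "__main__":
    ticket = Ticket("Endpoints stuck in a reboot loop")
    ticket.escalate("Level 1 could not resolve")
    ticket.big_red_button("Multiple offices affected")
    print(ticket.level.name, alert_channels(ticket))
```

Keeping the history of each hop is a design choice rather than something the article specifies; it simply makes it easier to review afterward how quickly an issue reached the people who could resolve it.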
Preventing single-issue outages
The CrowdStrike situation was unique, but responses to it shouldn’t be, said Erik Eisen, CEO of CTI Technical Services. How a company will respond to one software outage should be part of every disaster recovery plan.
“If you have one piece of software that glitches and takes down servers and server farms and work stations, now you have a mass problem on your hands,” Eisen said. “That’s not a fire or a flood, but still a disaster.”
Backup tools can help a company restore systems and quickly recover operations. Companies running some or all systems on-prem can pre-configure a cloud-based home for their systems, and then move applications and data to the cloud quickly. They can do so, even temporarily, while the local event is remediated, according to Eisen.
Companies also need to think through their architecture, how applications are linked together, and how a disruption to each one would affect the rest of the enterprise. “It’s scary to think that one thing can literally bring you to your knees,” Eisen said. “You have to have the right stack in place, and the right backup and disaster recovery tools.”
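One way to make that architecture review concrete is to write down which systems depend on which, then ask what else fails when a single component goes down. The Python sketch below does this with a breadth-first walk over a dependency map. The system names in `DEPENDENTS` are invented for illustration and do not describe any company in this article; a real inventory would come from a company's own asset records.

```python
from collections import deque

# Hypothetical map: each system -> the systems that depend on it directly.
DEPENDENTS = {
    "endpoint_agent": ["workstations", "servers"],
    "servers": ["email", "claims_app"],
    "workstations": ["help_desk"],
    "claims_app": ["customer_portal"],
    "email": [],
    "customer_portal": [],
    "help_desk": [],
}


def blast_radius(failed, dependents):
    """Return every system that is disrupted, directly or indirectly,
    when `failed` goes down (the failed system itself is excluded)."""
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child != failed and child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


def rank_single_points_of_failure(dependents):
    """List (system, number of downstream systems), worst offender first."""
    ranked = [(name, len(blast_radius(name, dependents))) for name in dependents]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for name, count in rank_single_points_of_failure(DEPENDENTS):
        print(f"{name}: {count} downstream systems")
```

In this made-up map, the endpoint agent sits upstream of everything else, which is the kind of "one thing" Eisen warns about. Systems at the top of the ranking are the natural first candidates for backup and failover planning.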
Preparing for what’s next
While the next major outage is unlikely to follow the CrowdStrike script, disruptions are common. They can be triggered by cyberattacks, software glitches and natural disasters, said Betz.
The aftereffects of Hurricane Helene, which hit the Southeast U.S. in September, showed how many things could go wrong. “The modern small to medium enterprise should be more concerned with what happened in Asheville than happened with CrowdStrike, because Asheville represents the broad spectrum of everything that can disrupt your business,” he added.
The North Carolina mountain town was inundated with water as a result of the hurricane.
“I don’t think that another CrowdStrike will happen. I do think there will be more Ashevilles,” Betz said.
It’s important for companies to think through what they would do in the event of such a disaster, and to get outside help doing so, Betz said.
Third-party consultants, even if they only provide support a few times a year, can identify gaps that a small to mid-sized company, especially one with a small IT team, might miss.
“Backups tend to fall to the bottom of the list, and then when you really need them, you’re out of luck.”