In the early morning hours of July 19, HP detected an IT anomaly popping up on PCs halfway around the world from its California headquarters. An automatic update in CrowdStrike’s Falcon sensor software had begun disabling Windows devices, triggering the dreaded blue screen of death.
“One of my command centers in India saw it hitting Japan while everyone else was asleep, saw the extra blue screens coming up and isolated it to the CrowdStrike update,” John Gordon, president and GM of HP Managed Solutions, told CIO Dive.
HP technicians quickly wrote a patch designed to load before the Falcon update at boot and override it, then delivered it to a fleet of 79,000 PCs.
“By the time people woke up in the U.S., the patch was in,” Gordon said. “The only people who got in trouble were people who never reboot their laptops, but we were able to save 86% of our PCs.”
The crisis knocked millions of workstations offline globally and brought several major airlines to their knees. As flight cancellations climbed into the thousands over the weekend following July 19, United Airlines sent teams of technologists to manually reboot more than 26,000 computers and networked devices at hundreds of airports.
Delta, in contrast, struggled to restore operations for the better part of the week, blaming CrowdStrike for $500 million in estimated losses. Microsoft and CrowdStrike pushed back, pointing the finger at Delta over outdated IT infrastructure.
The event served as a proof of concept for HP’s PC fleet-management service, Workforce Experience Platform, a dashboard solution designed to “proactively remediate hardware failures.” HP announced the platform as Workforce Central last year and added advanced monitoring and management capabilities last month.
“My team not only does this work for our customers,” Gordon said. “We are the ones who run it for HP.”
Automation consternation
The CrowdStrike incident highlighted the potential perils of relying on vendor-managed automated updates. For the impacted airlines, it also underlined challenges inherent in mobilizing staff to restore IT assets across a geographically dispersed operation on short notice.
“A lot of CIOs and CISOs realized that they forgot to plan for Murphy’s Law,” Paddy Harrington, senior analyst at Forrester, told CIO Dive. “Now a lot of CIOs and their peers on the CISO side are trying to figure out how they can manage this better with users spread out around the globe like no time before in history.”
Talent shortages are a perennial headache for enterprises grappling with IT system complexity.
Enterprises earmark more than one-third of IT budgets for technical personnel, according to Forrester’s 2024 Global Outsourcing Benchmarks report, published in August. The same organizations devote more than one-quarter of those budgets to filling skill gaps with outsourced infrastructure, application and IT-desk support services, the analyst firm found.
“Handling your basic networking, data management, email, databases and infrastructure is challenging,” Harrington said. “They can't keep adding people because they don't have money or they can't find the right people.”
Nearly two-thirds of enterprises that lean on third-party providers for infrastructure needs partially or fully outsource service desk, end-user computing and equipment support, according to Forrester research.
“Very few businesses are going to differentiate themselves competitively in their market space based on how they run their fleet of PCs,” John Annand, research director at Info-Tech Research, told CIO Dive. “That sort of IT toil has always been ripe for outsourcing.”
The cost of failure
Managed PC, laptop and device services, like HP's solution, are often a tough sell to budget-conscious boards, but outage costs and reputational damage are a bitter pill, too.
IT failures cost Global 2000 companies a combined $400 billion annually, according to an Oxford Economics survey of 2,000 technology, finance and marketing executives commissioned by Splunk earlier this year.
“You get the soft costs of productivity loss,” Gordon said. “But you also have the hard costs from the help desk, which a lot of companies already outsource, and from the output of those costs, like salespeople canceling meetings and calls with reporters that wouldn’t have happened otherwise.”
The CrowdStrike event put a finer point on the price of failure, according to Harrington.
“The entire board is now interested, especially in industries that were affected,” Harrington said. “These companies were down 12 to 72 hours and people got pissed they couldn’t work. So, it’s not just a CIO issue anymore.”
Delta’s CrowdStrike outage, which lasted five days and led to the cancellation of 7,000 flights, cost the airline $380 million in lost revenue in just one quarter, roughly twice the $200 million that enterprises lose on average each year to such events.
Southwest Airlines suffered an estimated $725 million in lost revenue and an additional $140 million in civil fines when a winter storm took out its crew rescheduling systems during the December 2022 holiday travel period.
HP defused a more common IT snafu recently when the WiFi network at its Barcelona office went down during off hours, according to Gordon. “We saw the printers were still online and figured out it was a WiFi problem,” Gordon said. “We had the office back up before anyone even woke up to notice.”
Desktop monitoring tools and dashboards have improved with time, as have automation capabilities, Annand said. Printers and PCs have become easier to automate but no less critical to the business.
“The way you take care of the printers and desktops won’t be seen as a strategic asset,” Annand said. “But the smooth operation of those devices is absolutely critical to every other business success criteria.”
Correction: This story was updated to provide the current name of HP's fleet-management platform and more detail about the product's recent updates.