With cloud consumption growing every year, CIO concerns around reliability, availability and cost are also on the rise.
To help answer some of these worries, the Uptime Institute publishes a yearly report on IT data center outages, including offerings from public, managed hosted, colocation, and telecom cloud providers. On the surface, the cloud landscape is stable. But four in five respondents to the report say they experienced an outage in 2021 or so far in 2022.
Historically, between 70% and 80% of survey respondents have reported outages, and this year's number is the highest ever recorded. Of those outages, one in five was categorized as serious or severe.
Focusing on a single metric such as uptime to determine cloud reliability masks some of the bigger issues at play, said Andy Lawrence, founding member and executive director of Uptime Institute Intelligence, the Uptime Institute’s research division.
“It's a bit complicated,” he said. “The frequency [of outages] is going to go up because the overall number of data centers and IT services will go up [but] ... the amount of IT being built out every year exceeds the rate of growth of outages.
“So, overall, we are becoming more reliable,” Lawrence said.
If this trend holds, the number of customers experiencing significant and severe outages should come down over time, the report said.
The problem lurking behind the numbers is that, as more critical workloads move to the cloud, outages will disrupt business operations for longer periods of time and cost more money to remediate.
“The evidence suggests that the disruption and costs of outages is, in fact, increasing,” the report said.
This is because the workloads that are being shifted to the cloud are customer-facing, said Brent Ellis, a senior analyst at Forrester.
“The biggest risk actually is with enterprise SaaS,” he said. “When you give control over [a service] to a SaaS platform provider, you're also giving over the ability to restore and recover.”
Recovery times are increasing
The time it takes to recover from an outage also has to be factored in when talking about reliability. And that metric is increasing, said Neil Miles, a senior product marketing manager for data center automation provider MicroFocus.
“We're not hearing that the cloud services are any more or less reliable than they were,” he said. “But once you sync [to the cloud] it's now an outsized impact ... and what adds to that is discovery.”
Because it’s so easy for developers to spin up new cloud instances, an IT team that doesn't know what developers are doing will have a hard time finding and fixing the problem when performance degrades or an outage occurs.
In 2021, 16% of outages lasted two days or more. In 2017, that number was just 4%.
Another driver of complexity is the hybrid nature of IT itself. More IT is being run on ephemeral infrastructure such as containers and virtual machines in hybrid environments that span multiple clouds and on-premises data centers.
Outages caused by IT systems and network issues highlight the shift from siloed IT apps running on dedicated, on-prem equipment to distributed architectures where more IT functions run on standard hardware that is distributed or replicated across many sites, the report said.
Costs are rising as well, the report said. In 2019, only 39% of outages cost over $100,000. In 2021, that share had soared to 64%.
Some industries impacted more than others
When outages do occur, regulated industries like financial services are impacted more than unregulated ones, said Ellis. The reliability question comes into focus when an outage can cost an organization substantial fines for failing to provide services as promised.
In this category of cloud consumer, even a few minutes of downtime per year could be viewed as a serious reliability problem.
When AWS’ US-EAST-1 region went down in December 2021 because of a scripting error that got out of control, European Union banking regulators stepped in to impose new rules on banks and their use of cloud, Ellis said.
“After the US East outage, the European Union proposed legislation requiring that banks and companies in the financial sector diversify across either multiple clouds or cloud plus on-prem infrastructure,” he said. “Part of their regulatory response is ... you're going to be out of compliance if you can't show the ability to move that workload to another platform.”