Dive Brief:
- Most organizations failed to align their cloud provisioning with compute needs last year, according to a Cast AI report published Wednesday. The Kubernetes automation platform provider analyzed AWS, Azure and Google Cloud workloads across 2,100 organizations during 2024.
- Overprovisioning was ubiquitous among the Kubernetes workloads analyzed. Almost every cluster was underutilized: on average, organizations used just 10% of the CPU capacity and less than one-quarter of the memory capacity they provisioned over the 12-month period.
- Wasted spend and procurement overruns are endemic across public cloud estates, Laurent Gil, Cast AI president and cofounder, said in an email. “The patterns you see in Kubernetes — overprovisioned nodes, underutilized CPU and memory — are symptomatic of broader cloud inefficiencies.”
Dive Insight:
Unruly cloud costs remain a perennial pain point for budget-conscious CIOs. Bills rose even as competition among the three largest providers helped rein in unit costs last year.
As AWS, Microsoft and Google Cloud battle for enterprise workloads, consumption drove 22% year-over-year growth in the $330 billion global market for cloud infrastructure services in 2024, according to Synergy Research Group.
Savings instruments, such as tiered pricing, spot-instance discounts and committed-spend incentives, can be a double-edged sword: they cut costs up front, but on cloud resources that may ultimately go unused.
Procurement teams typically err on the side of caution to avoid shortfalls and take advantage of pay-in-advance vendor discounts. AWS, Microsoft and Google Cloud have commitment-based spending plans that can cut up to 75% off the on-demand list price, according to the FinOps Foundation.
“Overprovisioning becomes the default response to uncertainty about capacity needs,” Gil said. “No one wants to risk service disruption because a node was undersized, so organizations often allocate more CPU and memory than necessary, driving up costs.”
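At the cluster level, the gap Gil describes is straightforward arithmetic: utilization is what a workload consumes divided by what was provisioned for it. The sketch below uses hypothetical figures chosen to mirror the report's averages; the numbers and helper function are illustrative, not Cast AI data or methodology.

```python
# Illustrative only: hypothetical cluster figures chosen to mirror the report's
# averages (~10% CPU and under 25% memory utilization), not Cast AI data.

def utilization(used: float, provisioned: float) -> float:
    """Share of provisioned capacity actually consumed."""
    return used / provisioned if provisioned else 0.0

cpu_util = utilization(used=40, provisioned=400)     # 40 of 400 vCPUs -> 10%
mem_util = utilization(used=370, provisioned=1600)   # 370 of 1,600 GiB -> ~23%

print(f"CPU utilization:    {cpu_util:.0%}")
print(f"Memory utilization: {mem_util:.0%}")
print(f"Idle, but billed:   {1 - cpu_util:.0%} of provisioned CPU")
```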
To compound the problem, containerized applications with excess CPU capacity can still run short on memory, Cast AI found. Almost 6% of the workloads analyzed exceeded their requested memory at least once over a 24-hour period, leading to service disruptions.
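A minimal sketch of how such overruns can be flagged, assuming per-workload records of requested memory and peak observed usage over a 24-hour window; the records and field names are hypothetical, not Cast AI's tooling.

```python
# Hypothetical workload records (MiB) over a 24-hour window; values are
# illustrative and not drawn from the report.
workloads = [
    {"name": "checkout-api",   "requested_mib": 512,  "peak_mib": 430},
    {"name": "search-indexer", "requested_mib": 1024, "peak_mib": 1190},  # exceeded its request
    {"name": "batch-report",   "requested_mib": 2048, "peak_mib": 760},
]

# Exceeding the memory request risks eviction or an out-of-memory kill,
# the kind of service disruption the report describes.
overruns = [w["name"] for w in workloads if w["peak_mib"] > w["requested_mib"]]
print(f"Exceeded requested memory: {overruns} ({len(overruns) / len(workloads):.0%} of workloads)")
```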
The risk of a degraded user experience is another reason customers would rather exceed capacity needs than provision “just enough” compute to run an application, Gil said.
For enterprises that can take advantage of spot-instance discounts on available hyperscaler infrastructure, the savings can be significant. AWS sells unused EC2 capacity at up to a 90% discount, but prices fluctuated an average of 197 times per month, according to Cast AI. Kubernetes workloads that use a mix of on-demand and spot-instance compute can reduce costs by more than half, the research found.
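As a rough illustration of the blended-cost math behind that finding, the sketch below assumes a hypothetical on-demand rate, a spot discount near the top of the range the report cites and a 70/30 spot-to-on-demand mix; all three are assumptions, not Cast AI figures.

```python
# Blended-cost arithmetic; the rates and the 70/30 mix are assumptions for
# this sketch, not figures from the Cast AI report.
on_demand_rate = 0.10                      # hypothetical $ per vCPU-hour list price
spot_rate = on_demand_rate * (1 - 0.90)    # spot capacity at ~90% off on-demand

vcpu_hours = 10_000
spot_share = 0.70                          # share of compute shifted to spot instances

all_on_demand = vcpu_hours * on_demand_rate
blended = vcpu_hours * (spot_share * spot_rate + (1 - spot_share) * on_demand_rate)

print(f"All on-demand: ${all_on_demand:,.0f}")
print(f"Blended mix:   ${blended:,.0f} ({1 - blended / all_on_demand:.0%} lower)")
```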
The optimization stakes are growing as enterprises fire up AI workloads. Azure customers can reduce cloud GPU costs by 90% on average through Microsoft’s spot-instance discounts, according to Cast AI’s analysis. AWS and Google Cloud spot instances yielded substantial yet somewhat lower average savings of 67% and 66%, respectively.
Moving workloads to specific regions and availability zones based on variable rates can also be an effective savings strategy, reducing costs by up to a factor of six, the company said.