Dive Brief:
- Cloudflare activated its AI-optimized inferencing network across more than 150 cities globally, the cloud connectivity provider announced Tuesday.
- In addition to making the GPU-infused Workers AI developer platform generally available, Cloudflare opened a serverless cloud integration with LLM builder Hugging Face’s multi-model deployment hub, the company said.
- As enterprises ramped up generative AI projects last year, Cloudflare engineers raced to deploy “suitcases full of GPUs” throughout the company’s edge network. The company targeted inference workloads that customize pre-trained models using domain-specific data, Cloudflare CEO Matthew Prince told CIO Dive in December.
Dive Insight:
Cloudflare's focus on network performance and security aligns with two key enterprise AI objectives: keeping data safe and pushing models out to where the data resides.
Companies are eager to experiment with potential use cases, even if it means hitting a few walls in the process. As long as enterprise data remains secure, failing fast can help companies home in on high-value applications, as Shawna Cartwright, business information officer and SVP of enterprise technology at Cushman & Wakefield, detailed during a CIO Dive panel last month.
“The recent generative AI boom has companies across industries investing massive amounts of time and money into AI,” Prince said in the Tuesday announcement. “Some of it will work, but the real challenge of AI is that the demo is easy but putting it into production is incredibly hard.”
Cloudflare’s integration is designed to ease developer access to Hugging Face’s library of open source models. Workers AI is also optimized for fine-tuning large models on smaller data sets, according to the company.
“Organizations and their developers need to be able to experiment and iterate quickly and affordably, without having to set up, manage or maintain GPUs or infrastructure,” the company said.
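In practice, that serverless path means a developer calls a model through a binding inside a Cloudflare Worker rather than standing up any GPU infrastructure. The minimal TypeScript sketch below illustrates the idea; the binding name AI, the wrangler.toml configuration and the specific model ID are illustrative assumptions, not details from Cloudflare’s announcement.

```typescript
// Minimal sketch of serverless inference from a Cloudflare Worker.
// Assumptions: a Workers AI binding named "AI" is declared in the project's
// wrangler.toml, and "@cf/meta/llama-2-7b-chat-int8" stands in for whichever
// open source model the project actually uses from Cloudflare's catalog.

export interface Env {
  // Narrow, hand-written type so the sketch is self-contained; a real project
  // would use the Ai binding type from @cloudflare/workers-types instead.
  AI: { run(model: string, inputs: Record<string, unknown>): Promise<unknown> };
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Inference runs on GPUs in Cloudflare's edge network; the Worker itself
    // never sets up, manages or maintains any of that hardware.
    const answer = await env.AI.run("@cf/meta/llama-2-7b-chat-int8", {
      prompt: "Explain serverless inference in one sentence.",
    });
    return Response.json(answer);
  },
};
```

The design point is that GPU placement, scheduling and scaling remain Cloudflare’s responsibility, which is what allows teams to “experiment and iterate quickly and affordably” in the company’s framing.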
Cloudflare is banking on filling the gap between massive hyperscaler data centers and the end user.
“We're Goldilocks in that space,” Prince said during a February earnings call. “The centralized public clouds are too far away and your device that you're holding in your hand or wearing on your wrist doesn't have enough power.”
“The ability to not just tune models but tune them locally while still having the power of beefy GPUs that can then run the inference tasks — that's a really killer combination,” Prince added.