Grappling with rising compute costs and model hallucinations, CIOs are turning to small language models to improve enterprise AI strategies.
Large language models gained popularity for their versatility, broad domain knowledge and ability to solve complex, multistep problems. Smaller models offer enterprises a less resource-intensive way to complete specific tasks with tailored expertise.
While LLMs have dominated enterprise adoption in years past, lightweight models have gained traction, and business use is set to rise further this year, analysts told CIO Dive.
“Agents and reasoning models will be the primary topics … but SLMs are an important part of the overall business value discussion,” Gartner Distinguished VP Analyst Arun Chandrasekaran said.
Not everyone's idea of a small AI model is the same. Some define small language models by the number of parameters, often tallying in the tens of millions to low billions compared to the hundreds of billions or trillions for LLMs.
“The sweet spot that I see is anywhere from a billion to 10 billion parameters,” Chandrasekaran said. “At least 50% of enterprises have actively looked at a model in the billion-to-10-billion parameter range for their use cases in the last six to 12 months.”
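One reason the 1-billion-to-10-billion parameter range is attractive is memory footprint: the space needed just to hold a model's weights scales linearly with parameter count, so models in this range can fit on a single commodity GPU. A back-of-envelope sketch (the helper name `weight_memory_gb` is illustrative, and real deployments need additional memory for activations and the KV cache):

```python
# Rough memory estimate for serving a model's weights at a given precision.
# Illustrative only; activations and KV cache add further overhead.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory (GB) needed just to store the weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 7B-parameter model at 16-bit precision needs roughly 14 GB for weights,
# within reach of a single GPU; a 175B model needs roughly 350 GB.
for size in (1, 7, 70, 175):
    print(f"{size}B params @ fp16 ~ {weight_memory_gb(size):.0f} GB")
```

The estimate shrinks further with quantization: at 4 bits per parameter (`bytes_per_param=0.5` in spirit), a 7B model fits in a few gigabytes, which is what makes on-device deployment plausible.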
Others characterize small models less by their size and more by how they are developed, such as through distillation. Forrester predicts this kind of small language model integration will surge by more than 60% this year as enterprises with industry-tailored terminology look to leverage models with specific domain expertise.
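Distillation trains a compact "student" model to mimic the output distribution of a larger "teacher" rather than learning from labels alone. A minimal sketch of the core training signal, assuming nothing about any particular vendor's pipeline (function names are hypothetical; production training would use a framework such as PyTorch):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 "softens" the distribution so the student sees
    # the teacher's relative preferences, not just its top answer.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    distribution; equals the KL divergence up to a constant."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(t, s))
```

The loss is minimized when the student's distribution matches the teacher's, which is how domain knowledge from a large general model gets compressed into a smaller, cheaper one.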
Varying definitions can muddy the waters for CIOs and procurement teams.
“There is no arbitrary cutoff size for a small language model, which definitely makes this a very confusing space to navigate,” said Rowan Curran, senior analyst at Forrester.
Vendor options for small AI models are plentiful. Google’s lightweight Gemma family of models is nearly a year old. Microsoft launched a set of smaller models called Phi, with the newest Phi-4 introduced to customers in December. OpenAI’s o3-mini was released at the end of January, following the startup’s launch of GPT-4o mini last summer.
“The same companies that are building large models are also building small models,” Chandrasekaran said.
The enterprise allure
Small AI models, like their larger counterparts, have strengths and limitations.
Industries and enterprises with specialized terminology, such as healthcare or medical device retailers, are prime areas for small models to thrive, Curran said. Around one-third of technology decision-makers prioritize domain-specific generative AI capabilities in purchasing, Forrester found.
Smaller models typically use less compute power, reducing costs for resource-strained enterprises. More than one-third of companies have delayed AI projects by three months to half a year due to budget constraints, skills gaps and computing availability, according to a Civo report.
Small models can be cost-effective on-device, on-premise and in the cloud. OpenAI said its GPT-4o mini, for example, is more than 60% cheaper than GPT-3.5 Turbo. Smaller models running on-prem or on a private cloud deployment could also allay security and privacy concerns for CIOs.
Organizations like UNESCO have promoted smaller models as a greener compute alternative.
Despite hopes that generative AI will ultimately propel businesses closer to sustainability targets, enterprises have struggled to rein in the resources needed to support AI initiatives. Google’s greenhouse gas emissions increased alongside compute intensity and technical infrastructure investments last year, up 48% since 2019, according to its annual report.
While advantages exist, smaller models aren’t the best option for all use cases, according to Andy Thurai, VP and principal analyst at Constellation Research.
“It can’t match the wide range of use cases that LLMs can support,” Thurai said in an email. “As adoption continues to increase, I expect large closed-source LLMs, such as OpenAI, large open source LLMs, such as Llama, Ai2 or even DeepSeek, to work alongside smaller models for specific use cases.”
Large language models also remain better suited to powering open-ended dialogue and more general natural language processing tasks, analysts said. Smaller models will also struggle to manage a multimodal environment, sometimes falling short of nuanced understanding.
As with other technologies, enterprises must work to match smaller models to the right use case.