Enterprises have long grappled with data dilemmas, from access management to security and hygiene. Those challenges are now amplified by the rush to add AI, a technology that hinges on a solid data foundation.
“The more things change, the more they stay the same,” Joe Depa, global chief innovation officer at EY, told CIO Dive. “Data has always been a critical topic for the C-suite … and I think you’re seeing that become even more prevalent now.”
But despite the sustained focus and added urgency, data gaps remain. Nearly two-thirds of organizations either lack the right data management practices for AI or don’t know whether they have them, according to a Gartner survey published in February.
CIOs can help enterprises course-correct before wasted spend and failed pilots accumulate by identifying critical gaps and the best way to bridge them, experts told CIO Dive. But first, businesses should assess existing practices, engage leadership and plot a path forward.
“The challenge is really to understand what type of data is needed and then to be able to make that data available,” said Sorin Hilgen, chief digital officer and in-country CIO at EG America. “I don’t know anyone that’s saying, ‘Yep, we’ve got our data all buttoned up. We’re good to go.’”
EG America is the fifth-largest convenience retailer network in the country by store count and home to several brands including Cumberland Farms and Kwik Shop. The company has worked to accelerate its digital transformation journey, with AI taking a starring role.
To ease data challenges, the company has leaned on vendors such as Databricks and Quorso.
“AI is just very data hungry,” Hilgen said. “We are making sure that we have accurate data feeds that are going from various systems, internal, external into the area, so that the AI platforms can consume them appropriately.”
The retailer hopes AI can help optimize floor and shelf space in stores and forecast supply needs by connecting to a variety of datasets.
“We’re in early stages,” Hilgen said, but the systems will need to access local events, weather data and demographics, for example. Stores near a Little League baseball tournament would increase their water orders to prepare for the influx, while locations in Jewish neighborhoods would increase their kosher options. If a snowstorm is coming in, a store could increase its milk supply.
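The logic Hilgen describes can be sketched in a few lines of Python. This is a minimal illustration, not EG America's system: the `StoreSignals` fields, the attendance threshold and the uplift percentages are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StoreSignals:
    """Hypothetical local signals for one store (illustrative field names)."""
    nearby_event_attendance: int  # expected attendance at a local event
    snowstorm_forecast: bool      # True if a snowstorm is expected


# Assumed values for illustration only, not figures from any retailer.
EVENT_ATTENDANCE_THRESHOLD = 500
WATER_UPLIFT_PERCENT = 150  # order 150% of the baseline water quantity
MILK_UPLIFT_PERCENT = 130   # order 130% of the baseline milk quantity


def adjust_orders(baseline: dict[str, int], signals: StoreSignals) -> dict[str, int]:
    """Return a copy of the baseline order with signal-driven uplifts applied."""
    orders = dict(baseline)
    if signals.nearby_event_attendance >= EVENT_ATTENDANCE_THRESHOLD:
        orders["water"] = orders.get("water", 0) * WATER_UPLIFT_PERCENT // 100
    if signals.snowstorm_forecast:
        orders["milk"] = orders.get("milk", 0) * MILK_UPLIFT_PERCENT // 100
    return orders


example = adjust_orders(
    {"water": 100, "milk": 40},
    StoreSignals(nearby_event_attendance=800, snowstorm_forecast=True),
)
print(example)
```

A production forecast would learn these adjustments from historical sales rather than hard-code them, which is why accurate, timely data feeds matter.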
Data hygiene, management and access are critical for those use cases. No matter the industry, technology executives are having similar conversations about enhancing their data strategies. Aflac’s EVP and CIO Shelia Anderson said during a March CIO Dive virtual event that the work can be more difficult in practice than initially expected.
“When you start seeing the quality of your data — or the lack thereof — that may back you up a bit, so that you have to spend a little bit of time with data quality to ensure that you're able to get the results that you need,” Anderson said.
What organizations get wrong
Even with the best intentions, technology leaders and their enterprises often go about fortifying data strategies in the wrong way, according to analysts.
Organizations thinking about AI-ready data typically prioritize three criteria: governance, quality and performance, Gartner research found. “I’m not going to say it’s not correct, but the last two criteria that came to mind for them were lineage and data diversity, and those two are critical for AI,” Roxane Edjlali, senior director analyst at Gartner, said. Data that isn’t diverse will lead to bias in outputs.
“Imagine you’re training a model for reviewing resumes,” Edjlali said. “If you only pick resumes that were submitted by men, you are going to have a data bias if you receive a resume from a woman.”
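A simple representation check can surface the kind of gap Edjlali describes before a model is trained. The sketch below is illustrative only: the `gender` field, the group labels and the 20% minimum share are assumptions made for the example, not a Gartner recommendation.

```python
from collections import Counter


def group_shares(records, field):
    """Return each group's share of the records for the given field."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(records, field, expected_groups, min_share=0.2):
    """List expected groups whose share falls below min_share (assumed threshold)."""
    shares = group_shares(records, field)
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)


# Hypothetical training set: nine resumes from men, one from a woman.
resumes = [{"gender": "man"}] * 9 + [{"gender": "woman"}]
print(underrepresented(resumes, "gender", ["man", "woman"]))
```

A check like this only flags imbalance in fields that are recorded. It does not prove a dataset is free of bias.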
Understanding data lineage is also critical.
“When we start to look at a lot of the faux pas or the missteps that corporations have made out there … they really could not have a clear understanding around what data model was used to train the AI,” Kristina Podnar, senior policy director at the Data & Trust Alliance, told CIO Dive. “What we really can’t risk at this moment is for organizations to not be able to explain where the data came from.”
In July 2024, the Data & Trust Alliance published version 1.0.0 of its Data Provenance Standards, which are grouped by source, provenance and use. The standards, co-developed by 19 member organizations, including Nike, Walmart, American Express and Pfizer, aim to establish a uniform approach to increasing transparency in datasets and enhancing the trust and integrity of data and the AI that uses it. The standards were derived from use cases across 15 industries, then synthesized, refined and validated by a team of CTOs, data chiefs and other leaders.
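To make the idea concrete, a provenance record can be pictured as metadata that travels with a dataset, organized under the three groupings named above. The sketch below is a loose illustration: every key inside the sections is hypothetical and is not taken from the published standards.

```python
from dataclasses import dataclass, asdict


@dataclass
class ProvenanceRecord:
    """Illustrative metadata grouped by source, provenance and use.

    The keys stored inside each section are hypothetical examples,
    not the fields defined by the Data Provenance Standards.
    """
    source: dict      # e.g. dataset title, issuing organization
    provenance: dict  # e.g. where the data originated, how it was collected
    use: dict         # e.g. intended purpose, known restrictions


def missing_sections(record: ProvenanceRecord) -> list[str]:
    """Return the names of sections that were left empty."""
    return [name for name, value in asdict(record).items() if not value]


record = ProvenanceRecord(
    source={"title": "store-sales-2024", "issuer": "Example Corp"},
    provenance={},  # origin not documented yet
    use={"intended_purpose": "demand forecasting"},
)
print(missing_sections(record))
```

An automated check of this kind is one way a team could refuse to train on data whose origin cannot be explained.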
To boost adoption, the consortium launched a technical committee last month in partnership with open source and standards organization OASIS Open. Five technology vendors, including Cisco, IBM and Microsoft, jointly sponsored the launch.
“We got these standards to a baseline … but we need to pour some water into this formula, mix it up and really start baking those muffins,” Podnar said. The group says standardizing provenance protocols and developing tools to automate the validation process will lead to better data management.
“That’s going to differentiate this effort from historical data governance efforts because it’s not just about what I can tell you … it’s the how,” Podnar said.
Enterprises are currently contending with a broad misunderstanding about what data readiness actually means. Gartner said organizations that fail to realize the vast differences between AI-ready data requirements and traditional data practices are endangering the success of their AI efforts.
“You cannot build it once and for all,” Edjlali said. “It’s not something you can say, ‘Oh, we’re going to build the data strategy, and once we put it in place and in practice, all of our data is going to be AI ready.’ It doesn’t work that way because it is highly dependent on your AI use case and the AI technique that you use.”
Righting the ship
No matter how off-course an organization’s data practices are today, CIOs can help right the ship with a step-by-step approach.
“Even if you did nothing but started to adopt good practices from here on out, then looked at historical data, prioritized it and grandfathered it in, that’s a good way to shift the ship in the right direction,” Podnar said. “Otherwise, it becomes very operationally cumbersome.”
Organizations with a clear direction will also have an easier time reaching their goals.
Edjlali said a basic AI model needs clean data with few outliers to get predictions right, for example. But if an organization wants to train a model for outlier detection, technologists should keep outliers in the training data or else the model will not know what to identify.
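The distinction can be shown with one small dataset prepared two ways. This is a minimal sketch using Python's standard library. The interquartile-range rule and the 1.5 multiplier are common conventions assumed here for illustration, not a method attributed to Gartner.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the interquartile range rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def split_outliers(values, k=1.5):
    """Separate values into inliers and outliers based on the IQR fences."""
    low, high = iqr_bounds(values, k)
    inliers = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return inliers, outliers


def label_for_detection(values, k=1.5):
    """Keep every value and attach an is_outlier flag for detection training."""
    low, high = iqr_bounds(values, k)
    return [(v, v < low or v > high) for v in values]


daily_sales = [10, 11, 12, 13, 14, 15, 16, 100]

# Use case 1: a prediction model trains on cleaned data, outliers removed.
prediction_data, dropped = split_outliers(daily_sales)

# Use case 2: an outlier-detection model needs the outliers kept and labeled.
detection_data = label_for_detection(daily_sales)

print(prediction_data, dropped)
print(detection_data)
```

The same raw values yield two different training sets, which is the sense in which readiness depends on the use case.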
While organizations used to worry about simply having enough data, the focus has now shifted to having the right kind of data.
Enterprises can also consider using AI to support better data practices, Depa suggested. EY uses synthetic data, for example, to experiment with datasets while lowering compliance risk.
“It’s something that we believe is going to be increasingly important to establishing that data foundation in the future,” Depa said.