The following is a guest article from Jack Norris, Senior Vice President, Data & Applications, MapR.
Whether you are a chief data officer (CDO) today or simply tasked with impacting your organization through a better understanding of data, you do not face a simple task, a fact indicated by the short tenure of the average CDO.
Most CDOs are unable to understand and command the value of their data because their focus is in the wrong place. Many organizations accept the adage that data is their greatest asset and go to great lengths to accumulate data in vast warehouses or lakes.
These data stores typically support a series of queries and analytics to understand and report on what happened. Through this delayed, batch, historical process, data is relegated to the role of an indirect informer.
The intent is that various corporate acolytes gain insight and walk down the hall to make some change that will deliver business impact. Though that could happen, much more frequently queries simply produce a series of generated reports that are delivered and briefly reviewed.
At the end of the day, data has very little impact, and what impact it does have does not easily move and adjust with changing business events.
For a CDO to truly be transformational in his or her role, a different perspective is required. Data is not an asset in and of itself. It is a raw material to be leveraged.
Not a raw material for providing further insight into how the business operated and performed, but one for impacting the business while it is happening. This focus puts the emphasis on data flows, not data stores; on streams, not lakes and swamps.
A CDO today needs to focus on three keys to success: understand the data flows within their organization and across their ecosystem, leverage an emerging data fabric, and integrate analytics into operations to improve business outcomes.
1. The importance of data flows
In the 2016 Global Report on Technology and the Economy, Deloitte published a special letter on "Flow and the Big Shift in Business Models." This article stresses that we need to focus less on the size of the data and "more on how we might interact with that data as it's generated."
The article further stresses that understanding the nature and power of data flows will be the primary task in computing over the next decade.
Recently published by Harvard Business Review Press, The Network Imperative: How to Survive and Grow in the Age of Digital Business Models takes this research further and provides examples of which business models drive the most market value.
The authors segment public companies by business model and show how organizations that form networks and harness data flows have an eight times greater value than traditional asset-based business models.
The question for a CDO becomes, what does it mean to harness and master important data flows within your organization?
A common mistake is to confuse data ingestion or ETL processes with the important data flows required to drive transformational value. It wasn't that long ago, at least in this author's mind, that we had separate networks. "Sneakernet" referred to the practice of physically carrying data from one computer to another.
It now sounds laughable that network traffic wasn't easily shared across a building, when we share network connectivity across the world. The same breakthrough is required for data: a focus on data flows.
And when we refer to data flows, we are not talking about the transit time between separate data silos, the sneakernet equivalent for data, nor about the transport of different types of data.
We're referring to the ability to support a diversity of application processing and analytics on a common data infrastructure: a "data fabric."
2. The emerging data fabric
A data fabric encompasses a variety of data formats at scale, both data-in-motion and data-at-rest. And a data fabric is not limited to a rack or a building. It stretches from the edge to the cloud, greatly simplifying the modernization of IT infrastructures and the use of containers, cloud and IoT.
In the architectures of the past, data size drove expense. The more compact the data, the smaller the hardware platform required; the cost of storing data increased exponentially as you moved from a PC to a server to a mainframe.
Speed was also achieved by reducing data size and using specialized schemas, so each application or "use" resulted in a specialized data structure. These independent data sources, or silos, sprouted up across an organization.
Each drove certain activities and required its own extraction and transformation processes from source systems (transactional systems, third-party data providers, etc.).
Instead of a series of overlapping ETL processes proliferating across an organization, picture a data fabric that provides a common data platform to serve the needs of a diverse set of applications, including new applications that leverage data and processing across separate silos. This is especially true when we integrate operational and analytic environments.
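To make the contrast concrete, here is a minimal sketch of two applications sharing one platform path instead of each maintaining its own extracted copy; the mount point, file layout and field names are illustrative assumptions, not specifics of any particular product.

```python
import json
from pathlib import Path

# Hypothetical shared platform mount point; real deployments would use whatever
# POSIX-style or object path their data platform exposes.
PLATFORM_ROOT = Path("/shared/platform")

def land_orders(orders):
    """Operational app writes raw order events once, to the shared location."""
    out = PLATFORM_ROOT / "orders" / "events.jsonl"
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("a") as f:
        for order in orders:
            f.write(json.dumps(order) + "\n")

def daily_revenue():
    """Analytics app reads the very same files -- no per-silo ETL copy needed."""
    totals = {}
    with (PLATFORM_ROOT / "orders" / "events.jsonl").open() as f:
        for line in f:
            order = json.loads(line)
            totals[order["day"]] = totals.get(order["day"], 0.0) + order["amount"]
    return totals

if __name__ == "__main__":
    land_orders([{"day": "2017-03-01", "amount": 120.0},
                 {"day": "2017-03-01", "amount": 80.0}])
    print(daily_revenue())
```

The file format is beside the point; what matters is that the operational and the analytic workload read and write the same governed copy of the data rather than passing extracts between silos.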
3. Injecting intelligence into operations
Increasingly, the companies that understand the context and actions of customers, competitors and ecosystem partners, and that make the most appropriate adjustments the fastest, will enjoy the greatest competitive advantages.
This requires replacing the batch, historical constraints of analytics with mission-critical data flows and real-time applications that inject analytics into business functions to impact the business as it is happening.
So as the customer is engaging, you are optimizing revenue; as threats are occurring, you are minimizing risk; as the business is operating, you are improving efficiency and quality. All of this is supported by an underlying data fabric.
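As a rough sketch of what injecting analytics into a live business function can look like (assuming a Kafka-compatible event stream; the topic names, broker address and scoring rule are placeholders), consider scoring each transaction while it is still in flight:

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKER = "broker:9092"  # placeholder address

consumer = KafkaConsumer(
    "payments",  # assumed topic carrying in-flight transactions
    bootstrap_servers=BROKER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def risk_score(txn):
    """Stand-in rule; a real deployment would call a trained model here."""
    return 0.9 if txn.get("amount", 0) > 10_000 else 0.1

for msg in consumer:
    txn = msg.value
    score = risk_score(txn)
    # The decision feeds straight back into the operational flow while the
    # event is live, rather than surfacing in a report hours later.
    target = "review-queue" if score > 0.5 else "approved"
    producer.send(target, {**txn, "risk_score": score})
```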
An underlying data fabric provides an enterprise-grade persistence layer for broad data sources including files, tables, streams, videos, sensor data and more. The data fabric supports a converged processing layer for file operations, database functions, data exploration and stream processing.
It also supports automated processing as well as traditional SQL for existing analysis and reporting needs. Additionally, the platform provides open access to enable more sophisticated machine learning and AI as an organization's needs and sophistication evolve.
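As one hedged illustration of what a converged processing layer can feel like in practice, the sketch below uses Apache Spark (a general-purpose engine, not any vendor-specific API) to run reporting SQL over files and consume a live event stream against the same storage; the paths, topic name and broker address are assumptions for the example.

```python
from pyspark.sql import SparkSession

# Requires the spark-sql-kafka connector package for the streaming source.
spark = SparkSession.builder.appName("converged-sketch").getOrCreate()

# Traditional reporting: SQL over files already resident on the platform.
orders = spark.read.json("/data/orders/")          # assumed location
orders.createOrReplaceTempView("orders")
spark.sql("SELECT day, SUM(amount) AS revenue FROM orders GROUP BY day").show()

# Continuous processing: the same engine consuming a live event stream and
# persisting it back to the same platform for later batch analysis.
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
          .option("subscribe", "orders-live")                 # assumed topic
          .load()
          .selectExpr("CAST(value AS STRING) AS raw_event"))

query = (stream.writeStream
         .format("parquet")
         .option("path", "/data/orders-live/")
         .option("checkpointLocation", "/tmp/checkpoints/orders-live")
         .start())
query.awaitTermination()  # block while the stream runs
```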