When Mossack Fonseca’s entire corporate memory spilled onto the Internet for everyone to see in April following the release of the Panama Papers, it provided a stark example of the risks and potential repercussions associated with poor dark data management. Once journalists began piecing together the information, they uncovered evidence and patterns of questionable conduct, which resulted in a number of serious consequences, including the resignation of Iceland’s prime minister.
People may assume that when data loses its context or becomes obsolete for its intended purpose, its value goes away. But any data with a timestamp potentially has value, according to Jim Hunter, chief scientist and technology evangelist at Greenwave Systems.
"Context can be construed from all these seemingly unrelated sources as there is so much information to draw from and more is being collected every day," said Hunter. "The ability to tie data to a given person or group or company is there, and that makes it attractive to malefactors."
"When companies that fail to properly defend their online data suffer a breach, it’s often the released dark data that brings the most interesting stories to light," said Orlando Scott-Cowley, Mimecast’s cybersecurity strategist.
A growing enterprise concern is how to best manage and protect the information assets organizations collect, process and store but generally fail to use for other purposes, commonly referred to as dark data.
As companies collect more data, including marketing information, web logs, customer information, application telemetry, physical sensor data and customer support metrics, they often find themselves saddled with vast accumulations of unstructured, unanalyzed and often improperly secured dark data.
"While companies increasingly embrace data-driven strategies and decisions, the ability to analyze and use all the data is lagging behind the collection of data," said Bill Ho, CEO of Biscom. "Dark data are like closets full of clothes and shoes that will never be worn again. It’s easier to just keep them all."
The lurking risks, and challenges, of dark data
Recent reports estimate that around 80% of Big Data is dark, which poses both internal and external risks.
"Internally, dark data can pose compliance and regulatory threats by storing sensitive patient or personally identifying information for healthcare, sensitive payment card industry data or other data and metadata that can fall outside of the compliance framework of the enterprise and open the company up to financial and legal liability," said Brad Anderson, vice president of Big Data Informatics at Liaison Technologies.
From an external perspective, dark data exploited by unwanted intruders can give a detailed picture of a company’s secrets.
Dark data risks are compounded by a simple fact: what can’t be seen can’t be controlled. Content that goes unmanaged is, likewise, difficult to monitor.
"The inability to monitor usually amounts to the inability to notice when information has been replicated, leaked, tampered with, lost or stolen," said Farid Vij, lead information governance specialist at ZL Technologies.
Reducing the inherent risks
Knowing that data is valuable and needs to be protected is the first step to securing it properly. Hunter is part of the Internet of Things Consortium, which recently formed a working group to address this issue specifically for IoT use cases.
"We need to help all the stakeholders who are collecting, transmitting, or generating IoT data," Hunter said. "They need to understand their rights and responsibilities, and we have to be serious about establishing and maintaining good best practices regarding data past, present and future, because it’s only going to continue accumulating from here."
Once a company has mapped out its data, it should audit that data and apply change control, access logs and role-based access control, suggests Scott-Cowley. It should then ensure the data is stored in a designated place rather than scattered across random locations.
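To make two of those controls concrete, here is a minimal Python sketch of role-based access control paired with an access log. It is an illustration only, not a method described by Scott-Cowley or any vendor quoted here: the role names, the permission table, the `GovernedStore` class and the dataset name are all hypothetical, and a real deployment would rely on the organization's identity and logging infrastructure.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set

# Hypothetical role-to-permission table; a real one would come from
# the organization's identity and access management system.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read"},
    "steward": {"read", "write"},
}


@dataclass
class AccessLogEntry:
    timestamp: str   # UTC time of the request, ISO 8601
    user: str
    role: str
    action: str
    dataset: str
    allowed: bool    # whether the request was granted


@dataclass
class GovernedStore:
    """Checks each request against the role table and logs the outcome."""
    log: List[AccessLogEntry] = field(default_factory=list)

    def request(self, user: str, role: str, action: str, dataset: str) -> bool:
        # Unknown roles get an empty permission set, so they are denied.
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Record every attempt, granted or denied, so it can be audited later.
        self.log.append(
            AccessLogEntry(
                timestamp=datetime.now(timezone.utc).isoformat(),
                user=user,
                role=role,
                action=action,
                dataset=dataset,
                allowed=allowed,
            )
        )
        return allowed


if __name__ == "__main__":
    store = GovernedStore()
    store.request("ana", "analyst", "read", "web_logs")
    store.request("ana", "analyst", "write", "web_logs")
    for entry in store.log:
        print(entry)
```

Logging denied requests alongside granted ones is a deliberate choice in this sketch: repeated refusals against a rarely used dataset are often the first visible sign that someone is probing data nobody is watching.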
The potential value of dark data
It’s not all bad news when it comes to dark data. Current advances in unstructured data and text analytics mean that dark data increasingly has potential as a business asset.
"Dark data has virtually limitless value," said Anderson. "A 360-degree view of the customer from many different data sources can make offers more relevant and personalized, thus driving revenue."
For example, organizations can capitalize on dark data through fraud detection, catching malicious behavior and preventing deeper losses, Anderson explained. Stored purchase records can pay off in other ways as well: when Costco experienced a tainted fruit recall, it was able to connect with every single person who bought that fruit within 24 hours, building loyalty and ultimately increasing revenue.