"Data is the oil of the 21st century," says Siemens CEO Joe Kaeser. Recent advances in techniques for analyzing large data sets have changed the perceived value of the data that companies collect and store. The value of data is growing rapidly, as technology underpinned by machine learning requires copious data inputs. This also reflects a larger trend in business that has arisen through digital transformation: the shift from intuitive decision making to a more empirical, fact-based approach. Companies increasingly value their data and are looking to optimize their decision-making systems. To create such intelligent systems, machine learning algorithms need large amounts of labelled and structured data. Unfortunately, companies are very inefficient at turning the data they collect into valuable output. In fact, the International Data Corporation estimates that only 10% of the data collected by firms is used for analysis. So, what is happening with the other 90%?
The other 90% is dark data: data that has been acquired but cannot be used for analysis or decision making. It is typically unstructured, and companies often choose to store it despite the cost and its current uselessness, in the hope that it may be beneficial in the future. The idea is that the more data a firm has access to, dark or not, the more it will be able to capitalize on future advances in analytics. How does data become dark? In the majority of cases, it is because business systems are unable to process the large amounts of data and make sense of them. An example is a facial recognition system that can identify and track users in a store but cannot determine and log their facial expressions or interactions. How common is this issue in a business environment? It is estimated that 6.75 septillion megabytes of data go dark every day. While firms are eager to source new data inputs, they also need to be conscious of the high inefficiencies in storing data and continue to focus on developing techniques that bring more of this dark data into the light.
Sources:
https://lucidworks.com/darkdata/