Nowadays, it is almost impossible to speak negatively about data. The benefits of storing and analyzing big data are revolutionary: it can boost firm performance, as companies get to know their customers and their own business processes more accurately, and it can therefore support strategic decision-making (Wamba et al., 2017). However, data only delivers these benefits when it is stored, analyzed, and visualized correctly, and that is no trivial requirement. Many companies struggle to meet it and fail to extract any useful information from their data (BI-survey, n.d.).
All these companies have heard the buzzwords of big data and data-driven decision making, and they have started grabbing every opportunity to harness their power. As a result, many companies now have dozens of tools in use to enrich their data. In the short term, this produced eye-opening insights and more efficient and effective results. However, as a company's dependency on data grows, it reaches a threshold beyond which the insights of individual tools are no longer sufficient. Ideally, a company would combine all the insights from the different tools into a “single source of truth”.
Unfortunately, this can be extremely hard, and therefore expensive, for several reasons. Integration can result in (1) duplicated data, as several tools may use the same data input; (2) inconsistent formats and (3) multiple units and languages, as different tools use different data formats, units, and languages; and (4) incomplete information and (5) inaccurate data, as essential elements are easily overlooked when integrating all sources, leading to inaccuracies in the new insights (IFP, 2018). Moreover, even when the complete structure is clear, data integration must also comply with data privacy, data security, and data discrimination requirements (Marr, 2017).
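To make these failure modes concrete, here is a minimal sketch (in Python, assuming a pandas dependency and entirely made-up tool exports) of what merging just two overlapping exports involves: the same order appears in both (duplicated data), the dates use different formats (inconsistent formats), and revenue is reported in different units.

```python
# Hypothetical exports: Tool A and Tool B describe the same orders, but
# overlap, use different date formats, and report revenue in different units.
import pandas as pd

tool_a = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "order_date": ["2023-01-05", "2023-01-06", "2023-01-06"],  # ISO dates
    "revenue_eur": [120.0, 80.0, 95.0],                        # euros
})
tool_b = pd.DataFrame({
    "order_id": [1003, 1004],                    # 1003 also exists in Tool A
    "order_date": ["01/06/2023", "01/07/2023"],  # US-style dates
    "revenue_usd_cents": [10450, 21000],         # US dollar cents
})

EUR_PER_USD = 0.92  # assumed fixed rate, purely for illustration

# Normalize both exports to one schema: datetime dates, revenue in euros.
a = tool_a.assign(order_date=pd.to_datetime(tool_a["order_date"]))
b = tool_b.assign(
    order_date=pd.to_datetime(tool_b["order_date"], format="%m/%d/%Y"),
    revenue_eur=tool_b["revenue_usd_cents"] / 100 * EUR_PER_USD,
)

# Merge and deduplicate: one row per order, preferring Tool A's version.
columns = ["order_id", "order_date", "revenue_eur"]
merged = (
    pd.concat([a[columns], b[columns]])
    .drop_duplicates(subset="order_id", keep="first")
    .sort_values("order_id")
)
print(merged)
```

Even this toy case forces explicit decisions (which source wins on a conflict, which exchange rate to apply), and a real integration multiplies those decisions across dozens of tools.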
A possible solution is “data virtualization”, which is offered by entirely new businesses that have sprung up to solve this big data problem. Data virtualization is a data layer that integrates all of a company's siloed data, manages that data, and delivers it to business users in real time (Denodo, n.d.). However, these alternatives have downsides too: (1) they are often not compatible with every tool, so you once again have to combine different tools, and as a consequence (2) you are solving your ‘tool problem’ with more tools, increasing both your data complexity and your dependency on tools.
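For intuition only, here is a toy sketch (in Python; not how Denodo or any real product is implemented) of the core idea: a thin facade that maps logical view names onto live queries against the underlying silos, so consumers always see current data without it being copied into yet another store.

```python
# A toy facade (not any vendor's real architecture): logical view names are
# mapped to live fetch functions, so queries hit the underlying silos at
# request time instead of reading a duplicated copy of the data.
import sqlite3
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

class VirtualLayer:
    def __init__(self) -> None:
        self._views: Dict[str, Callable[[], List[Row]]] = {}

    def register(self, view: str, fetch: Callable[[], List[Row]]) -> None:
        self._views[view] = fetch

    def query(self, view: str) -> List[Row]:
        # Pulled from the source at query time ("real-time"), never copied.
        return self._views[view]()

# One silo: an in-memory SQL database standing in for an operational system.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

def fetch_customers() -> List[Row]:
    rows = db.execute("SELECT id, name FROM customers").fetchall()
    return [{"id": r[0], "name": r[1]} for r in rows]

layer = VirtualLayer()
layer.register("customers", fetch_customers)
print(layer.query("customers"))  # reflects the silo's current state
```

Notice how the downside described above shows up immediately: every new tool needs its own fetch adapter, so the facade itself becomes one more tool to maintain.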
What is your view on this emerging problem? Is there a way to prevent it, or must a company place its faith, yet again, in other companies to organize its data?
Bibliography
BI-survey. (n.d.). Insufficient Skills Are Curbing The Big Data Boom. Retrieved from BI-survey: https://bi-survey.com/challenges-big-data-analytics
Denodo. (n.d.). Data Virtualization; An Overview. Retrieved from Denodo: https://www.denodo.com/en/data-virtualization/overview
IFP. (2018, November 13). 5 Data Quality Problems and their Solutions. Retrieved from Insights For Professionals: https://www.insightsforprofessionals.com/it/servers-and-storage/data-quality-problems-solutions
Marr, B. (2017, June 15). 3 Massive Big Data Problems Everyone Should Know About. Retrieved from Forbes: https://www.forbes.com/sites/bernardmarr/2017/06/15/3-massive-big-data-problems-everyone-should-know-about/
Wamba, S. F., Gunasekaran, A., Akter, S., Ren, S. J., Dubey, R., & Childe, S. J. (2017, January). Big data analytics and firm performance: Effects of dynamic capabilities. Journal of Business Research, 70, 356-365.