Synthetic Data – Substitute for real-world data?

6 October 2019

The statement “Data is the new oil”, coined by Clive Humby in 2006, captured the importance of data for modern technology and society early on. Just as the use of oil causes problems for the environment, the use of data carries risks. These risks mainly concern privacy protection: who owns the data, and is it used in the way the user intended? Governments and political unions such as the EU try to protect personal data, which companies often collect when people use one of their services.

However, neither stricter regulations nor the anonymization of personal data guarantees protection against misuse if the data gets into the wrong hands. One way to handle these risks and to comply with legal regulations is the use of synthetic data.

“Synthetic data is information that is artificially manufactured rather than generated by real-world events” (Myers, 2019). According to Garg (2019), it can be produced either by using a model that describes real-world behavior or by sampling from a distribution observed in real-world data. Because the generated records do not belong to real persons, synthetic data can be treated as anonymous and used with far less privacy risk. Such realistic, person-independent data extends the range of possibilities by simplifying the processing, analysis and exchange of large amounts of data. Compared to existing anonymization methods, synthetic data retains the full depth of detail and therefore delivers higher value. Generating data has further advantages: it can be tailored to specific needs and conditions, and it can even be created for events that have not yet occurred, e.g. for clinical or scientific trials.
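To make the second approach (generating data from a real-world distribution) more concrete, here is a minimal sketch in Python. It assumes that a single numeric attribute is roughly normally distributed, which is a simplification chosen only for illustration. The function names and the sample values are hypothetical and do not come from any of the cited sources.

```python
import random
import statistics


def fit_normal(real_values):
    """Estimate mean and standard deviation of the real sample."""
    return statistics.mean(real_values), statistics.stdev(real_values)


def generate_synthetic(real_values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to the real sample."""
    mu, sigma = fit_normal(real_values)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical "real" measurements (e.g. customer ages), for illustration only.
real_ages = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]

# The synthetic values follow the fitted distribution but belong to no real person.
synthetic_ages = generate_synthetic(real_ages, n=1000)
```

Real generators have to preserve relationships between many attributes at once, so a one-dimensional fit like this only shows the basic idea.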

The need for big data, and for ways to handle it, is increasing significantly in times of machine learning, and so is the trade in data, including third-party data. Synthetic data could reduce the trade in third-party data if companies are able to generate data sets themselves. According to the “Gartner Hype Cycle for Emerging Technologies, 2019”, synthetic data is a rising trend. Its benefits open up new paths for data-driven technologies, and as a substitute for personal data it may also help protect privacy in the future. This can be a chance for new developments and breakthroughs.

However, synthetic data and its generation are still at an early stage and need further research and technological development. In particular, synthetic data currently still needs to be evaluated against real-world data from time to time. Furthermore, if it is used in areas such as AI or clinical studies in healthcare, the data must be validated and traceable, and the origin of the underlying datasets must be documented. In summary, there are still hurdles to overcome before synthetic data can replace personal data in most of its applications. To encourage the necessary improvements, the National Institute of Standards and Technology (NIST), a non-regulatory agency of the United States Department of Commerce, launched a challenge in 2018/2019 to improve synthetic data generation tools (total prize purse: $150,000).
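One simple way to picture the evaluation against real-world data mentioned above is to compare the empirical distributions of a real and a synthetic sample. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the two empirical cumulative distribution functions. This is only one possible check, picked here as an assumption; the function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions, so checking every data point is sufficient.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical samples, for illustration only.
real_sample = [23, 35, 41, 29, 52, 38, 47, 31, 44, 36]
synthetic_sample = [25, 33, 40, 30, 50, 39, 45, 28, 46, 37]

gap = ks_statistic(real_sample, synthetic_sample)
```

A small value suggests that the synthetic sample follows the real distribution closely for this one attribute. It says nothing about privacy or about relationships between attributes, which would need separate checks.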

Do you think data can be generated in the future and therefore be a substitute for real-world data?

References:
AI Multiple (2019). Synthetic Data: An Introduction & 10 Tools. [Online] Available at: https://blog.aimultiple.com/synthetic-data/ [Accessed: 02.10.2019].

Myers, A. (2019). Deepfakes: What’s real with synthetic data? [Online] Available at: https://medium.com/memory-leak/deepfakes-whats-real-with-synthetic-data-5c8348b041d2 [Accessed: 02.10.2019].

Garg, A. (2018). The Power and Challenges of Synthetic Data – 3 Principles. [Online] Available at: https://medium.com/@amitgarg/the-power-and-challenges-of-synthetic-data-3-principles-c254e25fc6d5 [Accessed: 02.10.2019].

Digital healthcare – boon or bane

10 September 2019

Digitalization moves forward every day.
Among other topics, the digitalization of healthcare is gaining more and more attention.
This raises the question of what can be done to provide better healthcare for patients, a better working environment for physicians and a single source of truth for both. But what are the pros and cons of moving to a digital healthcare system? And which outweigh the other?

In terms of improved service, a more digitized healthcare system could offer office hours for virtual visits instead of consulting a physician in person, and could thus provide medical services even in rural and remote areas in the future.
Sharing personal data and information could improve the exchange of information between different experts, especially in an emergency. Limited access and the resulting ineffective communication among medical staff are often a cause of incorrect treatment or even no treatment at all. Thus, verified applications for telemedicine, patient monitoring, medical services and emergency response could increase the reliability of critical information and improve the service for patients.

But to achieve these improvements, several challenges need to be faced. As in most cases of digitalization, the main concern is the protection and security of personal data. Issues such as data storage, data management and the availability of heterogeneous resources with unified and ubiquitous, but still restricted, access also need to be dealt with. Governmental regulations are needed to ensure that confidential data is handled sustainably.

To summarize, the digitalization of the healthcare system would improve the quality, safety and efficiency of patient care and treatment. It could even be said that it is indispensable for the further development and use of technologies like Artificial Intelligence (AI) and Machine Learning (ML), for example for prediction models, which require thousands of data points. This data can be collected efficiently if governmental regulations and appropriate patient consent are in place so that the data is used for the patients' well-being.

Sources:
Deloitte. (2019). Virtual health care: Can the health care system deliver? [Online] Available at: https://www2.deloitte.com/us/en/insights/multimedia/videos/virtual-health-care.html [Accessed: 10.09.2019].

Doukas, C., Pliakas, T. & Maglogiannis, I. (2010). Mobile Healthcare Information Management utilizing Cloud Computing and Android OS. In Annual International Conference of the IEEE Engineering in Medicine and Biology.

Jasmin, C. (2018). mHealth: What is it, and how can it help us? [Online] Available at: https://www.medicalnewstoday.com/articles/322865.php [Accessed: 10.09.2019].

Tresp, V. et al. (2016). Going digital: a survey on digitalization and large-scale data analytics in healthcare. In Proceedings of the IEEE, 104(11).
