Differential privacy – A sustainable way of anonymizing data?

5 October 2020


Since many blog contributions mention the increase in data collection and data analytics and the potential threat to privacy, I thought it would make sense to introduce differential privacy, a technique currently on the rise in the US. The US Census Bureau, Apple, and Facebook are at the forefront of exploring its capabilities and potential.


What does differential privacy mean?
Differential privacy is a mathematical technique for measuring, and limiting, how much information about any single individual can be learned from a published data set.
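For the mathematically inclined: a randomized mechanism M is called ε-differentially private if, for any two data sets D and D′ that differ in a single individual's record, and for any set of possible outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```

The smaller ε is, the less any one person's data can influence the published result, and thus the stronger the privacy guarantee.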


Differential privacy in action
In 2020, the US government is facing a big challenge: it needs to collect data on all of the country's 330 million residents while keeping their identities private. By law, the government must ensure that the collected data cannot be traced back to any individual in the data set. The data the US government collects is released in statistical tables that academics and policymakers analyze when conducting research or writing legislation.

To meet this need for privacy, the US Census Bureau presented a technique that alters the collected data so that it cannot be traced back to any individual, without changing the overall information the data set provides. It is a mathematical technique that injects inaccuracies, or 'noise', into the data. Some individuals in the data may become younger or older, or change ethnicity or religious beliefs, while the total number of individuals in each group (e.g. age/sex/ethnicity) stays the same. The more noise injected into the data set, the harder it becomes to de-anonymize the individuals.
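Noise injection of this kind is commonly implemented with the Laplace mechanism. Here is a minimal sketch in Python; the function names and the ε value are my own illustration, not the Census Bureau's actual implementation:

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from a zero-mean Laplace distribution via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon is enough.
    """
    return true_count + laplace_noise(1.0 / epsilon)

# Example: publish the number of residents in some age group.
noisy = private_count(12345, epsilon=0.5)
```

Each published count is the true count plus random noise, so no single table entry reveals whether any particular person is in the data, while averages over many queries remain close to the truth.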

This mathematical technique is also used by Apple and Facebook, to collect aggregated data without identifying particular users of products and services.

However, this approach also poses some challenges. Injecting too many inaccuracies can render the data useless. A study of the differentially private version of the 2010 Census data showed households that supposedly had 90 people, which cannot be true. Since the owner of a data set can decide how much 'noise' to inject, that challenge shouldn't pose too much of a problem. Still, the more noise is included, the harder it gets to see correlations between data attributes and specific characteristics of individuals.
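This privacy/utility trade-off can be made concrete with a small simulation. The sketch below again assumes the Laplace mechanism; the ε values are arbitrary choices for illustration:

```python
import math
import random
import statistics

def laplace_noise(scale):
    # Inverse-CDF sampling of a zero-mean Laplace distribution
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_counts(true_count, epsilon, n=1000):
    # Release the same count n times to see how much it scatters
    return [true_count + laplace_noise(1.0 / epsilon) for _ in range(n)]

random.seed(1)
strict = statistics.stdev(noisy_counts(500, epsilon=0.05))  # strong privacy
loose = statistics.stdev(noisy_counts(500, epsilon=1.0))    # weak privacy
# Under the strict setting the published count scatters far more:
# individuals are better protected, but the statistic is less useful.
```

The standard deviation of the released count grows as ε shrinks, which is exactly the effect the Census study observed: enough noise to protect individuals can produce implausible entries like 90-person households.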

If further analysis of differentially private data sets proves that the technique ensures the required privacy, especially for government-created data sets, other federal agencies and countries are likely to adopt the methodology as well.


From my point of view, differential privacy as used for government-created data sets seems to be a big step towards a clearer view of a country's status quo: increased privacy builds trust among residents, which in turn is likely to increase participation in data collection.

However, given the complexity of the technique, it seems unlikely to me that differential privacy will be widely used within companies for the moment. Losing the ability to analyze data in detail, because increased privacy for users obscures correlations within the data, is a trade-off I do not think many companies are willing to make, especially since a lot of smaller companies are just starting to analyze the data they collect.
Right now, research shows that only big multinationals with high R&D budgets can sustainably increase privacy through differential privacy without losing too many of the insights derived from the collected data.


What do you think?
Can differential privacy be a step in the right direction? Or should governments limit companies in the collection, aggregation, and analysis of data to increase privacy for customers?


Sources:
https://aircloak.com/de/wie-funktioniert-differential-privacy/
https://hci.iwr.uni-heidelberg.de/system/files/private/downloads/182992120/boehme_differential-privacy-report.pdf
https://www.technologyreview.com/10-breakthrough-technologies/2020/#differential-privacy
https://towardsdatascience.com/understanding-differential-privacy-85ce191e198a?gi=9d3ad94ea2e4
