Social data science is a relatively new discipline that aims to combine the large volumes of data gathered in the social sciences with the algorithms and tools of data science. To illustrate the field, this article highlights three examples that show both what can currently be done and what the future of social data science might look like as ever more data is generated.
We begin with the example of forecasting elections. Initially, this doesn’t sound too difficult, right? That’s because it isn’t, at least when you have a representative sample (that is, when you ask a representative group of people who they are going to vote for). This works, but it costs money and time. Response rates are also declining, so relying on representative polls alone will only become more problematic. To combat this, a team including researchers from Columbia University’s departments of Statistics and Political Science found that, with proper statistical adjustment, non-representative samples can be used to generate accurate election forecasts, often faster and at lower cost than the traditional representative sampling method. They suggest that such methods could also be applied to a range of economic, social and cultural topics, which shows how powerful the fusion of the two disciplines can be for prediction models (Wang et al., 2015).
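To make "proper statistical adjustment" more concrete, here is a minimal sketch of plain poststratification: the sample is split into demographic cells, and each cell's average is reweighted by that cell's known share of the electorate. This is a deliberately simplified relative of the model-based adjustment used in the paper, not a reproduction of it. The cell names, responses and population shares below are all invented for illustration.

```python
def poststratify(sample, population_shares):
    """Reweight cell-level sample means by known population shares.

    sample: dict mapping cell name -> list of 0/1 responses (1 = supports candidate A)
    population_shares: dict mapping cell name -> share of that cell in the electorate
    """
    estimate = 0.0
    for cell, share in population_shares.items():
        responses = sample[cell]
        cell_mean = sum(responses) / len(responses)
        estimate += share * cell_mean
    return estimate


# Hypothetical data: the sample is 80% young, but the electorate is only 30% young.
sample = {
    "young": [1] * 48 + [0] * 32,  # 80 respondents, 60% support
    "old": [1] * 6 + [0] * 14,     # 20 respondents, 30% support
}
population_shares = {"young": 0.3, "old": 0.7}  # assumed known, e.g. from a census

total_responses = sum(len(r) for r in sample.values())
raw = sum(sum(r) for r in sample.values()) / total_responses
adjusted = poststratify(sample, population_shares)

print(f"raw sample mean: {raw:.2f}")       # 0.54
print(f"poststratified:  {adjusted:.2f}")  # 0.39
```

The raw mean overstates support because young respondents are over-represented; reweighting pulls the estimate back towards what the full electorate would say. With many small cells the per-cell averages become noisy, which is why a model is usually needed to estimate them in practice.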
The second example I want to highlight is the “nowcasting” that can be done with Google Trends, a direct measure of what people search for (and therefore an indicator of needs on a large scale). Using Google Trends data, researchers have been able to “nowcast” key statistics such as unemployment, tourism levels and purchase rates for motor vehicles. The researchers claim that this yields useful short-term economic predictions, so that decisions can be made without waiting for official (typically delayed) releases by government institutions (Choi and Varian, 2012). These three statistics are just the tip of the iceberg regarding what can be achieved by analyzing Google Trends, given the plethora of possible queries and the “nowcasting” models that can be built on this publicly available social data.
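As a rough illustration of the idea, the sketch below compares a baseline model that predicts this period's value from last period's value with a model that also uses a same-period search index. Everything here is an assumption made for the example: the `trends` numbers are invented rather than real Google Trends data, the target series is generated without noise from a known formula (so the augmented model fits it exactly by construction), and the comparison is in-sample only.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def mean_abs_error(rows, targets, beta):
    """Average absolute gap between fitted and actual values."""
    preds = [sum(c * v for c, v in zip(beta, r)) for r in rows]
    return sum(abs(p - y) for p, y in zip(preds, targets)) / len(targets)


# Hypothetical monthly search index (not real Google Trends data).
trends = [50, 55, 47, 60, 52, 58, 49, 63, 51, 57, 45, 61]

# Synthetic "official statistic": depends on its own past and on current searches.
series = [10.0]
for x in trends[1:]:
    series.append(2.0 + 0.5 * series[-1] + 0.3 * x)

targets = series[1:]
baseline_rows = [[1.0, prev] for prev in series[:-1]]
augmented_rows = [[1.0, prev, x] for prev, x in zip(series[:-1], trends[1:])]

beta_base = ols(baseline_rows, targets)
beta_aug = ols(augmented_rows, targets)
mae_base = mean_abs_error(baseline_rows, targets, beta_base)
mae_aug = mean_abs_error(augmented_rows, targets, beta_aug)

print(f"baseline error:  {mae_base:.3f}")
print(f"augmented error: {mae_aug:.3f}")
```

The point of the comparison is that search data is available immediately, so if it carries information about the current period it can improve an estimate before the official figure is published. A real evaluation would use noisy data and judge the models on periods they were not fitted to.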
In our final example, researchers faced the problem of collecting data on the current state of the economy in developing countries, where traditional data collection is costly and often unreliable. To overcome this with social data science, they used metadata from the ubiquitous mobile phones in Rwanda to build a model capable of inferring the socioeconomic status of individuals (Blumenstock, Cadamuro and On, 2015). If replicated in other developing countries, this approach promises adequate measurements of the economic quantities required in market research at a much lower cost than traditional data collection.
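The general pattern behind this kind of work can be sketched as follows: fit a model on the small group of people for whom both phone metadata and a survey-based wealth measure are available, then apply it to phone users who were never surveyed. The sketch below is a toy version under strong assumptions. It uses a single invented feature (calls per day), made-up numbers and a straight-line fit, whereas a realistic model would draw on many features.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical surveyed subscribers: (calls per day, survey-based wealth index).
surveyed = [(2, 1.1), (4, 1.9), (6, 3.2), (8, 3.8)]
calls = [c for c, _ in surveyed]
wealth = [w for _, w in surveyed]

intercept, slope = fit_line(calls, wealth)

# Hypothetical subscribers with phone metadata but no survey response.
unsurveyed_calls = [3, 5, 10]
predicted_wealth = [intercept + slope * c for c in unsurveyed_calls]

for c, w in zip(unsurveyed_calls, predicted_wealth):
    print(f"{c} calls/day -> predicted wealth index {w:.2f}")
```

The cost saving comes from the asymmetry between the two groups: surveys are needed only for the small training group, while the metadata already exists for everyone else.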
Evidently, the fusion of social science and data science (social data science) allows for new approaches to forecasting, nowcasting and market research, using data that was once considered less relevant or too difficult to obtain. These papers are only a small sample of what can be done with existing datasets, but they demonstrate a clear path towards what can be done in the future with even more data and computing power. Perhaps there are ethical consequences that could impede these developments, but one thing is clear to me: understanding what social data science is and how it can be used remains crucial to dealing with its potentially hidden effects. And this article is a great start for that.
Blumenstock, J., Cadamuro, G. and On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350(6264), pp.1073-1076.
Choi, H. and Varian, H. (2012). Predicting the Present with Google Trends. Economic Record, 88, pp.2-9.
Wang, W., Rothschild, D., Goel, S. and Gelman, A. (2015). Forecasting elections with non-representative polls. International Journal of Forecasting, 31(3), pp.980-991.