Is there a correlation between US crude oil imports from Norway and the number of drivers killed in collisions with railway trains? Or between the number of math doctorates awarded and the amount of uranium stored at US nuclear power plants? Most people would say no, but statistically the answer is yes: in both cases the two data series move closely together.
The website www.tylervigen.com collects such spurious correlations: statistical correlations that really exist in the data, even though the two subjects obviously have nothing to do with each other. The website claims it can show 30,000 of them. This can pose a problem for Machine Learning.
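To see why so many spurious correlations can exist, it helps to try it with pure noise. The sketch below is only an illustration, not how the website works: it assumes 200 unrelated random series of 10 data points each (both numbers are made up) and searches all pairs for the strongest correlation. The function names are hypothetical.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(rng, length):
    """A series that drifts up and down at random, like many yearly statistics."""
    value, walk = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


def strongest_pair(n_series=200, length=10, seed=0):
    """Generate unrelated series and return the most strongly correlated pair."""
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(n_series)]
    best = max(
        combinations(range(n_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_pair()
print(f"series {pair[0]} and {pair[1]} correlate with r = {r:.2f}")
```

With nearly 20,000 pairs to choose from, at least one pair will look strongly related even though every series was generated independently. The more series you compare, the more of these coincidences you find.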
Machine Learning is a subfield of Artificial Intelligence. It is commonly defined as the field of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Brynjolfsson and McAfee (2017) describe two types of machine learning: deep learning and reinforcement learning. With deep learning, the computer is given large datasets of examples of the correct answer to a particular problem. From these it learns a mapping from an input (x) to an output (y). For example: (x) = pictures of various animals and (y) = the name of each animal. The algorithm uses these big datasets to teach itself the correct answers. With reinforcement learning, the programmer specifies the current state of the system and the goal, lists the allowable actions, and describes the elements of the environment that constrain the outcomes of each of those actions. The system then has to figure out how to get as close to the goal as possible using the allowable actions.
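The idea of learning a mapping from (x) to (y) out of labelled examples can be made concrete with a very small sketch. It is deliberately not deep learning: it uses a nearest-neighbour rule, which is a much simpler technique, and the animal measurements are invented for illustration. The names `EXAMPLES` and `predict` are hypothetical.

```python
# Labelled examples: x = (weight in kg, shoulder height in cm), y = animal name.
# The numbers are made up for illustration only.
EXAMPLES = [
    ((4.0, 25.0), "cat"),
    ((3.5, 23.0), "cat"),
    ((5.0, 24.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((35.0, 65.0), "dog"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(x):
    """Return the label (y) of the stored example whose input is closest to x."""
    _, label = min(EXAMPLES, key=lambda example: squared_distance(example[0], x))
    return label


print(predict((4.2, 24.0)))   # an animal the program has never seen before
print(predict((28.0, 58.0)))
```

Nobody wrote a rule such as "light animals are cats". The program only answers by comparing a new input with the examples it was given, which is also why the quality of those examples matters so much.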
The big problem with Machine Learning is that it looks for statistical correlations in data in order to provide new information to people. As the examples in the introduction show, many correlations are spurious: the two subjects have nothing to do with each other. Those examples are easy to spot, but the correlations a Machine Learning system relies on can be far less obvious. Because the underlying structure of such a system is so complex, people often cannot see when it makes an error. And when humans fail to recognize a spurious correlation, the system keeps making mistakes, because it also bases its future predictions on those wrong outcomes.
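A toy sketch of this failure is given here. It assumes a deliberately naive learner that keeps whichever single feature best matches the labels in its training data. The data is hand-made: `real_signal` is genuinely related to the label but noisy, while `coincidental_signal` happens to match the label perfectly in the training set only. All names and numbers are hypothetical.

```python
FEATURES = ("real_signal", "coincidental_signal")

# Each row is (real_signal, coincidental_signal, label).
TRAINING = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1), (0, 1, 1),
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (1, 0, 0),
]

# New data: the real signal still works, the coincidence has disappeared.
DEPLOYMENT = [
    (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
    (0, 1, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 0),
]


def accuracy(data, feature_index):
    """Share of rows in which the chosen feature equals the label."""
    hits = sum(1 for row in data if row[feature_index] == row[-1])
    return hits / len(data)


def fit_single_feature_rule(data):
    """Pick the feature that agrees with the label most often in the data."""
    return max(range(len(FEATURES)), key=lambda i: accuracy(data, i))


chosen = fit_single_feature_rule(TRAINING)
print("learned to rely on:", FEATURES[chosen])
print("accuracy on training data:", accuracy(TRAINING, chosen))
print("accuracy on new data:", accuracy(DEPLOYMENT, chosen))
print("accuracy of the real signal on new data:", accuracy(DEPLOYMENT, 0))
```

On the training data the coincidental feature looks like the better choice, so nothing seems wrong until the system is used on new data. In a real system with thousands of features, it is much harder for a person to notice which one the model picked.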
References:
Brynjolfsson, Erik and McAfee, Andrew, ‘The business of artificial intelligence’ (2017) Harvard Business Review
Samuel, Arthur, ‘Some studies in machine learning using the game of checkers’ (1959) IBM Journal of Research and Development
Thanks for the interesting article!
So, in essence, what you are saying is that the results produced by machine learning can be quite misleading. This is not only the case with spurious correlations: machine learning also produces biased results when its training data sets are biased. For instance, it was recently discovered that LinkedIn was not displaying high-paying job ads to women as often as it did to men (because of the training data set and the way the algorithm was written). In this way, machine learning can reinforce biases that are already present in society. Since this technology is already used to a certain extent in decision-making processes (such as recruitment), it can have a large impact on how decisions are made. Do you think that the advantages of machine learning (such as convenience and extra insight) outweigh the disadvantage of producing biased results?
Hello, in response to Gabi: I agree with you that this is a big problem of Machine Learning. However, since it is a young discipline, there is still a lot of room for improvement. The first step is to identify the problems (for example, the bias that you mentioned here), and the next step is to think of ways to overcome them, making machine learning even better!
Hey Thomas,
Really interesting article, and definitely something to keep in mind as we continue to talk about the benefits Machine Learning brings to business strategy. It's important to acknowledge its shortcomings. I'm curious what kind of implications you think this will have on society as Machine Learning continues to be applied to business strategies. And what do you think would be good ways to counteract these influences? Is there a way to teach it to overlook certain spurious correlations?