Is there a correlation between US crude oil imports from Norway and the number of drivers killed in collisions with railway trains? Or between the number of mathematics doctorates awarded and the amount of uranium stored at US nuclear power plants? Most people would say no, yet statistically both correlations exist.
The website www.tylervigen.com collects spurious correlations: correlations that show up in the data even though the two subjects obviously have nothing to do with each other. The website claims that it can show 30,000 of these spurious correlations. This can pose a problem for Machine Learning.
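Why so many spurious correlations can be found becomes clearer with a small experiment. The sketch below is only an illustration, not the method used by the website: it generates purely random series (the number of series, their length and the seed are arbitrary assumptions) and then searches all pairs for the strongest Pearson correlation. None of the series has anything to do with any other, yet with enough pairs to choose from a strong correlation turns up by chance.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_chance_pair(num_series=200, length=10, seed=0):
    """Generate unrelated random series and return the most correlated pair."""
    rng = random.Random(seed)
    series = [[rng.random() for _ in range(length)] for _ in range(num_series)]
    # Compare every series with every other one (19,900 pairs for 200 series).
    best = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_chance_pair()
print(f"series {pair[0]} and {pair[1]} correlate with r = {r:.2f}")
```

The more series are compared, the stronger the best accidental correlation tends to become, which is the same effect that makes it easy to collect thousands of spurious correlations from real statistics.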
Machine Learning is a specific part of Artificial Intelligence. It is defined as a field of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Brynjolfsson and McAfee (2017) describe two types of machine learning: supervised learning, which today is often implemented with deep learning, and reinforcement learning. With supervised learning, the computer is given large datasets of examples of the correct answer to a particular problem. From these, the machine learns a mapping from an input (x) to an output (y). For example: (x) = pictures of various animals and (y) = the name of the animal. The algorithms use big datasets to teach themselves the correct answers. With reinforcement learning, the programmer specifies the current state of the system and the goal, lists the allowable actions and describes the elements of the environment that constrain the outcomes of each of those actions. The system then has to figure out how to get as close to the goal as possible, given the allowable actions.
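The mapping from an input (x) to an output (y) can be sketched in a few lines. The example below is a deliberately simple stand-in: it uses a nearest-neighbour rule instead of deep learning, and the training examples (weight and shoulder height per animal) are hypothetical numbers invented for the illustration. The point is only that the program is never told what a cat is; it answers by looking at labelled examples.

```python
from math import dist

# Hypothetical training examples: x = (weight in kg, shoulder height in cm),
# y = the name of the animal.
TRAINING = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
    ((500.0, 160.0), "horse"),
    ((450.0, 150.0), "horse"),
]


def predict(x):
    """Return the label of the training example that lies closest to x."""
    _, label = min(TRAINING, key=lambda example: dist(example[0], x))
    return label


print(predict((4.5, 24.0)))
print(predict((480.0, 155.0)))
```

A real system would work on the pictures themselves and on far more examples, but the principle of learning the answer from labelled data rather than from hand-written rules is the same.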
A big problem of Machine Learning is that it searches for statistical correlations in data in order to provide new information to people. As the examples in the introduction show, many correlations are spurious: the subjects obviously have nothing to do with each other. In a Machine Learning system, however, the underlying structure is so complex that people cannot easily see when the system makes such an error, so the spurious correlations it picks up are far less obvious. Because humans do not recognize the spurious correlation, the system keeps making mistakes, as it keeps basing its predictions on the wrong outcomes as well.
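How a learning system can latch onto a spurious correlation can be shown with a toy example. The sketch below is an assumption-laden illustration, not a description of any real system: the data are hypothetical, and the "model" is a one-feature rule that simply picks whichever feature best predicts "cat" in the training data. In that training data every cat happens to be photographed indoors, so the rule learns the background instead of the animal.

```python
# Hypothetical examples: ({feature name: value}, label).
TRAIN = [
    ({"pointed_ears": True, "indoors": True}, "cat"),
    ({"pointed_ears": True, "indoors": True}, "cat"),
    ({"pointed_ears": False, "indoors": True}, "cat"),   # ears hidden in photo
    ({"pointed_ears": False, "indoors": False}, "dog"),
    ({"pointed_ears": False, "indoors": False}, "dog"),
    ({"pointed_ears": True, "indoors": False}, "dog"),   # dog with pointed ears
]

# New photos in which the accidental pattern no longer holds.
TEST = [
    ({"pointed_ears": True, "indoors": False}, "cat"),
    ({"pointed_ears": True, "indoors": False}, "cat"),
    ({"pointed_ears": False, "indoors": True}, "dog"),
    ({"pointed_ears": False, "indoors": True}, "dog"),
]


def accuracy(feature, data):
    """Share of examples for which 'feature is true means cat' is correct."""
    hits = sum(
        (features[feature] and label == "cat")
        or (not features[feature] and label == "dog")
        for features, label in data
    )
    return hits / len(data)


def fit_rule(data):
    """Pick the single feature with the highest accuracy on the given data."""
    feature_names = data[0][0].keys()
    return max(feature_names, key=lambda name: accuracy(name, data))


chosen = fit_rule(TRAIN)
train_acc = accuracy(chosen, TRAIN)
test_acc = accuracy(chosen, TEST)
print(chosen, train_acc, test_acc)
```

With two features the mistake is easy to spot by reading the data. In a system with millions of learned parameters, the same kind of mistake is hidden inside the model, which is why it can go unnoticed.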
References:
Brynjolfsson, Erik and McAfee, Andrew (2017) ‘The business of artificial intelligence’, Harvard Business Review
Samuel, Arthur (1959) ‘Some studies in machine learning using the game of checkers’, IBM Journal of Research and Development