AI (artificial intelligence) surprises us every day. The basic concept of a technology that can be taught is something many people still find difficult to comprehend. However, AI is not slowing down to let people get used to the idea; quite the opposite. Last week, a new development once again showed how fascinating and human-like AI can be. Researchers at OpenAI created a program built on multi-agent reinforcement learning. The idea is to place two competing groups of algorithms (agents) in the same environment, which encourages emergent behaviour as each side learns by trial and error. The environment they chose was a game of hide and seek. It seems like a very ordinary game, but it has all the ingredients needed to bring out this behaviour.
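To make the competitive setup concrete, here is a minimal sketch of how the two sides could be rewarded. It assumes a zero-sum rule in which the hiders score +1 on a step when no hider is seen and -1 otherwise, with the seekers receiving the opposite. The function name `team_rewards` and its input are hypothetical and chosen for illustration; this is not the researchers' code.

```python
from typing import Dict, List


def team_rewards(hider_visible: List[bool]) -> Dict[str, int]:
    """Return one step's reward for each team in a toy hide-and-seek game.

    hider_visible[i] is True when hider i is currently seen by any seeker.
    Assumption: the hiders only win the step if every hider is hidden.
    """
    hiders_reward = -1 if any(hider_visible) else 1
    # Zero-sum: whatever one team gains, the other team loses.
    return {"hiders": hiders_reward, "seekers": -hiders_reward}


print(team_rewards([False, False]))  # every hider is out of sight
print(team_rewards([False, True]))   # one hider has been spotted
```

Because the two rewards always cancel out, every improvement by one team makes life harder for the other, and that pressure is what pushes both sides towards new strategies.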
Interestingly, the AI agents developed six strategies over time, each more sophisticated than the last, and none of them were programmed in advance. The way these strategies were organised and the order in which they were learned resemble human development.
In the first phase, the hiders and seekers learn basic strategies and counter-strategies: the seekers learn to chase the hiders, and the hiders learn to run away.
In the second phase, the hiders learn to use tools and alter their environment. They build shelters to create better hiding spots.
In the third phase, the seekers learn to alter the environment as well: they use ramps to get over the walls and enter the hiders' shelters.
The fourth phase happens when the hiders learn to lock the ramps so that the seekers cannot use them to enter the shelters anymore.
The most fascinating phase is probably the fifth one. In this phase, the seekers use the boxes that are also present in the game. The locked ramps cannot be moved, but the seekers can ‘surf’ on the boxes: they ride on top of a box, which gives them the height they need to get over the walls of the shelters.
The sixth phase happens when the hiders learn to lock the boxes as well.
All these phases emerged over 380 million rounds of training. What makes this case so interesting is that the researchers decided to end the trial after around 500 million rounds. They explained that they had initially wanted to end the training around phase 4, as they believed this was the final phase, but then phases 5 and 6 occurred. So who knows what would have happened if they had let it run for another 500 million rounds? This once again shows how unexpected AI can be, and how difficult it still is for us to grasp what AI is capable of.
Bibliography:
Towards Data Science (2019). Available at: https://towardsdatascience.com/openai-tried-to-train-ai-agents-to-play-hide-and-seek-but-instead-they-were-shocked-by-what-they-3ea32bf7fc95
MIT Technology Review (2019). Available at: https://www.technologyreview.com/s/614325/open-ai-algorithms-learned-tool-use-and-cooperation-after-hide-and-seek-games/