Not only did the system give flesh-and-blood gamers a run for their money, it discovered tricks its own programmers didn’t even know existed, a team from Google-owned research company DeepMind reported in the scientific journal Nature.
“This… is the first time that anyone has built a single general learning system that can learn directly from experience to master a wide range of challenging tasks,” said co-developer Demis Hassabis.
The feat brings us closer to a future run by smart, general-purpose robots which can teach themselves to perform a task, store a “memory” of trial and error and adapt their actions for a better outcome next time.
Such machines may be able to do anything from driving our cars to planning our holidays and conducting scientific research, said the team.
Inspired by the human learning process, the “artificial agent” dubbed deep Q-network (DQN) was let loose, with only minimal programming, on an Atari game console from the 1980s.
“The only information they (the system) get is the pixels (on the screen) and the game score and the goal they’ve been told is to maximise the score,” Hassabis explains in a Nature video.
“Apart from that, they have no idea about what kind of game they are playing, what their controls do, what they’re even controlling in the game.”
Unlike humans, the algorithm-based software starts off without the benefit of previous experience.
Presented with an on-screen paddle and ball, for example, a human gamer would already know that the goal must somehow involve striking the one with the other.
The system, by comparison, learns by activating computer keys randomly until it starts scoring through trial and error.
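The loop described above — act randomly at first, then favour whatever has scored in the past — is the essence of reinforcement learning. DQN itself does this with a deep neural network reading raw screen pixels; the sketch below is only a minimal tabular stand-in on a hypothetical toy task (a short corridor where reaching the far end scores a point), meant to illustrate the same trial-and-error principle, not DeepMind's actual system.

```python
import random

# Hypothetical toy game: states 0..4 on a line; reaching state 4 scores a point.
# This is tabular Q-learning, a simple relative of the method behind DQN.
N_STATES, ACTIONS = 5, (-1, +1)          # move left or right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # Press "keys" randomly sometimes (exploration), otherwise act greedily.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: q[(s, act)])
        nxt, r = step(s, a)
        # Nudge the score estimate toward reward plus discounted future score.
        q[(s, a)] += alpha * (r + gamma * max(q[(nxt, b)] for b in ACTIONS) - q[(s, a)])
        s = nxt

# After training, the learned policy moves right from every state.
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)}
```

Starting from all-zero estimates, the agent stumbles onto the reward by chance, and the update rule propagates that score backwards until the greedy policy heads straight for it — "almost perfect" play on a trivially small game.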
– Learns, adapts, and gets better –
“The system kind of learns, adapts and gets better and better incrementally until eventually it becomes almost perfect on some of the games,” said Hassabis.
Games included the late-1970s classic Breakout, in which the player breaks through layers of bricks at the top of the screen with a “ball” bounced off a paddle sliding from side to side at the bottom; Ms Pac-Man, which entails gobbling pellets along a maze; pinball; boxing; tennis; and a car race called Enduro.
The system outperformed professional human players in many of the games, but fared poorly in some, including Ms Pac-Man.
In particular game types, explained DeepMind colleague Vlad Mnih, “it’s very difficult to get your first points or first rewards, so if the game involves solving a maze then pressing keys randomly will not actually get you any points and then the system has nothing to learn from.”
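Mnih's point can be made concrete with a hypothetical random player in a corridor: when the first reward is close, random button-mashing finds it often enough to learn from; when it sits far away, random play essentially never scores, so there is no signal at all. This is only an illustrative sketch of the sparse-reward problem, not a measurement of the real games.

```python
import random

# Hypothetical corridor of given length; the player scores only by reaching
# the far end within a step budget, moving randomly left or right.
def random_episode_score_rate(length, budget, trials=2000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = 0
        for _ in range(budget):
            pos = max(0, min(length, pos + rng.choice((-1, 1))))
            if pos == length:
                hits += 1
                break
    return hits / trials

near = random_episode_score_rate(length=3, budget=50)    # reward nearby
far = random_episode_score_rate(length=25, budget=50)    # reward far away
# Random play scores regularly when the reward is near, almost never when
# it is far - and with no score, trial and error has nothing to work with.
```

The same budget of random moves yields frequent rewards in the first case and virtually none in the second, which is why maze-like games such as Ms Pac-Man resisted the system.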
But it did discover aspects of games that its creators hadn’t even known about. It figured out, for example, that in Breakout the optimal strategy is to dig a tunnel through one side of the wall and send the ball in to bounce behind it, breaking the bricks from the back.
– To the future and beyond –
The creators said their system was in many ways more advanced than Watson, the AI question-answering system that outwitted the most successful human players of the quiz show Jeopardy! in 2011, and Deep Blue, the computer that beat world chess champion Garry Kasparov in 1997.
Both had largely been preprogrammed with their particular abilities.
“Whereas what we’ve done is build algorithms that learn from the ground up, so literally you give them perceptual experience and they learn how to do things directly from that perceptual experience,” Hassabis told journalists.
“The advantage of these types of systems is that they can learn and adapt to unexpected things and also… the programmers or the system designers don’t necessarily have to know the solution themselves in order for the machine to master that task.”
The long-term goal, he added, was to build smart, general-purpose machines.
“We are many decades off from doing that,” said the researcher. “But I do think that this is the first significant rung of the ladder that we’re on.”
The next developmental step will entail tests with 3D video games from the 1990s. – AFP