I asked AI researcher Juergen Schmidhuber for his thoughts on progress at DeepMind and on the AlphaGo vs. Lee Sedol Go match; he provided some initial comments. I will update this post as the interview continues.
Juergen Schmidhuber: First of all, I am happy about DeepMind’s success, also because the company is heavily influenced by my former students: two of DeepMind’s first four members completed their first PhDs in AI in my lab, one of them a co-founder, the other the company’s first employee. (Other ex-PhD students of mine joined DeepMind later, including a co-author of our first paper on Atari-Go in 2010.)
Go is a board game where the Markov assumption holds: in principle, the current input (the board state) conveys all the information needed to determine an optimal next move, with no need to consider the history of previous states. That is, the game can be tackled by traditional reinforcement learning (RL), much as Tesauro did two decades ago when he used RL to train, from scratch, a backgammon player at the level of the human world champion (1994). Today, however, we are greatly profiting from the fact that computers are at least 10,000 times faster per dollar.
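To make the Markov point concrete, here is a minimal tabular Q-learning sketch on a toy corridor task. This is a generic textbook illustration, not Tesauro's or DeepMind's actual method; all names (`step`, `q_learning`, the grid size, the hyperparameters) are illustrative. Because the environment is Markovian, a value table indexed only by the current state (no history) suffices to learn the optimal policy.

```python
import random

# Toy Markov environment: a 1-D corridor of states 0..4. The agent starts
# at state 2, moves left (-1) or right (+1), and earns +1 for reaching
# state 4, which ends the episode. The current state fully determines the
# dynamics (Markov property), so Q needs only (state, action) -- no history.
N_STATES = 5
ACTIONS = [-1, +1]

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 2, False
        while not done:
            # epsilon-greedy action selection over the current state only
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q[(s, a)])
            s2, r, done = step(s, a)
            best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
# Greedy policy extracted from the learned Q-table.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

After training, the greedy policy moves right (+1) from every non-terminal state, since the only reward lies at the right end of the corridor.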
In the last few years, automatic Go players have greatly improved. To train a strong Go player, DeepMind’s system combines several traditional methods, such as supervised learning (from human experts) and RL based on Monte Carlo Tree Search. It will be very interesting to see the system play against the best human Go player, Lee Sedol, in the near future.
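For readers unfamiliar with Monte Carlo Tree Search, here is a minimal, generic UCT (Upper Confidence bounds applied to Trees) sketch on the simple game of Nim, where players alternately take 1–3 stones and whoever takes the last stone wins. This is not AlphaGo's actual search, which guides the tree with learned policy and value networks and much else; the `Node`, `mcts`, and `rollout` names and all parameters are illustrative.

```python
import math
import random

MOVES = (1, 2, 3)  # a player may take 1, 2, or 3 stones

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones              # stones remaining at this position
        self.parent = parent
        self.children = {}                # move -> child Node
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0                   # wins for the player who moved INTO this node

def uct_child(node, c=1.4):
    # Pick the child maximizing exploitation + exploration (UCB1).
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player, rng):
    """Random playout; returns the player who takes the last stone."""
    while True:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        if stones == 0:
            return player
        player ^= 1

def mcts(stones, iters=3000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iters):
        node, player = root, 0            # player 0 moves at the root
        # 1. Selection: descend through fully expanded nodes.
        while not node.untried and node.children:
            node = uct_child(node)
            player ^= 1
        # 2. Expansion: add one untried child.
        if node.untried:
            m = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - m, parent=node)
            node.children[m] = child
            node, player = child, player ^ 1
        # 3. Simulation: random playout (or terminal result).
        if node.stones == 0:
            winner = player ^ 1           # the previous mover took the last stone
        else:
            winner = rollout(node.stones, player, rng)
        # 4. Backpropagation: credit wins to the player who moved into each node.
        p = player
        while node is not None:
            node.visits += 1
            if winner == p ^ 1:
                node.wins += 1
            node, p = node.parent, p ^ 1
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(5))  # from 5 stones, taking 1 leaves the losing position 4
```

In Nim, any multiple of 4 is a lost position for the player to move, so from 5 stones the search should converge on taking 1 stone. AlphaGo replaces the random rollouts and uniform expansion here with neural networks, which is what makes the search tractable on a 19x19 board.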
Unfortunately, however, the Markov condition does not hold in realistic real-world scenarios. That is why games such as football are much harder for machines than Go, and why Artificial General Intelligence (AGI) for RL robots living in partially observable environments will need more sophisticated learning algorithms, e.g., RL for recurrent neural networks.
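The classic T-maze illustrates why partial observability demands memory. A cue (0 or 1) is visible only at the first step; every corridor observation afterwards looks identical; at the junction the agent must turn in the direction matching the cue. A reactive (memoryless) policy sees the same junction observation regardless of the cue, so it cannot beat chance, while an agent carrying recurrent state succeeds. The sketch below is a hand-rolled illustration of this contrast, not Schmidhuber's actual RNN-RL algorithms; the hidden state here simply stores the cue, as a trained RNN's hidden vector would learn to.

```python
import random

CORRIDOR = 3  # number of indistinguishable corridor steps

def episode(policy, rng):
    """Run one T-maze episode; return 1.0 if the final turn matches the cue."""
    cue = rng.randint(0, 1)
    hidden = None
    # Only the first observation reveals the cue; the rest are featureless.
    obs_seq = [("cue", cue)] + [("corridor", None)] * CORRIDOR + [("junction", None)]
    action = None
    for obs in obs_seq:
        action, hidden = policy(obs, hidden)
    return 1.0 if action == cue else 0.0

def reactive_policy(obs, hidden):
    # Memoryless: the junction observation carries no cue information,
    # so the best this policy can do is commit to a fixed turn.
    return 0, None

def recurrent_policy(obs, hidden):
    # Recurrent: the hidden state latches the cue when it appears and
    # carries it through the corridor to the junction.
    if obs[0] == "cue":
        hidden = obs[1]
    action = hidden if hidden is not None else 0
    return action, hidden

rng = random.Random(0)
n = 1000
reactive_rate = sum(episode(reactive_policy, rng) for _ in range(n)) / n
recurrent_rate = sum(episode(recurrent_policy, rng) for _ in range(n)) / n
print("reactive :", reactive_rate)   # about 0.5: chance level
print("recurrent:", recurrent_rate)  # 1.0: the carried cue solves the task
```

The memoryless agent succeeds only when its fixed guess happens to match the cue (about half the time), while the history-carrying agent is always correct; this is the gap that RL for recurrent neural networks is meant to close.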
For a comprehensive history of deep RL, see Section 6 of my survey with 888 references: