Today I went to Stanford to attend an AI Salon session hosted by the Stanford AI Lab. The topic of the salon today was "Deep Reinforcement Learning for Real World Systems". The speakers were Prof. Sergey Levine & Prof. Mykel Kochenderfer.
For context:
Prof. Mykel Kochenderfer leads the Stanford Intelligent Systems Lab, where he researches systems for air traffic control, unmanned aircraft, and automated driving. I once attended a talk he gave at the University of Toronto, where he shared his work on formulating air traffic control as a POMDP. His website can be found here.
Prof. Sergey Levine is from the Robotic Artificial Intelligence and Learning Lab at UC Berkeley. According to his self-introduction, he researches end-to-end Reinforcement Learning techniques for robotics. His website can be found here.
Both speakers recognized the potential of Reinforcement Learning in real-world applications. Prof. Kochenderfer has worked on air traffic planning with POMDPs (Partially Observable Markov Decision Processes). He said he has tried using RL to learn the policy (think of it as the guidance for decision making) of a POMDP for unmanned aircraft, and the model approximated it surprisingly well. Prof. Levine also gave a good example of RL at work: he successfully trained robotic arms to grasp items (from cups to clothes) using only RGB cameras that don’t provide any depth information.
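To make the “policy” idea a bit more concrete, here is a minimal sketch of what a learned policy boils down to: a function that maps the current observation to an action. Everything below (the dimensions, the random weights, the softmax choice) is an illustrative placeholder of my own, not anything shown in either talk.

```python
import numpy as np

# Hypothetical sizes: a 4-dimensional observation and 3 possible actions.
OBS_DIM, N_ACTIONS = 4, 3

# Randomly initialized weights stand in for whatever the RL algorithm would learn.
rng = np.random.default_rng(0)
W = rng.normal(size=(OBS_DIM, N_ACTIONS))

def policy(observation: np.ndarray) -> int:
    """Map an observation to an action by sampling from a softmax over action scores."""
    scores = observation @ W              # one score per action
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                  # softmax: turn scores into probabilities
    return int(rng.choice(N_ACTIONS, p=probs))

# The agent observes the current state and picks the next action to execute.
action = policy(np.array([0.1, -0.3, 0.7, 0.0]))
```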
However, they pointed out two factors that limit RL from broader adoption. 1) For many problems, it is very difficult to come up with a good reward function for RL. “Compared with the reward function, the amount of time we spent on the algorithm is trivial.” That was Prof. Kochenderfer, talking about his work in air traffic planning. For complex systems like self-driving cars, designing the reward function is not as easy as drawing bounding boxes around surrounding cars (see the sketch after this paragraph). 2) It is difficult to prove the reliability of RL models in safety-critical applications, like unmanned aircraft. If you think about RL, essentially the agents keep collecting data and evolving themselves through exploration. However, when going through inspections in the real world, you cannot simply tell the regulators that your system will evolve and learn from its mistakes.
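To give a feel for why reward design is hard, here is a toy, entirely made-up reward for a driving agent (my own illustration, not from the talks). Even this simplified version is a pile of competing terms, and every weight is a judgment call that can push the learned behavior in unintended directions.

```python
def driving_reward(state: dict) -> float:
    """A hypothetical hand-crafted reward for a self-driving agent.

    All terms and weights below are invented for illustration; the point is
    how many competing objectives must be balanced by hand.
    """
    reward = 0.0
    reward += 1.0 * state["progress_along_route"]     # encourage making progress
    reward -= 10.0 * state["collisions"]              # heavily penalize crashes
    reward -= 0.5 * abs(state["lane_offset"])         # stay centered in the lane
    reward -= 0.1 * abs(state["jerk"])                # penalize jerky, uncomfortable motion
    reward -= 2.0 * state["traffic_rule_violations"]  # running lights, speeding, ...
    return reward
```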
Both professors made two interesting points about applying RL in the real world: they think 1) the amount of training data RL demands is currently exaggerated, and 2) we may see the potential of transfer learning in the RL field. On the first point, Prof. Levine gave the example of robot grasping: he trained the model with 800,000 grasp attempts, which, according to him, is less than half of what traditional methods required and far smaller than ImageNet. I think that amount is still difficult to reach in the real world, but there is a chance to reduce it by optimizing the system design. The second point is more interesting: similar to a kid learning how to run, play football, and jump with peers, RL agents should be able to become “skillful” when given a decent amount of training data from multiple related tasks. After seeing what transfer learning did for object detection and NLP (word2vec), if we can develop an effective way of doing transfer learning in RL, we may see a blossoming of applications in this field.
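As a rough sketch of what transfer in RL could look like (again my own illustration, not something either speaker showed): reuse a representation trained on earlier, related tasks and only learn a small task-specific head for the new one, so the new task needs far less data.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEATURE_DIM, N_ACTIONS = 8, 16, 4   # all dimensions are made up

# Pretend this encoder was already trained on several related tasks
# (running, jumping, kicking, ...); we freeze it and reuse it as-is.
W_encoder = rng.normal(size=(OBS_DIM, FEATURE_DIM))

def features(observation: np.ndarray) -> np.ndarray:
    return np.tanh(observation @ W_encoder)

# Only this small head would be trained on the new task.
W_head = np.zeros((FEATURE_DIM, N_ACTIONS))

def new_task_policy(observation: np.ndarray) -> int:
    scores = features(observation) @ W_head
    return int(np.argmax(scores))
```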
For the moment, it seems RL is more suitable for policy approximation and for non-safety-critical applications (like Netflix’s movie recommendations).