This page has titles and abstracts for the talks from the invited speakers, in the order in which they spoke.
The slides presented as part of most of the talks are linked from the talk titles. More videos to come.
Deconstructing Reinforcement Learning (Video)
Richard Sutton, Department of Computing Science, University of Alberta, Canada
The premise of this symposium is that the ideas of reinforcement learning have impacted many fields, including artificial intelligence, neuroscience, control theory, psychology, and economics. But what are these ideas, and which of them is key? Is it the idea of reward and reward prediction as a way of structuring the problem facing both natural and artificial systems? Is it temporal-difference learning as a sample-based algorithm for approximating dynamic programming? Or is it the idea of learning online, by trial and error, searching to find a way of behaving that might not be known by any human supervisor? Or is it all of these ideas and others, all coming to renewed prominence and significance as these fields focus on the common problem that faces animals, machines, and societies: how to predict and control a hugely complex world that can never be understood completely, but only as a gross, ever-changing approximation? In this talk I seek to start the process of phrasing and answering these questions. In some cases, from my own experience, I can identify which ideas have been the most important, and guess which will be in the future. For others I can only ask the other speakers and attendees to provide informed perspectives from their own fields.
Terrence Sejnowski, HHMI, The Salk Institute, and the University of California, San Diego, USA
Zebra finches are born in the spring, when they are exposed to a conspecific male song. Later in the spring the bird begins to vocalize and the song gradually converges to the tutor song. There is increasing evidence that birdsong learning occurs through associative reward-penalty reinforcement learning.
Fifty Years of RL in Games
Gerald Tesauro, IBM Research, USA
Many researchers have advocated game domains as highly useful testbeds where one can cleanly isolate and study important issues faced by RL and more general AI methods in tackling messy real-world problems. In this talk, I'd like to survey some of the highlights of the numerous studies of RL in various game domains since Samuel's seminal work of fifty years ago. My definition of "games" is broad and will include puzzles, competitions, simulated marketplaces, and video/online games. I will also talk about the relationship and differences between traditional single-agent RL and more recent multi-agent learning algorithms, which are likely necessary in general multi-player games. The goal of the talk is to draw a larger perspective on what we have learned from studying RL in games, and where promising future opportunities may lie, not only as RL theory advances, but as "games" themselves continue to evolve with advancing technology.
Reinforcement Learning, Apprenticeship Learning, and Robotic Control (Video)
Andrew Ng, Department of Computer Science, Stanford University, USA
Reinforcement learning has proved to be a powerful method for robotic control. In this talk, drawing on examples from autonomous helicopter flight, quadruped robot control and autonomous driving, I'll describe some of the challenges we've faced in applying RL algorithms to various control problems, such as (i) problems where the reward function is exceedingly difficult to specify by hand, and must itself be learned, (ii) safe exploration, where one wishes to explore without damaging the robot, and (iii) learning high-performance controllers even if we have only an extremely inaccurate model of our robot's dynamics. Using apprenticeship learning (in which we learn by watching an expert demonstration) as a unifying theme, I'll also describe a few algorithms for addressing these challenges.
On Temporal Difference Methods and Extensions (Video)
Dimitri Bertsekas, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA
We propose new TD methods that converge at the fast rate of LSTD and LSPE, but do not require any matrix inversion. Moreover, they do not require a full rank assumption on the feature matrix for convergence to some optimal weight vector. These methods are special cases of a broad class of simulation-based iterative methods for approximate solution of high-dimensional fixed point problems within a low-dimensional feature subspace, consistent with the framework of Galerkin projection methods.
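For readers unfamiliar with the baseline, standard LSTD(0) can be sketched roughly as below. This is not the new method proposed in the talk; it is only the textbook reference point the abstract compares against. The function names, the three-state chain, and the feature vectors are hypothetical choices made for illustration. The sketch solves a small linear system directly and fails when the accumulated matrix is singular, which are the two requirements the abstract says the new methods avoid.

```python
import random


def simulate(P, g, start, steps, rng):
    """Sample (state, reward, next_state) transitions from a Markov reward process.

    P is a row-stochastic transition matrix (list of lists), g[s] is the reward
    received on leaving state s. Both are illustrative inputs, not from the talk.
    """
    s = start
    for _ in range(steps):
        s_next = rng.choices(range(len(P)), weights=P[s])[0]
        yield s, g[s], s_next
        s = s_next


def lstd(transitions, phi, gamma):
    """Standard LSTD(0) with two features per state.

    Accumulates A = sum phi(s) (phi(s) - gamma * phi(s'))^T and b = sum phi(s) * r,
    then solves A w = b by Cramer's rule (a direct 2x2 solve).
    """
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s, reward, s_next in transitions:
        f, f_next = phi[s], phi[s_next]
        for i in range(2):
            b[i] += f[i] * reward
            for j in range(2):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        # Plain LSTD breaks down here, e.g. when the features are rank deficient.
        raise ValueError("A is singular; plain LSTD needs an invertible A")
    w0 = (b[0] * A[1][1] - A[0][1] * b[1]) / det
    w1 = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return [w0, w1]


if __name__ == "__main__":
    # Hypothetical three-state chain with two features per state.
    P = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
    g = [1.0, 0.0, 2.0]
    phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rng = random.Random(0)
    w = lstd(simulate(P, g, 0, 20000, rng), phi, gamma=0.9)
    # Approximate value of each state is the inner product of its features with w.
    print([sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in phi])
```

With one-hot features the estimate reduces to the tabular value function, which gives a simple sanity check on a deterministic chain.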
Wolfram Schultz, Department of Physiology, Development & Neuroscience, University of Cambridge, UK
The functions of rewards are based primarily on their effects on behavior and are less directly governed by the physics and chemistry of input events as in sensory systems. Therefore the investigation of neural mechanisms underlying reward functions requires behavioral theories that can conceptualize the different effects of rewards on behavior. The scientific investigation of behavioral processes by animal learning theory has produced a theoretical framework that can help to elucidate the neural correlates for reward functions in reinforcement learning. Individual neurons can be studied in the reward systems of the brain, including dopamine neurons, orbitofrontal cortex and amygdala. Neural activity in these brain structures can be related to the basic theoretical terms of animal learning theory. Without having all learning terms exhaustively tested, the data so far suggest that contiguity is encoded by dopamine and orbitofrontal neurons, contingency is coded by amygdala neurons, and prediction error is coded by dopamine and amygdala neurons.
Approximate Dynamic Programming: Solving the curses of dimensionality (Video)
Warren B. Powell, Department of Operations Research and Financial Engineering, Princeton University, USA
The operations research community often finds itself managing transportation companies, supply chains, manufacturing enterprises and the military. Of course these problems are stochastic and dynamic, but they are also notoriously high dimensional. Finding a modeling and algorithmic framework that captures the complexities of real operations while also offering practical solution methods has literally been a multi-decade search, which ended with the discovery of approximate dynamic programming … with a twist. The twist was the use of the post-decision state variable to eliminate the expectation within Bellman’s equation, opening the door for the use of a wide range of deterministic optimization algorithms for high-dimensional decision vectors. Then we needed the realization that piecewise linear, separable value function approximations work for a wide range of resource allocation problems, in particular the types of large-scale problems that arise in transportation and logistics. We have not solved every stochastic dynamic problem, but we now have a robust set of algorithms that handles a wide range of important resource allocation problems.
Three and a Half Theories of Neural Reinforcement Learning (Video)
Peter Dayan, Gatsby Computational Neuroscience Unit, University College London, UK
Ties between ideas and methods in reinforcement learning and psychological and neurobiological experiments into learning in the face of rewards and punishments go well beyond the identification of the phasic activity of dopamine neurons with the temporal difference learning signal. We will discuss the three core theories of model-free, model-based and episodic control in the brain, justifying this multiplicity on the grounds of their differing statistical and computational characteristics. We will also consider how these instrumental controllers can be affected by their sometimes misbehaving Pavlovian half-sibling. This is joint work with Nathaniel Daw, Yael Niv and Ben Seymour.
Reflections on Temporal Difference Learning (Video)
Ben Van Roy, Department of Management Science and Engineering, Stanford University, USA
For two decades, TD has been a source of intrigue. It offered an intuitively appealing model of reinforcement learning in natural systems and a promising new algorithm for approximate dynamic programming. Multiple benefits arising from its use of realized state trajectories inspired elegant mathematical insights. Its prowess in solving some complex Markov decision problems fueled optimism. But TD is not yet recognized as a reliable method for solving real problems. In this talk, I will reflect on some of this history, enchantment with and applications of TD, and issues that remain to be resolved.
Beyond Reinforcement Learning (Video)
Andrew Barto, Department of Computing Science, University of Massachusetts, Amherst, USA
I am delighted to be able to provide the closing talk of this meeting. It is with great admiration for all the sophisticated and wonderful work that has been done in areas related to computational reinforcement learning (RL) that I share some recent thoughts about how we can expand our enquiries beyond what is usually considered the legitimate topic of RL. In the RL framework as we know it, reward functions determine the problem the learning agent is trying to solve. Properties of the reward function influence how easy or hard the problem is, and how well an agent may do in trying to solve it, but RL theory and algorithms are completely insensitive to the source of rewards (except perhaps requiring that reward magnitude be bounded). This is a great strength of the framework because of the generality it confers, but it is also a weakness because it excludes from the RL framework help in figuring out how to define reward functions. For simple, unitary problems, defining a reward function may not be difficult, but to extend the utility of RL to agents that must operate effectively when facing diverse challenges in complex environments, it is much more difficult. We know that reward signals in animals are generated by complex brain circuitry that has evolved over millennia to facilitate reproductive success. In this talk, I argue that the RL framework should be expanded to include computational accounts of reward-producing mechanisms. To suggest how this may be done, I look first to neuroscience and ethology for information about where rewards come from in nature and what their value is. I then describe computational experiments recently carried out by Satinder Singh, Rick Lewis, and me that elucidate aspects of the relationship between ultimate goals (e.g., reproductive success for an animal) and the primary rewards that drive learning.
Among the lessons provided by these experiments are a clarification of the traditional notions of extrinsically and intrinsically motivated behavior and the finding that the precise form of optimal reward functions need not bear a transparent relationship to the agent’s ultimate goals. I conclude that we need to address major challenges in designing motivational systems in which RL mechanisms can work effectively for agents that face multiple challenges over extended lifetimes.
