Reinforcement learning with Marr

https://doi.org/10.1016/j.cobeha.2016.04.005

Highlights

  • Reinforcement learning as a field spans all three of Marr's levels of analysis.

  • Despite much progress, open questions remain at every level.

  • These call for multidisciplinary research that crosses boundaries between levels.

To many, the poster child for David Marr's famous three levels of scientific inquiry is reinforcement learning — a computational theory of reward optimization, which readily prescribes algorithmic solutions that bear a striking resemblance to signals found in the brain, suggesting a straightforward neural implementation. Here we review questions that remain open at each level of analysis, concluding that the path forward to their resolution calls for inspiration across levels, rather than a focus on mutual constraints.

Section snippets

The computational level: the goals of a decision-making system

At the computational level, the basic goal of an agent or a decision-making system is to maximize reward and minimize punishment. Although one might argue whether this is the true goal of agents from an evolutionary perspective, different definitions of reward and punishment allow considerable flexibility. Indeed, work in recent years has elaborated on what constitutes a reward — in addition to the obvious food and shelter (and their associated conditioned reinforcers), there seem to be other…
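For concreteness, the standard textbook formalization of this goal (as in Sutton and Barto's treatment of reinforcement learning) is to choose a behavioral policy \(\pi\) that maximizes the expected discounted return:

\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right], \qquad 0 \le \gamma < 1,
\]

where \(r_t\) is the reward received at time \(t\) and the discount factor \(\gamma\) encodes a preference for sooner over later rewards. The flexibility noted above lives entirely in the definition of \(r_t\): information, novelty bonuses, or deviations from homeostatic setpoints can all be folded into the reward signal without changing the form of the objective.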

The algorithmic level: multiple solutions to the decision-making problem

Given the computational goal of maximizing reward, how does a decision-making agent learn which states of the world predict reward, and what actions enable their attainment? RL provides multiple algorithmic solutions to the problem of credit assignment (i.e., correctly assigning credit or laying blame for an outcome on preceding actions or states). Many of these algorithms proceed through the incremental update of state- and action-specific ‘values’, defined as the (discounted) sum of future reward…
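To make one family of solutions concrete, here is a minimal, self-contained Python sketch of temporal-difference learning (TD(0), in the sense of Sutton's 1988 analysis) on a toy chain task. The task, parameter values, and random exploratory policy are illustrative assumptions for this sketch rather than details from the article; the point is only the incremental, prediction-error-driven update of state values.

    import random

    # Toy chain task (an illustrative assumption): states 0..4,
    # with a reward of 1.0 on reaching the terminal state 4.
    N_STATES, TERMINAL = 5, 4
    ALPHA, GAMMA = 0.1, 0.9            # learning rate and discount factor

    def step(state, action):
        """Move left (-1) or right (+1); the episode ends at state 4."""
        nxt = max(0, min(TERMINAL, state + action))
        reward = 1.0 if nxt == TERMINAL else 0.0
        return nxt, reward, nxt == TERMINAL

    V = [0.0] * N_STATES               # state values, initialized to zero
    for _ in range(1000):              # episodes
        s, done = 0, False
        while not done:
            a = random.choice([-1, 1])          # random exploratory policy
            s_next, r, done = step(s, a)
            target = r if done else r + GAMMA * V[s_next]
            delta = target - V[s]               # temporal-difference error
            V[s] += ALPHA * delta               # incremental value update
            s = s_next

    print([round(v, 2) for v in V])    # values rise toward the rewarded end

The quantity delta, the mismatch between the current value prediction and the experienced outcome, is the same prediction-error signal that recurs at the implementational level below.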

The implementational level: dopamine-dependent learning in the basal ganglia

At the final level of the hierarchy, neuroscientists have had considerable success in mapping functions implied by RL algorithms to neurobiological substrates. Whereas some of the computational and algorithmic questions highlighted above revolved around scaling RL to environments with real-world action and state complexity, the problems at the implementational level arise from the sheer complexity of the neural system, as well as the limitations of different experimental methods.

Much of this…
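As one concrete instance of such a mapping, below is a minimal actor-critic sketch in Python, in the spirit of the adaptive-critic models of the basal ganglia discussed in this literature: a single prediction error trains both a value estimate (the ‘critic’) and action preferences (the ‘actor’). The two-armed bandit task, reward probabilities, and learning rates are illustrative assumptions, and treating delta as dopamine-like is the hypothesis under discussion, not an established identity.

    import math
    import random

    # Two-armed bandit (an illustrative assumption): arm 0 pays off
    # with probability 0.8, arm 1 with probability 0.2.
    P_REWARD = [0.8, 0.2]
    ALPHA_V, ALPHA_P = 0.1, 0.1        # critic and actor learning rates

    def softmax(prefs):
        """Turn action preferences into choice probabilities."""
        exps = [math.exp(p) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    v = 0.0                            # critic: value of the single state
    prefs = [0.0, 0.0]                 # actor: preference for each arm

    for _ in range(2000):
        probs = softmax(prefs)
        a = random.choices([0, 1], weights=probs)[0]
        r = 1.0 if random.random() < P_REWARD[a] else 0.0
        delta = r - v                  # prediction error (the putative dopamine-like signal)
        v += ALPHA_V * delta           # critic update
        prefs[a] += ALPHA_P * delta * (1 - probs[a])   # actor update for the chosen arm

    print([round(p, 2) for p in softmax(prefs)])   # policy comes to favor the richer arm

That a single broadcast signal updates both components is part of what makes the actor-critic family an attractive candidate mapping for dopamine's widespread projections to the striatum.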

Conclusion — inspiration across levels

Reinforcement learning is perhaps the poster child of Marr's levels of analysis — a computational problem that, expressed formally, leads to a host of algorithmic solutions that seem to be implemented in human and animal brains. However, as with many classification schemes, too much emphasis on delineation of levels can distract from the holistic nature of scientific inquiry. As we have shown, the boundaries between the levels are not clear cut, and cross-disciplinary interaction among…

Conflict of interest

Nothing declared.


Acknowledgements

We are grateful to Gecia Bravo-Hermsdorff, Mingbo Cai, Andra Geana, Nina Rouhani, Nico Schuck and Yeon Soon Shin for valuable comments on this manuscript. This work was funded by the Human Frontier Science Program Organization and by NIMH grant R01MH098861.
