Neuroscience 2005 Abstract
| Presentation Number: | 400.10 |
|---|---|
| Abstract Title: | A reinforcement learning model predicts monkey's choice and dorsal striatal activities. |
| Authors: | Samejima, K.*1; Ueda, Y.; Doya, K.1; Kimura, M.<br />1CNB, ATR-CNS, Kyoto, Japan |
| Primary Theme and Topics | Sensory and Motor Systems<br />- Basal Ganglia<br />-- Systems physiology and behavior |
| Secondary Theme and Topics | Cognition and Behavior<br />- Motivation and Emotion<br />-- Learning |
| Session: | 400. Basal Ganglia Systems and Behavior I Poster |
| Presentation Time: | Monday, November 14, 2005 9:00 AM-10:00 AM |
| Location: | Washington Convention Center - Hall A-C, Board # Z12 |
| Keywords: | REWARD, LEARNING, STRIATUM, BASAL GANGLIA |
To make decisions in a dynamic environment, an animal must update its evaluation of candidate actions based on past experience of actions and rewards. Reinforcement learning (RL) explains reward-based decision-making and adaptive choice of actions in three steps: i) estimate the action value, i.e., how much reward an action will yield; ii) compare the action values of the alternatives to select an action; and iii) update the action value by the discrepancy between the expected and the acquired reward after the action. To test whether the striatum encodes action values, we recorded the activity of striatal projection neurons in two monkeys making free choices between left and right turns of a handle. The reward probability of each of the two actions was fixed at 10, 50, or 90% within a block of trials but varied between blocks. In a previous report (Samejima et al., 2003 SfN abstract), we showed that striatal activity was modulated by the reward probability for a particular action in block-by-block comparisons. To study the adaptive process of striatal activity on a trial-by-trial basis, we developed a novel RL model-based method that estimates action values on each trial using a Bayesian inference method, the ‘particle filter’. The RL model, with action values estimated from the past history of choices and rewards, successfully predicted the monkeys’ subsequent choices. Regression analysis of the striatal neuronal discharge rate before action choice against the estimated action values showed that the activity of about half of the neurons (66/142) correlated with the estimated action values. About three-fourths of these (48/66) specifically encoded the action value for one of the two actions, rather than the difference of action values (10/66) or an action-independent value (8/66). These results support the RL model of the basal ganglia, in which striatal projection neurons encode action values that are updated by a reward expectation error signal carried by midbrain dopamine neurons and are used for decision and action selection.
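The three RL steps named in the abstract can be illustrated with a minimal sketch for a two-choice task. The abstract does not give the model equations, so everything specific here is an assumption made for illustration: the softmax (logistic) choice rule, the delta-rule update, the learning rate `alpha`, the inverse temperature `beta`, the initial values of 0.5, and all function names. The sketch simulates choices from fixed parameters; it does not implement the particle-filter estimation the authors describe.

```python
import math
import random


def softmax_choice_prob(q_left, q_right, beta):
    """Step ii: probability of choosing 'left' from the two action values.

    beta (inverse temperature) is an assumed parameter, not from the abstract.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_left - q_right)))


def update_action_value(q, reward, alpha):
    """Step iii: move the chosen action's value toward the obtained reward.

    (reward - q) is the discrepancy between acquired and expected reward;
    alpha (learning rate) is an assumed parameter.
    """
    return q + alpha * (reward - q)


def simulate_block(p_left, p_right, n_trials, alpha=0.1, beta=3.0, seed=0):
    """Simulate one block with fixed reward probabilities for each action."""
    rng = random.Random(seed)
    # Step i: current estimates of each action's value (assumed start: 0.5).
    q = {"left": 0.5, "right": 0.5}
    p_reward = {"left": p_left, "right": p_right}
    history = []
    for _ in range(n_trials):
        p_choose_left = softmax_choice_prob(q["left"], q["right"], beta)
        action = "left" if rng.random() < p_choose_left else "right"
        reward = 1.0 if rng.random() < p_reward[action] else 0.0
        q[action] = update_action_value(q[action], reward, alpha)
        history.append((action, reward, dict(q)))
    return q, history


if __name__ == "__main__":
    # One block with 90% reward for left turns and 10% for right turns.
    final_q, history = simulate_block(0.9, 0.1, n_trials=200)
    print(final_q)
```

In this sketch only the chosen action's value is updated on each trial, so the two values track the reward probabilities of their own actions separately, which is the sense of "action value for one of two actions" as opposed to a difference of values.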
Supported by grants from MEXT, CREST/JST, and by NICT
Sample Citation:
[Authors]. [Abstract Title]. Program No. XXX.XX. 2005 Neuroscience Meeting Planner. Washington, DC: Society for Neuroscience, 2005. Online.