PSYC 310 Lecture Notes - Lecture 7: Memory Consolidation, Dopaminergic, Foodborne Illness
PSYC 318
Behavioural Neuroscience II
January 29th, 2018
Lecture 7/24: Model-based decision making
• Decision making
o Options are evaluated as "I have experienced rewards in these states": a part of the
brain keeps track of how many times the experienced value exceeded expectation,
and by what magnitude
• Goal-directed vs. habitual behavior
o How do you determine whether someone is making decisions in a habitual manner?
▪ Devalue the reinforcer: if the reward is no longer valuable to the animal
but, when put in the same situation, it still performs the same action, the
behavior is habitual
▪ Contingency degradation: present new information to the animal, e.g. if it
has historically had to press a lever for reward but you now provide the
reward for free, the relationship between the effort it has been putting in
and the reward changes; there is no longer a contingency between
effort and reward
o You can design a task based on exactly how often effort produces rewards,
in order to bias behavior toward being more habitual or more goal-directed
▪ A commonly used task: during the first week of training, animals exhibit
more goal-directed behavior and are more sensitive to the value of the
rewards they are working for, but after extended training (10-15 days of
daily task sessions) they stop being sensitive to these manipulations
• Researchers then lesion parts of the brain, e.g. the dorsolateral
striatum, which makes animals very resistant to forming habits
• Such animals remain sensitive to the value of the reward and the
effort required to obtain it despite extensive training: the
dorsolateral striatum normally stores 'cached values'
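The devaluation test described above can be sketched in code. This is a minimal illustration, not the experimental protocol: the value variables and function names are invented, and real model-free/model-based agents would learn these quantities from experience.

```python
# Sketch (hypothetical values): how reward devaluation distinguishes
# habitual from goal-directed control.
# A habitual (model-free) agent acts on a cached action value stamped in
# during training; a goal-directed (model-based) agent recomputes the
# action's value from the current worth of the outcome it produces.

cached_value = 1.0    # value built up over many rewarded lever presses
outcome_worth = 1.0   # current value of the food reward to the animal

def habitual_drive():
    # ignores the outcome's current worth - just uses the cached value
    return cached_value

def goal_directed_drive():
    # re-evaluates the action from the outcome's current worth
    return outcome_worth

# Devalue the reinforcer (e.g., pair the food with illness)
outcome_worth = 0.0

print(habitual_drive())       # still 1.0 -> animal keeps pressing
print(goal_directed_drive())  # now 0.0  -> animal stops pressing
```

The point of the sketch is that only the goal-directed computation is sensitive to the devaluation, which is exactly what the behavioral test exploits.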
o Lesioning in prefrontal cortex
▪ Rodent brain with areas of PFC labelled: Cg (cingulate), PL (prelimbic),
IL (infralimbic), OF/OFC (orbitofrontal)
▪ How sensitive are animals to the value of the reinforcer and the task
contingency, in relation to how much experience they have on the task?
• PL and OF/OFC seem to promote goal-directed behavior: when
lesioned, animals showed habitual behavior very early on, forming
habits that made them insensitive to changes in task reward
contingencies
o These regions track how valuable the reward is, i.e., what
the goal is
▪ Infralimbic cortex and reward devaluation
• An instruction cue (a tone) indicates which way to turn to get food.
The food reward is unique to each arm, so the rat cannot choose
which food it will get on a trial; it only chooses whether to make
the correct turn to get the food available on that trial
• How often did animals turn in the correct direction? Over 5 days
they get very good at running down the track and listening: tone 1
means turn left, tone 2 means turn right
• This overtraining promotes habitual responses to the tones
• After training on the task, give the animal free access to one of the
foods; an hour later, induce food poisoning with an injection of
lithium chloride, making it avoid that food for a long time
• Theory: there are competing parts of the brain. We overtrain the
animal so much that the habit part of the brain is winning, but what
if we broke some of that circuitry?
▪ The infralimbic cortex (from other lesion studies) seems to promote
habits
• When the tone to go right for Froot Loops was played, the
infralimbic cortex responded to the tone telling the animal to go
right
• Optogenetic approach: reduce neural activity in the infralimbic
cortex for just the two seconds when the tone comes on. Do
animals make the same decision if the infralimbic cortex is no
longer active?
• Silencing does nothing if you haven't devalued the reward; only
after devaluation have you created competition between the two
parts of the brain
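The logic of this optogenetic manipulation can be written out as a small sketch. Everything here (function name, string labels) is invented for illustration; the point is that silencing IL only changes the outcome when the two systems disagree, i.e., after devaluation.

```python
# Sketch of the experimental logic (labels invented for illustration):
# infralimbic (IL) cortex supports the habit system, so silencing it
# lets the goal-directed system win the competition - but this only
# matters when the reward has been devalued and the systems disagree.

def response(il_active, devalued):
    habit_says = "turn"                              # cached habit: always respond
    goal_says = "withhold" if devalued else "turn"   # tracks current reward value
    return habit_says if il_active else goal_says

print(response(il_active=True,  devalued=True))   # 'turn'     (habit wins)
print(response(il_active=False, devalued=True))   # 'withhold' (goal system wins)
print(response(il_active=False, devalued=False))  # 'turn'     (systems agree anyway)
```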
• Model-free versus model-based actions
o Human data are consistent with the rodent data on comparable path tasks:
people have imaged subjects' brains while they play computer games that
involve decision trees
o When subjects play using stored values based on histories of reinforcement
('cached values'), BOLD activity in the dorsolateral striatum corresponds
with the cached values of the decisions they are making (values based on
previous experience of choosing those options)
o Dorsomedial (DM) striatum activity corresponds with inferred values of the
decisions (values calculated on the fly)
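The cached-versus-inferred distinction above can be sketched with a toy two-step task. The transition and reward tables and the specific numbers are invented for illustration; real cached values would be learned over many trials.

```python
# Sketch, assuming a toy one-step decision tree (states and rewards
# invented). The model-free system looks up a cached value; the
# model-based system computes the value on the fly from a model of the
# task's transitions and rewards.

transitions = {"left": "stateA", "right": "stateB"}  # action -> next state
rewards = {"stateA": 1.0, "stateB": 0.0}             # state -> reward

# cached values accumulated over past experience (model-free)
cached = {"left": 0.8, "right": 0.1}

def model_free_value(action):
    return cached[action]                 # lookup: fast but inflexible

def model_based_value(action):
    return rewards[transitions[action]]   # computed on the fly: flexible

# If the reward in stateA suddenly changes, only the model-based
# estimate reflects it immediately:
rewards["stateA"] = 0.0
print(model_free_value("left"))   # 0.8 (stale cached value)
print(model_based_value("left"))  # 0.0 (tracks the new reward)
```

The same contrast underlies the devaluation experiments above: cached values lag behind changes in the world, while model-derived values update as soon as the model does.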
• Why is there a dopamine-encoded temporal-difference reward prediction error
signal in brain areas that use cognitive maps and models?
o The dopaminergic prediction error signal seems to incorporate information
from models of the world (cognitive maps)
o When people make decisions in a model-based manner in the fMRI scanner,
their BOLD signals indicate the presence of a prediction error signal that
uses predictions based on information that could only come from a cognitive
map
o When we think through a task ahead of time, the dopamine signal encodes
expectations; that signal could be reinforcing or training patterns of behavior in
real time that don't have a historical connection to experienced rewards:
you mentally prepare yourself for certain tasks by imagining them
• Model-free and model-based systems probably complement each other
o The model-free system makes things easier for us
o When we try to figure out the best course of behavior, there are many paths
we could take and have to think through, but our habit-based intuition
system can slash whole branches off the decision tree: 'every time I've done
this it hasn't worked out, so I won't put energy into that decision pathway'
▪ This frees up more processing time to focus on the other branches that
require more thinking
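The pruning idea above can be made concrete with a short sketch. The branch names, cached values, and threshold are all invented; the point is only that cheap cached values can gate which branches get expensive model-based evaluation.

```python
# Sketch of pruning a decision tree with cached (habitual) values:
# branches whose past outcomes were consistently poor are skipped,
# saving model-based computation for the promising ones.
# (values and threshold invented for illustration)

cached = {"branchA": -0.9, "branchB": 0.6, "branchC": 0.2}
PRUNE_BELOW = -0.5   # "every time I've done this it hasn't worked out"

def branches_worth_planning(cached_values, threshold):
    # keep only branches whose cached value clears the threshold
    return [b for b, v in cached_values.items() if v >= threshold]

print(branches_worth_planning(cached, PRUNE_BELOW))  # ['branchB', 'branchC']
```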