1/21/2013 4:10:00 PM
Operant Conditioning (S-R: learning)
3 primary elements:
o 1) stimulus (S)
stimulus is also a signal.. allows the response to occur.
you respond because that stimulus has become important
through classical conditioning.
The operant stimulus is the stimulus you are responding to
There are other conditioned stimuli as well
o 2) response (R)
o 3) significant event (S*)
strengthens the S-R bond.
o Here you are not responding to the burger, you are responding to the
McDonalds sign which became important through classical conditioning.
Sign and burger have been associated together over and over and over.
Therefore, what makes it operant is the RESPONSE.
o Operant stimulus
the cigarette! (you pick it up, smoke it - this is the most direct
element - the operant response is smoking)
o ..the other stimuli are conditioned stimuli that cause a conditioned
response (contextual cues, smell, smoke.. etc)
- Marlboro visual stimulus, also smell
- Contextual cues: bars or smoking chamber
- Only one is the operant stimulus: the cigarette
(that is what you operate) - look at what the most direct
element is that you respond to
Biologically significant stimuli (S*) Tobacco - Nicotine (the drug)
-Operant response the act of smoking (related to the cigarette)
1) Primary Reinforcers
o stimuli needed for survival = food, water, sex
o stimuli that mimic the effects of food, water and sex in the brain = drugs
o sensory stimulation & novelty
require no learning, they are biologically important
2) Secondary Reinforcers
o A previously neutral stimulus that has acquired the capacity to strengthen
responses because it has been repeatedly paired with food or with some
other primary reinforcer
the S* isn't what strengthens the bond.. it is the CS
(which was learned through conditioning)
The S associated with the S* (S-S*) Conditioned Stimulus
Meaningful because it has been paired with a US
use CS to reinforce link with other stimulus, not to alter behavior.
o (Wolfe 1936)
Chimpanzees pressing a lever for tokens
Trained chimps to press a lever for tokens, which they could
trade in for bananas. They treated the CS as food.
Chimps would take tokens from each other. The token
became a CS to promote the bond.
o Humans' biggest secondary reinforcer is money!
Can be used to alter behaviour
3) Social Reinforcers
o Stimuli whose reinforcing properties derive uniquely from the behavior of
other members of the same species: praise, affection, attention. They are
usually a blend of primary and secondary reinforcers (smile & good)
stimuli that have ability to strengthen the bond, comes from
behaviour of others.
works well within species, but not among different species.
the word 'good' becomes important (social reinforcer/CS) because it's
associated with positive reactions.
can be negative.. but would be odd to call that a 'reinforcer'
Ability to strengthen the link between a stimulus and a response -
comes from a reaction within the genes. Works between species
Facial reactions - inborn tendency to recognize smiles (good)
Can be negative - but difficult to call a reinforcer - punishment
Conditioned Reinforcers & shaping
o There is a difference of what a stimulus does to behaviour, and what it
does to how you 'feel' about it. We are discussing strictly behavioural
(series of actions that become more/less frequent depending on if
they are reinforced)
o Rat in Box example:
Stimulus (S) = lever (the instrument the animal is operating)
-is also the CS after it's learned
Response (R) = action of pressing the lever
Reinforcer/ Significant Stimulus = food pellet
active process of teaching the response...
-make rat hungry (restrict food, make them excited)
-they may press the lever, how do they make the
link between the lever and food?
put animal in chamber & turn on food dispenser.
o (animal doesn't do anything, just hears the clicking
of food falling in and gets the food.)
need to establish a reliable CS predictive of the food
(clicking it makes when falling in dispenser)
-animal turns to face lever, you make noise by
dropping food. every time it looks at lever, you make
noise.
-you then play a trick.. when it looks at lever, you
DON'T make noise.. then the animal will approach the
lever.
-you then make noise every time it approaches the
lever.
-then you don't make noise when it approaches..
Stage 4: -then the rat will press the lever & you make the
noise immediately after, and every time they touch
the lever.
-the reinforcer (clicking noise) is still a CS (still used as a CS)
-you shape the response (shaping), and then the response
is maintained by the reinforcer. (operant conditioning)
(shaping is what you need for the response)
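The shaping steps above can be sketched as a rule for when the experimenter sounds the click. This is only an illustration: the names `STAGES`, `click` and `run_session`, and the rule of raising the criterion after a fixed number of reinforced trials, are assumptions and not part of the lecture.

```python
# Successive approximations, in the order the notes describe them.
STAGES = ["looks at lever", "approaches lever", "presses lever"]


def click(stage, behaviour):
    """Return True if the click (the CS) should be sounded.

    stage: index into STAGES giving the current criterion.
    Behaviour below the current criterion is no longer reinforced.
    """
    return STAGES.index(behaviour) >= stage


def run_session(behaviours, trials_to_advance=3):
    """Raise the criterion after `trials_to_advance` reinforced trials
    (an assumed rule, chosen only to make the sketch concrete)."""
    stage, streak, log = 0, 0, []
    for b in behaviours:
        reinforced = click(stage, b)
        log.append((b, reinforced))
        if reinforced:
            streak += 1
            if streak == trials_to_advance and stage < len(STAGES) - 1:
                stage, streak = stage + 1, 0
    return stage, log
```

Once the criterion has moved up, looking at the lever no longer produces the click, which is the "trick" described in the notes.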
Law of effect
The term 'Law of effect' was first used to describe the process of operant conditioning.
o (discovered that experience can modify non-reflexive behaviour.)
gradual modification of non-reflexive behaviour by experience
o “of several responses made to the same situation, those which are
accompanied or closely followed by satisfaction to the animal will be more
firmly connected with the situation, so that when it reoccurs, they will be
more likely to reoccur.”
The greater the satisfaction, the greater the strengthening of the bond
Talking about how the animal 'feels.'
o -the stimulus (box) is followed by the response (pulling lever), if the bond
is strengthened by reinforcer/satisfaction (freedom)
raw component satisfaction
satisfaction “stamps in” the connection between S & R
o “by a satisfying state of affairs is meant one which the animal does
nothing to avoid, often doing such things as attain and preserve it.”
Essence is the 'stamping in' of the S-R bond
(caused by satisfaction)
Puzzle Box example:
o -used puzzle box for cat
o -put cat inside, cat tries to get out, until they find the lever which opens
the box door. after they've done this so many times, they are able to
escape in seconds.
o -example of non-reflexive behaviour that is modified by experience!
BOUTON
Instrumental learning generally works so that organisms develop responses
that maximize benefit (obtain stimuli with positive survival value) and minimize
cost (prevent stimuli with negative survival value)
o Instrumental behavior increases or decreases depending on its effect on
Good S* (good for survival)
Bad S* (will kill you)
o (reward learning)
perform a behaviour that obtains stimuli that are GOOD S*
o (omission learning)
perform a behaviour that prevents you from obtaining GOOD S*
o (punishment learning)
perform a behaviour that obtains stimuli that are BAD S*
o (avoidance learning)
perform a behaviour that prevents you from obtaining BAD S*
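The four cases above form a 2x2 grid: whether the behaviour obtains or prevents the S*, and whether the S* is good or bad. A minimal sketch of that grid follows. The function name `learning_type` and the string labels are illustrative choices, not terms from the lecture. The predicted direction follows from the point that behaviour increases or decreases depending on its effect on the S*.

```python
def learning_type(effect, s_star):
    """Classify an instrumental contingency.

    effect: "obtains" or "prevents" (what the behaviour does to the S*)
    s_star: "good" or "bad" (survival value of the S*)
    Returns (name of the learning, predicted change in the behaviour).
    """
    table = {
        ("obtains", "good"): ("reward", "increases"),
        ("prevents", "good"): ("omission", "decreases"),
        ("obtains", "bad"): ("punishment", "decreases"),
        ("prevents", "bad"): ("avoidance", "increases"),
    }
    return table[(effect, s_star)]
```

For example, `learning_type("prevents", "bad")` gives avoidance learning, where the behaviour becomes more frequent.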
“I can't get no Satisfaction”
response will increase if followed by a satisfying outcome, but the only way we
know if the outcome is satisfying, is if the response increases.
o if there isn't any satisfaction, would there be no learning?
If rewards are stimuli that produce satisfaction by reducing drives, then
behavior should not increase in the absence of satisfaction or drive reduction.
Goal Box Experiment: (Sheffield, Wulf, Backer)
o -rats mount females in the box, and just before ejaculation, the rat is
removed (stopping the satisfaction, ejaculation, which should drive speed)
-thus.. they should go slower, because satisfaction was removed?
-BUT they go faster!
behaviour increases in the absence of satisfaction.
Paradoxical reward effect (AMSEL)
o no satisfaction, yet behaviour still increases.
(obviously satisfaction is not always necessary)
Latent learning (TOLMAN)
o the animal is actually always learning in the absence of satisfaction,
and only uses that learning when satisfaction is presented.
o -individual receives a dose of morphine through a line inserted into their
muscle. (they don't know what drug will be injected)
o -their response is pressing a button in front of them a number of times.
number of response per second
(probing their behaviour)
o -immediately after the injection, you ask them "how much do you like
your injection from 0-50"
(probing subjective responses- their feelings)
-dose is 0:
response is low for both behaviour & liking
-dose is 15:
response of behaviour increases
response of liking doesn't
-dose is 30:
response is high for both behaviour & liking.
o KEY Affective (feels good or bad) reaction to a S* is NOT the ONLY
element that is key to its effect on behavior
-the drug is clearly reinforcing behaviour, in the absence of
subjective liking (they can dissociate, don't need to go together!)
o reinforcement = the effect on behaviour. (what we focus on)
o reward = the subjective feeling.
example : son associates riding the horse with the barber's.
-if it was a reinforcer, every time he sees the horse he will want
to get a haircut (reinforcer increases the behaviour)
...instead the horse is the REWARD (doesn't strengthen
the behaviour)
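The morphine results recorded earlier in these notes can be written down as a small table to make the reinforcement/reward split concrete. This is a sketch only: the labels "low", "raised" and "high" paraphrase the notes, the dose units are not given in the notes, and the name `dissociated_doses` is made up for illustration.

```python
# Qualitative results as recorded in the notes (dose units not given).
results = {
    0: {"behaviour": "low", "liking": "low"},
    15: {"behaviour": "raised", "liking": "low"},
    30: {"behaviour": "high", "liking": "high"},
}


def dissociated_doses(data):
    """Doses where responding rose above baseline while liking stayed low,
    i.e. reinforcement without reward."""
    return [
        dose
        for dose, r in sorted(data.items())
        if r["behaviour"] != "low" and r["liking"] == "low"
    ]
```

Only the middle dose meets both conditions, which is the case where the drug reinforces behaviour without subjective liking.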
o shaping the experimenter shapes the response
o auto-shaping = programmed into a computer & let the rat go.
QUESTION: primary reinforcing stimuli:
A)are only stimuli needed for survival
B) possess UC motivation value
C) are rewarding
D) reward good behaviours
E) none of the above
Reinforcement Theories
Contiguity Theory (GUTHRIE)
Operant conditioning occurs when S, R and S* occur together in time.
o -each individual will typically press the paddle in the same way. the
response was very stereotypic, repetitive, and identical from time to time.
the animal is just associating these things together, and responding like a
Stop Action Principle:
o any position you are in when reinforced (S*), you will be likely to repeat.
Any specific bodily position and the muscle movements occurring
when the S* is delivered will have a higher probability of occurring
in the future.
-argue that operant conditioning leads to these responses,
rather than cognitive reactions. but it is also true that operant
conditioning doesn't always lead to these automatic
responses.
o superstitious behaviours
- form of automatic responding, pulling up your socks every time
you take a shot. Wear lucky pants every time, after you've done
well. (no actual cognitive link)
Cognitive Theory (TOLMAN)
During operant conditioning, animals make S-S* associations. Rs are highly
flexible, and the primary role of a S* is to motivate behavior.
o “we agree with the other school that the rat in running a maze is exposed
to stimuli and is finally led as a result of these stimuli to the responses
which actually occur. However, we feel that the intervening brain
processes are more complicated and autonomous than the stimulus-
response psychologists do.”
Both views are correct, in different situations.
o tell animal it is a memory test, banana on one side, none on the other.
o Cover it, then remove the screen and ask animal to make response.
a correct response is choosing the side where the food was.
o They switch the banana with lettuce, the animal gets mad.
(they expected to find banana).
This is an indication that animals can form cognitions about the
consequences of their actions. (expectation.)
Reinforcement Theory (SKINNER)
law of effect by Skinner, defines a reinforcing stimulus as "stamping-in".
o A reinforcer is really allowing you to create a bond between S-R.
o reinforcers are acting on storage of information (memory).
obvious because if you remove the sugar from the dispenser, the
animal still responds (it is the memory that drives the behaviour)
o An event that enhances the storage of information about situations in
which it is encountered - “Stamping-in”
o This enhanced storage increases the probability that the behavior
leading to the reinforcer will be repeated in the future, even in the
absence of the reinforcer.
Why is a Reinforcer, Reinforcing?
A reinforcer is an event that follows a response and changes the probability
that the response will be emitted in the future
o How can the event change behavior, when the new behavior occurs in
absence of the event?
1) enhancement of memory consolidation
o Reinforcing events enhance the acquisition and the storage of
information in the brain.
Reinforcers enhance/promote memory
2) Attribution of conditioned motivation
o Learning is the formation of representations of the relationships among
objects and events. A representation of a reinforcer will motivate
Reinforcers give motivation flavour to behaviour & situation
provide a motivational context for behaviour.
(Attribute motivational/learned/conditioned value)
o after you are exposed to a stimulus and have a response, a memory
trace is produced. A reinforcer enhances the memory, and gives
motivation to the stimuli & responses.
Effect of Reinforcers on Memory Consolidation
as you form a new memory, it is fragile (not permanent). as your brain is
processing it, the memory can be changed. the transition from a fragile to
permanent state is called consolidation.
o (this is an active process)
A) Inhibition of memory consolidation
1) learning other information
o if you learn a series of letters, 2 mins later you learn another series.
what you learned last, interferes with what you learned first. this is
because you're learning the 2nd set while you are processing the 1st. you
are interfering with the consolidation process.
2) ECT (electroconvulsive therapy)
o ECT will erase memory that hasn't been consolidated. patients
experience memory loss of hours before the treatment. (temporary
3) Trauma
o can produce stoppage of activity, and produce loss of memories that
are still in the consolidation process.
B) Facilitation of memory consolidation
1) Emotional Events
o the memory is encoded and flashed by something in the brain that is
activated by a very strong emotion
(you remember it better)
2) Reinforcing Stimuli
o explained in depth…..
Passive Avoidance Task (HUSTON)
to determine if a stimulus can enhance memory consolidation.
o animal completes a task to create a memory. then when the memory is
getting processed, you give reinforcing stimulus.
o if the next day, the animal does task better..
you improved the memory (food reinforcer enhanced memory, but
isn't associated with task).
animal is placed in a cage on a platform; if it steps down it gets a shock, so it
learns to stay on the platform (to not get shocked).
o Group 1 fed in cage immediately after training session
o Group 2 fed in cage hours after the training session
when fed hours later, they still get food, but outside of the
o Group 1 (immediately fed) remained on the platform longer than Group 2
(delay in reinforcer)
o 1) food reinforcer influenced the animal's behavior by strengthening the
representation of the contingent relationship between stepping down and
shock
- the food reinforcer is strengthening the memory of the task
(nothing to do with the food)
o 2) the animals learned nothing about the rewarding motivating properties
of the food
- the animal isn't making any link with the food or task (you would
expect them to step down faster to get food)
“To observe the enhancing function of reinforcers we need to study situations
where the reinforcer is non-contingent upon the response”
o need to study when the reinforcer is not contingent on response.
Reinforcers & Consolidation
A) Electrical stimulation of the brain (BLOCK)
o bypassed sensory forms of stimulation (food), and used direct
stimulation of areas of the brain after learning of a task to reinforce it.
reticular formation:involved in general arousal of rest of brain.
(cutting this part from the rest of brain causes coma).
(trained animal on task, stimulated the reticular formation
afterwards, and found that it strengthened the memory)
o Post-training electrical stimulation of the RETICULAR FORMATION
enhances retention of both appetitive and aversive tasks.
o Self Stimulation Experiments (HUSTON)
-put electrodes in different brain regions to find sites that produce
stimulation. by mistake, discovered a site where the animal will go
back to the location where the stimulation was delivered.
Medial Forebrain Bundle (MFB).
group of axons that project out of the midbrain
When electrodes stimulate the MFB, animals will go
back to places where the stimulation was delivered.
Animal will press the lever
stimulating MFB reinforces from a behavioural point
of view (produces behaviour to return), and also from a memory point of view (promotes memory
group 1: received 30 min stimulation of MFB after making a
choice
group 2: received no stimulation
group 1 which received stimulation of MFB after
making choice, learned faster than the other group.
Stimulation of the MFB has strong reinforcing properties
B) Drugs of Abuse
o Ex. amphetamine, cocaine, morphine/heroin, nicotine, caffeine, alcohol,
o if you learn something, and during memory consolidation, you use drug
of abuse OR any substance which can release dopamine (sugar, physical
activity, etc), you will find the memory has been enhanced.
-drug has to be given in the critical period of consolidation.
-you need a control which gives drug outside this period.
-must have appropriate dosages, just enough to turn on dopamine
Train on a task, then post training you give a drug.
then later you test their memory
you see memory has been enhanced.
didn't even tell them you are retesting them
Dopamine:
dopamine cells come from the VTA, which has axons projecting to many regions.
the MFB is the bundle of these axons, which projects dopamine to areas.
substantia nigra also produces dopamine, and projects to other areas.
Effect of Reinforcers on Conditioned Motivation
reinforcer is not only changing behaviour through memory consolidation.
o also because the situation around the reinforcer becomes motivating to
the individual (emotional component).
a reinforcing stimulus produces a motivational state which is usually liked.
This state, will be associated to any other stimuli that are present (contextual
stimuli), and these stimuli will become motivationally important (motivationally
salient, incentive value)
o Introducing a reinforcer into a learning situation confers its motivating
power (i.e., motivational salience) on previously non-motivating stimuli
o The stimulus acquires secondary reinforcing properties and thus it
becomes a conditioned motivator.
o -testing cigarettes with & without nicotine.
non-nicotine cigarettes people still keep smoking them (not zero)
strong reinforcer is NOT the nicotine, it is the smoke.
can keep people smoking even without nicotine.
-smoke had been paired enough with nicotine, that it became a
conditioned/motivational reinforcer itself.
(acquires something that you like).
o the nicotine enhances memory & gives motivational value to the
(this is why nicotine is a reinforcer)
QUESTION: stimulation of the MFB (medial forebrain bundle):
a) maintains self-stimulation behaviour
b) feels great
c) is rewarding
d) can also be used to punish behaviour
e) has no effect on memory consolidation
Conditioned Motivation: Facial Reactions (LIKING):
o There is a commonality of facial reactions to liked & disliked tastes
-we can infer a measure of 'wanting' from the behaviour.
Liking measure from Facial reactions
Wanting measure from approach behaviour
-a lot of stimuli we consider rewarding is also reinforcing
-study by dissociating liking & wanting
(normal liking, no wanting)
o we know we can dissociate liking and wanting!
addicts don't like drugs, but they do them
they want them! (wanting is a measure of behaviour)
show a lot of wanting, no liking.
Dopamine: Wanting but not Liking.
BERRIDGE & ROBINSON
o Study to distinguish between wanting and liking
- there are dopaminergic neurons in the midbrain which send
dopamine to the striatum. if you inject a drug into the striatum
(nucleus accumbens), the dopaminergic neurons pick it up, transport it,
and then die.
can be sure you're making lesions specifically to dopamine
cells, because you can see that only they die.
made 6-hydroxydopamine (6-OHDA) lesions to the VTA (injected drug into
nucleus accumbens or neostriatum)
only certain neurons will pick it up, transport it to the cell
bodies, and then it kills them.
(specific lesion to dopamine)
Wait until the neurons die and then measure the amount of
dopamine in the area
- 90% depletion and 74.1% depletion
caused an animal to not drink, eat or have sex.
Will move, but approach/goal orientated behaviour goes
away almost completely.
produced severe aphagia (don't eat)
don't do these things, because they don't approach.
Not because they can't do them.
o dopamine is involved in wanting, but NOT liking.
Dopamine does NOT mean pleasure
o extension on experiment:
measure the liking component:
group 1 give animal a sweet (Sucrose) solution
group 2 give animal bitter (Quinine) solution.
see normal liking reactions for both solutions, but see no
wanting (approach) behaviour
liking not affected
Measure aversive reactions in both cases as a control (no
aversive in sucrose)
-React like normal animals in terms of liking behaviour
different regions of the brain affect your liking of something
(emotional value), and affect wanting (approach behaviour)
wanting = striatum & nucleus accumbens.
WEEK 7
Effectiveness of Reinforcement
o 1) Drive can affect whether something is a reinforcer.
o 2) Incentive value of S*
(ex. what is reinforcing to a person with a normal eating
style, is not incentive to a vegetarian- cheeseburger)
o 3) Delay of reinforcement
delay between CS & US will not affect learning.
o 4) Stimulus Control
o 5) Schedule of Reinforcement
Delay of Reinforcement:
o experiment to look at the role of delayed reinforcement.
study the effect of a short delay between a response & reinforcer,
versus a long delay.
stimulus = choice point.
Response = going right.
reinforcer (S*) = food.
-animals put in T mazes, were reinforced for turning right (food).
-then, confine them in delay box for whatever time.
animal with longer delay, takes longer to learn, should
eventually never learn (to take right hand turn)…
could delay for 20 mins and still learn normally....
-must consider Pavlovian conditioning coming in, and filling
the gap of time. animal is using conditioned cues as
reinforcers, which fill the time gap and allow them to still
learn
o rG-sG mechanisms
SG/S* = stimulus in goal box
RG = reactions in goal box
Stimuli in the start box and delay box come to elicit rG
rG = fractional anticipatory goal responses (salivation)
1) energizes behaviour
2) causes sG
3) sG guides behaviour
when in start or delay box, the animal can feel salivation,
and know they are in the right spot ("cheating")
sees delay box, starts to salivate, gets in, salivates,
knows it made the right choice.
sG can also serve as conditioned reinforcers because of their
association with SG/S* (food)
-reinforced to turn right not only by the food, but by a
variety of CS associated, some which are INTEROCEPTIVE
(inside the animal)
-delay boxes are different to the rat
(when animal makes the proper turn, they salivate)
Animals reinforced to turn right so they go to goal box (food)
- Reinforced by receiving the food
-confined the animal in delay box right after they make the
response, before getting to the goal box.
-See how long it takes the animal to learn, eventually will
never learn? longer the delay, won't learn to turn right and
get the food?
BUT.. after 20 mins the animal would still go to the food!
Must start to consider Pavlovian conditioning seeping into
operant conditioning. Animal is using Pavlovian cues as reinforcers
that fill the gap of time and still allow the animal to learn
Before you get to the stimulus, other stimuli are presented
Before it makes the right turn, it experiences some
responses (Fractional goal responses)
then as it enters the goal box, responses get
stronger and stronger.
- Consequence of having these response:
it is energized - animal knows they are in the right
spot.. salivates at the delay box - knows that they
made the right choice.
-Some of these responses are interoceptive (inside animal)
response causes a stimulus, and the stimulus guides
behaviour (ex. stomach noise)
Essence: Reason why the animal can bridge the gap
during the delay, the animal experiences a variety of
Conditioned Stimuli which reinforce the behaviour of
turning right. They turn right, feel themselves turning right. Then
it's reinforced by these anticipatory reinforcers in the delay
box, then even more eventually by the reinforcer.
task is to follow dark compartment
even small delays prevent learning, because all CS that lead
animal to food are removed (dark)
eliminate all sources of possible interoceptive cues
- Different delays (zero sec, 0.5 sec, 1.2 sec, 10 sec)
0.5 delay: performance slows down
1.2: slows down even more
10: never learn it.
- Delay of 10 seconds - will NEVER learn
Use conditioned reinforcers to bridge the delays (as humans)
o we removed all stimuli that tell them that they made the right turn
(black space), so there are no predictors to allow Pavlovian conditioning to
occur. Therefore, we see that a delay before getting a reinforcer slows
down learning, and a big enough delay means they will never learn it.
When proprioceptive, as well as exteroceptive, conditioned
reinforcers are eliminated, even a brief delay in the presentation of
the reinforcer prevents learning
behaviour that is reinforced is usually under the control of many stimuli. the first
is the one that produces the response, the others are usually contextual.
o -you want to control that stimulus. if you can't, you want to control
contextual stimuli. (what we do)
Behavior that has been reinforced in the presence of one stimulus is controlled
by the presence/absence of that stimulus.
However, responding often generalizes to other stimuli on the basis of their
similarity to the training stimulus.
auto-shaping: pigeons peck at 580 nm wavelength, give food
Change intensity, don't give food.
Go back to 580 nm wavelength, and give food.
learn that 580 nm wavelength means food.
(is autoshaping because pecking is natural)
(keep reinforcing pigeons for pecking at a particular light colour,
they will peck most intensely at that colour. deviate from this colour,
they will peck less)
o Stimulus Generalization Gradient
the wide gradient of a variety of stimuli, or narrow gradient of
just 2 particular stimuli.
can train discrimination: go from the natural wide gradient of
responding (wide), to only responding to one colour (narrow)
the shape of the gradient is affected by learning!
want one particular colour = encourage discrimination.
Reinforce for responding to particular colours
good at discriminating particular wavelengths
o you get a NARROW peak
want generalization between all colours = encourage generalization.
reinforce them for responding at every colour.
o you get a WIDER distribution
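The narrow versus wide gradient can be pictured with a simple curve. The sketch below assumes, purely for illustration, a bell-shaped gradient centred on the trained wavelength (580, as in the pigeon example). The function name `response_rate` and the `width` and `peak` parameters are hypothetical and do not come from the lecture or from any fitted data.

```python
import math


def response_rate(wavelength, trained=580.0, width=20.0, peak=1.0):
    """Relative pecking rate at a test wavelength.

    Assumes (for illustration only) a bell-shaped gradient centred on the
    trained wavelength; `width` sets how far responding generalizes.
    """
    return peak * math.exp(-((wavelength - trained) ** 2) / (2 * width ** 2))


# Discrimination training -> small width (narrow peak).
narrow = [response_rate(w, width=5.0) for w in (560, 580, 600)]
# Generalization training -> large width (wide distribution).
wide = [response_rate(w, width=40.0) for w in (560, 580, 600)]
```

In both cases responding is highest at the trained wavelength, but with a large `width` the off-target wavelengths still get most of the responding, and with a small `width` they get almost none.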
o Generalization & Discrimination Training: