Instrumental Conditioning I
Involves explicit training between voluntary behaviours and their consequences.
The learning of a contingency between behaviour and consequence.
If you touch a hot stove you’re going to get burned.
A specific behaviour leads to a specific consequence.
Instrumental conditioning is all about learning the contingency between
behaviours and consequences.
Edward L. Thorndike
Began his investigation by studying cats in a puzzle box.
This put the focus on overt behaviour rather than on mental elements or
Thorndike’s Experiment: He measured the time it took the cat to learn to open
the door by pulling the string.
Thorndike predicted that on trials following the discovery of the correct solution,
the cat would escape immediately once placed in the same puzzle box.
A pattern of behaviour following this hypothesis would look like this:
Long escape times during initial trials would be followed by a dramatic step
down in time to escape in later trials.
This is not what happened. Instead, Thorndike found that the frequency of the random behaviours gradually
decreased over time.
Over several trials, the random behaviours that did not lead to escape would
occur less frequently, leaving only the correct target behaviour in place.
This suggested that animals followed a simple stimulus-response type process
with little credit for consciousness.
The graph shows the number of random behaviours decreasing as the number of
successful trials increases.
There was never a distinct “aha” moment.
The Law of Effect
Thorndike hypothesized a process called Stamping In and Stamping Out, which
determined whether a behaviour was maintained or eliminated.
Behaviours like rope pulling were stamped in because they were followed by the
favourable consequence of access to food.
In contrast, random behaviours like turning in a circle were stamped out.
Eventually, this general process leads to refinement and the cat learns the
contingency between the specific behaviour of rope pulling and the specific
consequence of food reward.
These findings led to the GENERAL LAW OF EFFECT: Behaviours that produce a
satisfying or pleasant state will be stamped in and performed more frequently;
behaviours that produce an annoying or unpleasant affect will be stamped out
and performed less frequently.
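The gradual learning curve described above can be illustrated with a small sketch. This is not Thorndike's own model; it is a toy calculation under stated assumptions: the behaviour names, the starting strengths, and the `rate` parameter are all hypothetical, and behaviours are assumed to be emitted in proportion to their current strength.

```python
def simulate_puzzle_box(trials=10, rate=0.2):
    """Toy illustration of stamping in and stamping out (all values hypothetical)."""
    # Assumption: every behaviour starts out equally strong.
    strengths = {
        "pull_string": 1.0,      # the target behaviour that opens the door
        "turn_in_circle": 1.0,
        "scratch_wall": 1.0,
        "meow": 1.0,
    }
    target = "pull_string"
    expected_attempts = []

    for _ in range(trials):
        total = sum(strengths.values())
        # If behaviours are emitted in proportion to strength, the expected
        # number of behaviours up to and including the first target response
        # is total / strength_of_target (mean of a geometric distribution).
        expected_attempts.append(total / strengths[target])

        # Law of Effect: the rewarded behaviour is stamped in,
        # the unrewarded behaviours are stamped out.
        for behaviour in strengths:
            if behaviour == target:
                strengths[behaviour] *= 1 + rate
            else:
                strengths[behaviour] *= 1 - rate

    return expected_attempts


curve = simulate_puzzle_box()
for trial, attempts in enumerate(curve, start=1):
    print(f"trial {trial}: about {attempts:.2f} behaviours before escape")
```

Under these assumptions the expected number of behaviours per trial falls smoothly toward one rather than dropping in a single step, which matches the absence of an "aha" moment in the notes.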
Types of Instrumental Conditioning
A more precise strategy is to refer to the reinforcer: any stimulus that, when
presented after a response, leads to a change in the rate of that response.
Both positive and negative reinforcers, each of which can be presented or
removed, change behavioural responses. This leads to four different types of instrumental conditioning:
Reward training: presentation of a positive reinforcer following a response increases the
behaviour being reinforced.
If you present your puppy with a treat every time it sits on command, the
behaviour is likely to increase.
Punishment: presentation of a negative reinforcer following a response leads to a
decrease in the behaviour being reinforced.
If you were shocked every time you placed coins into a pop machine, you would
very quickly decrease the behaviour.
The use of punishment must consider the ethics of experiencing fear or pain in
Many learning theorists believe that when punishment is used, the authority
figure may, through classical conditioning, become a signal for pain or distress, a
contingency that may ultimately damage a parent-child relationship.
Omission training involves removing a positive reinforcer following a response, which leads to a
decrease in the behaviour being reinforced.
This is clear because removing a positive reinforcer is a situation that a person
wants to avoid.
A version of omission training used in schools or by parents is known as the
time-out procedure.
Removal of a positive reinforcer ≠ Presentation of a negative reinforcer
Escape training occurs when a response is followed by the removal of a negative reinforcer.
There is a constant negative reinforcer being presented that the learner is
motivated to have removed.
By performing a specific response, the negative reinforcer can be removed,
which leads to an increase in the target behaviour.
The four different types of instrumental conditioning differ in whether a positive
or negative reinforcer is either presented or removed.
Instrumental conditioning proceeds best when the consequence immediately follows the response.
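The four combinations can be summarized as a small lookup. This is only a sketch: the function name `classify_training` and the string labels are hypothetical, and "escape training" is used as the conventional name for removal of a negative reinforcer.

```python
def classify_training(reinforcer, operation):
    """Return (type of training, effect on the behaviour).

    reinforcer: "positive" or "negative"
    operation:  "presented" or "removed"
    """
    table = {
        ("positive", "presented"): ("reward training", "increase"),
        ("negative", "presented"): ("punishment", "decrease"),
        ("positive", "removed"): ("omission training", "decrease"),
        ("negative", "removed"): ("escape training", "increase"),
    }
    return table[(reinforcer, operation)]


# A treat given after the puppy sits: positive reinforcer, presented.
print(classify_training("positive", "presented"))
# A time out: positive reinforcer, removed.
print(classify_training("positive", "removed"))
```

Laying the cases out this way makes it plain that removing a positive reinforcer and presenting a negative reinforcer are different procedures, even though both decrease the behaviour.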
Acquisition and Shaping
The process of acquisition leads to learning the contingency between a response
and its consequences.
Psychologists are often interested in measuring the rate of responding of the
Here is the output for a typical experiment:
When behaviours can be learned without explicit training guided by the
The complex behaviour can be organized into smaller steps, which gradually
build up to the full response we hope to condition.
Each of these steps can be reinforced through reward training.
Over time, the successive approximations lead to the final complex behaviour.
Ex. Pigeons with the ping pong ball.
Instrumental Conditioning II
Generalization and Discrimination
The Discriminative Stimulus
It’s not only important to learn the contingency between a response and
reinforcement, but also when that contingency is valid.
A discriminative stimulus signals when a contingency between a particular
response and reinforcement is “on.”
Contrast the SD with the notion of an S-delta.
The S-delta is a cue which indicates when the contingent relationship is not valid.
o SD = presence of parents
o Response = politeness
o Reinforcement = praise
o Generalization: she might also be polite in the presence of other adults in
the hope of receiving praise and attention.
o However, her polite behaviours may not be quite as strong as they would
be when her pa