PSYCH 1X03 Lecture Notes - Lecture 6: Slot Machine, Reinforcement, Mandelbrot Set


Department: Psychology
Course Code: PSYCH 1X03
Professor: Joe Kim
Lecture: 6

Psychology Lecture 6/7: Instrumental Conditioning
Bernard Ho
October 2, 2010
Instrumental Conditioning
Definition
o Learning the contingency between behaviours and consequences
Thorndike’s Experiment
o Put a cat in a box with a rope attached to the door
o Food was placed outside the box
o Thorndike recorded behaviours and escape times of cat
Thorndike hypothesized that the cat would at first show random behaviours while figuring out how to escape the box
By accident, the cat would find the rope that opened the door during one trial
After that one trial, the cat would have learned the contingency between the rope and the door
He estimated that escape times would be long in the initial trials but that, as trials progressed, the cat would take only a few seconds to locate the rope and escape
This is not exactly what happened
Thorndike found that the frequency of random behaviours decreased gradually over time
Over several trials, the behaviours that did not lead to escape occurred less frequently, leaving only the correct target behaviour in place
This suggested that animals follow a simple stimulus-response type process, with little credit given to consciousness
The cat seemed to learn through a long trial-and-error process of discovery
Thorndike hypothesized a process called Stamping In and Stamping Out
o Behaviours like rope pulling were “stamped in” because they led to rewards
o Behaviours like running around in circles were “stamped out” because they led
to nothing
o Eventually, the general process led to refinement and the cat learned the
contingency between the specific behaviour of rope pulling and the specific
consequence of food reward
These findings led to the Law of Effect
o Behaviours that produce a satisfying or pleasant state are stamped in and performed more frequently
o Behaviours that produce an annoying or unpleasant state are stamped out and performed less frequently
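The stamping in and stamping out process can be illustrated with a small simulation. The Python sketch below is an illustration only, not Thorndike's actual model: the behaviours other than rope pulling and running in circles, the starting strengths, and the multipliers 1.2 and 0.9 are all assumptions chosen for demonstration.

```python
import random


def run_puzzle_box(n_trials=30, seed=0):
    """Simulate a cat in a puzzle box under a toy Law of Effect.

    Each behaviour has a strength (hypothetical numbers). On every
    trial the cat emits behaviours, chosen in proportion to their
    strengths, until it pulls the rope and escapes.
    """
    rng = random.Random(seed)
    strengths = {
        "pull rope": 1.0,
        "run in circles": 1.0,
        "scratch walls": 1.0,  # hypothetical behaviour
        "meow": 1.0,           # hypothetical behaviour
    }
    escape_times = []
    for _ in range(n_trials):
        steps = 0
        while True:
            steps += 1
            behaviours = list(strengths)
            weights = [strengths[b] for b in behaviours]
            choice = rng.choices(behaviours, weights=weights)[0]
            if choice == "pull rope":
                # Satisfying outcome (escape and food): stamped in.
                strengths[choice] *= 1.2
                break
            # No useful outcome: stamped out a little.
            strengths[choice] *= 0.9
        escape_times.append(steps)
    return strengths, escape_times


if __name__ == "__main__":
    final_strengths, times = run_puzzle_box()
    print(final_strengths)
    print(times)
```

Under these assumptions the number of behaviours emitted before escape tends to shrink gradually across trials, which mirrors the gradual decrease Thorndike observed rather than a sudden drop after one lucky trial.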
Types of Instrumental Conditioning
Four consequences
o Presentation of a positive reinforcer (reward) following a response is reward
training that increases the behaviour
o Presentation of a negative reinforcer (punishment) following a response is
punishment training that decreases the behaviour


o Omission training
Removing a positive reinforcer following a response, which leads to a decrease in that behaviour
Billy is watching TV and teasing his sister
Billy’s mom wants to remove the teasing behaviour, but wants to
avoid punishment and its side effects
She turns off the TV every time he teases his sister
Access to the TV show is a positive reinforcer and removing it will
likely cause Billy to stop his teasing behaviour
o Escape training
Removing a negative reinforcer following a response, which leads to an increase in the behaviour
The floor on one side of a cage delivers an electric shock every time it is touched
The shock can be escaped if the rat moves to the opposite side of the cage
o Punishment and omission training both lead to a decrease in behaviour, but they are different procedures: punishment presents a negative reinforcer, whereas omission training removes a positive reinforcer
Four different types of instrumental conditioning differ in whether a positive or
negative reinforcer is either presented or removed
An important point for any instrumental conditioning is that it proceeds best when the
consequence immediately follows the response
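The four training types in this section form a two-by-two grid: which kind of reinforcer is involved, and whether it is presented or removed. A minimal Python sketch of that grid follows; the names `TRAINING_TYPES` and `classify` are hypothetical and exist only for this illustration, and the terms "positive" and "negative" reinforcer are used as these notes use them.

```python
# Keys: (kind of reinforcer, what happens to it after the response)
# Values: (name of the training type, effect on the behaviour)
TRAINING_TYPES = {
    ("positive", "presented"): ("reward training", "increase"),
    ("negative", "presented"): ("punishment training", "decrease"),
    ("positive", "removed"): ("omission training", "decrease"),
    ("negative", "removed"): ("escape training", "increase"),
}


def classify(reinforcer, operation):
    """Return (training type, effect on behaviour) for one cell of the grid."""
    return TRAINING_TYPES[(reinforcer, operation)]


if __name__ == "__main__":
    # Billy's mom turns off the TV (removes a positive reinforcer).
    print(classify("positive", "removed"))
```

Laying the cases out this way makes it easy to see that punishment and omission training share an effect (a decrease) but sit in different cells of the grid.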
Acquisition and Shaping
In instrumental conditioning, the process of acquisition leads to learning contingencies
between a response and its consequences
Psychologists are often interested in measuring the rate of response of the new
behaviour
Autoshaping
o Ex. A pigeon is placed in a special cage with a keyhole
o If the pigeon pecks at the keyhole, a grain of seed is released
o Initially, the pigeon will be unaware of the contingency, but over time the
pigeon will peck the keyhole and learn the contingency between the behaviour
and the consequence
o Can be learned without explicit training by the researcher
o Simply placing the pigeon in a cage can cause the pigeon to learn the
contingency
Shaping by successive approximation
o Not all behaviours can be autoshaped
o Some are too complex for a subject to discover on their own
o Complex behaviour can be organized into smaller steps that gradually build up
to the full response we hope to condition
o Each of these steps can be reinforced through reward training
o Over time, the successive approximation leads to the final complex behaviour
o Famous example by behaviourist B. F. Skinner
Trained two pigeons to play table tennis
Pigeons first learned to peck at the ping pong table to receive food
Once established, the pigeons had to peck at a stationary ball, then a
moving ball, then finally peck a ball all the way across a table
As pigeons progressed through these stages, the criteria for rewards
became stricter
Chaining
o Another procedure used to develop a sequence (chain) of responses to build
even more complex behaviours
o In chaining, a response is reinforced with the opportunity to perform the next
response
o Ex. A rat is initially trained to press a lever for a food pellet as the last step in a
chain of responses
o The next challenge for the rat is an overhanging string placed nearby, the rat
must pull the string to gain access to the lever
o The response of pulling the string is reinforced by the opportunity to make the
original lever press response that leads to food
o Step by step, a chain of responses can be built leading to a final sequence of
behaviours that can appear to be quite complex
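The chaining procedure described above can be sketched as a simple data structure. In this Python illustration, the function name `build_chain` and its parameters are hypothetical; the only facts taken from the notes are that the last response is trained first and that each earlier response is reinforced by the opportunity to perform the next one.

```python
def build_chain(final_response, primary_reinforcer, earlier_responses):
    """Build a response chain backwards from the final response.

    earlier_responses lists responses in the order they are added
    during training; each new response goes in front of the chain.
    Returns the chain in performance order and a mapping from each
    response to whatever reinforces it.
    """
    chain = [final_response]
    for response in earlier_responses:
        chain.insert(0, response)

    reinforced_by = {}
    for i, response in enumerate(chain):
        if i + 1 < len(chain):
            # Reinforced by access to the next response in the chain.
            reinforced_by[response] = "opportunity to " + chain[i + 1]
        else:
            # Only the last response earns the primary reinforcer.
            reinforced_by[response] = primary_reinforcer
    return chain, reinforced_by


if __name__ == "__main__":
    chain, reinforced_by = build_chain(
        "press lever", "food pellet", ["pull string"]
    )
    print(chain)
    print(reinforced_by)
```

For the rat example, the chain in performance order is string pull then lever press, with the string pull reinforced by the opportunity to press the lever and the lever press reinforced by food.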
Generalization and Discrimination
Discriminative Stimulus
o Signals when a contingency between a particular response and reinforcement is
“on”
o Ex. Whenever a child eats his vegetables in his parents’ house, he is rewarded
with dessert
o Response (eating vegetables) → reinforcement (dessert)
o At his parents’ house, there is a discriminative stimulus (SD) or (S+)
SD: Response → Reinforcement
o The environment of his parents’ home becomes an SD for the response of
vegetable eating behaviour, which is rewarded with access to dessert
o SD → contingency between response and reinforcement is on
o However, the child may eat his vegetables at his grandparents’ house and be
surprised that he is not rewarded with dessert
This is the Sδ
o Sδ is a cue that indicates when the contingent relationship between response
and reinforcement is not valid
o The environment of the grandparents’ home becomes an Sδ for the response of
vegetable eating
o The child learns that under these conditions, eating vegetables will not lead to a
dessert reward
o This behaviour can also be captured on a Generalization Gradient
Recall that a CR occurs not only when the CS is present, but also in
response to stimuli similar to the CS
o The child may also expect dessert in environments similar to his parents’ house