Learning and Behaviour
Your textbook defines learning as "an adaptive process in which the tendency to perform a
particular behaviour is changed by experience" (p. 196). Here is another definition: “A more
or less permanent change in behaviour or behavioural potential that results from
experience”. It is quite easy to think of examples of learning. Most of us learned how to ride
a bike. We're currently learning introductory psychology material. Let’s take the second
definition of learning and break it down into its parts, to see how it might apply to these examples:
o More or less permanent - we sometimes forget things or lose the ability to do
things. We need to distinguish learning from changes that occur for other reasons.
For instance, as you get tired, your performance on certain tasks declines (you
become fatigued). This is a change, but is not permanent. Adaptation, which you
learned about in Lesson 8, and Learning Labs 9 and 10, is another example of a
short-term, temporary change.
o Permanent changes are just that -- once you've learned how to ride a bike, you never
forget (or so they say!). When you learn to speak Swedish as a second language, you
might forget it if you don’t use it, but if you were to learn Swedish a second time you
would find it much easier and faster than the first time, indicating some
preservation of learning. Hopefully, your ability to learn introductory psychology
material is more or less permanent.
o Change in behaviour - learning to ride a bike is an example of a direct change in behaviour.
o Or behavioural potential - sometimes we learn things from observing others. Our
behaviour is not directly changed as we are learning, but later is shown to be
changed. For example, you might learn how to cook from watching a chef on TV, and
use what you have learned later, but it is rare that you actually are cooking while the
chef is cooking.
o Resulting from experience - we need to distinguish learning from changes that occur
for reasons other than experience. For instance, many changes in behaviour occur
because of maturation. For example, newborn babies exhibit a behaviour called the
"rooting reflex". If you brush a newborn's cheek it will turn in that direction. After a
few weeks, this behaviour disappears. This change in behaviour is entirely
maturational, and occurs regardless of a baby's experience.
The Origins of Learning Theory
You have already encountered the "nature-nurture" debate, which was rooted in
renaissance philosophies that attempted to describe where human behaviour came from (p.
10-12). There were two camps, Nativists and Empiricists.
Nativists - Descartes (17th century) proposed that almost all behaviour was
reflexive or due to inborn ideas. He and the other nativists suggested that we are born the
way we are and that our life experiences play little or no role in shaping our behaviour.
Empiricists - Early empiricists, like Thomas Hobbes and John Locke, proposed that humans
are born without ideas or knowledge of behaviour; they learn through experience. Locke
famously likened the infant’s mind to a “Tabula Rasa”, or blank slate, emphasizing the belief
that no knowledge is inborn; all knowledge is acquired through lived experience.
Empiricist philosophers believed that learning results from repeated pairings of
experiences. However, the renaissance philosophers did not systematically explore these ideas.
Pavlov and the Dog
Even though this is your first psychology course, you’ve probably heard of Ivan Pavlov
ringing a bell that caused a dog to salivate (introduced in your text on p. 21). In fact, Pavlov
was not a psychologist – he was a physiologist interested in digestive reflexes (see Fig. 7.3 of your text).
Pavlov was interested in how dogs salivate when presented with food. One day, he noticed
that his dogs would often begin salivating when they saw the experimenter that usually
gave the dog food, even if he had not yet given them any. Pavlov reasoned that the dogs
must have been learning an association between the experimenter and the presentation of
food, which caused the dogs to exhibit a behaviour (a response) that looks a lot like the
behaviour associated with food. He thus began systematically studying this type of learning,
which became known as classical conditioning (or Pavlovian conditioning).
Classical Conditioning – One Thing Leads to Another...
Classical conditioning is a form of learning: the animal learns the association between
two stimuli. In Pavlov's experiments, the operational definition of learning was how much a
dog drooled. Pavlov demonstrated that repeated experience with two stimuli (an auditory
stimulus, like a buzzer, and food) at the same (or nearly the same) time, caused these two
stimuli to be linked somehow in the dog’s nervous system. A response that normally occurs
reflexively to the biologically relevant stimulus (i.e., salivating when presented with food)
could also be elicited by a stimulus that is not biologically relevant (i.e., salivating when
hearing the buzzer). Pavlov used the following terms to describe the various components of this type of learning:
o Unconditioned Stimulus (UCS)—the stimulus that evokes the behavioural response of
interest. This stimulus elicits a behaviour prior to any learning. In Pavlov's experiments, this
was the food.
o Unconditioned Response (UCR)—the reflexive response to the presentation of the
unconditioned stimulus. In Pavlov's experiments, this was the dog's salivation to the food.
o Conditioned Stimulus (CS)—a stimulus that initially evokes no response but that, after
conditioning, evokes a response (salivation). In Pavlov's experiment, the CS is the buzzer.
o Conditioned Response (CR)—similar (but often not identical) to the unconditioned
response, except that it is evoked by the conditioned stimulus. In Pavlov's experiment,
the CR is the salivation to the buzzer.
To train an organism using classical conditioning, you have to repeatedly pair a neutral
stimulus (one that doesn’t elicit the response of interest) with an unconditioned
stimulus. In Pavlov’s experiments, that involves pairing the buzzer with
the presentation of food. Initially, the animal will exhibit an unconditioned response
(salivation) to the food. After a number of trials, you take away the unconditioned stimulus
and present only the neutral stimulus (buzzer). If the animal demonstrates a response
similar to the unconditioned response (salivation), now only to the neutral stimulus, then
you have "conditioned" the animal, and the neutral stimulus is now the conditioned stimulus.
Factors in Classical Conditioning
Acquisition—Using the model of learning described above, Pavlov suggested that there
were five general factors in any conditioned response. The first of these is the learning of
the new response, or acquisition. According to Pavlov, acquisition in classical conditioning
is gradual. It requires many trials (often around 20) in which the CS and UCS are paired
before conditioning can be demonstrated. The relationship in time between the neutral
stimulus (the eventual CS) and the UCS matters a lot:
Acquisition works best if you present the CS just before you present the UCS and then
continue presenting it until the UCS is taken away. So, start the buzzer, then present the
food, then turn off the buzzer and take away the food. This type of conditioning is referred to as
delay conditioning - there is a slight delay between the onset of the CS and the onset of the UCS.
Changing the order of events around, so that the food is presented before the buzzer turns
on (UCS before the CS) is called backward conditioning. This is the least effective form of
conditioning. In fact, there are probably only a handful of examples in which it has been
shown to be effective at all.
One can present the CS and UCS at exactly the same time - buzzer and food are presented
simultaneously. This is referred to as simultaneous conditioning. Although simultaneous
conditioning is more effective than backward conditioning, it requires more trials to
demonstrate than delay conditioning does.
Finally, there is a form of conditioning in which one can present the CS, take it away and
then present the UCS (buzzer goes on, is turned off and then food is presented). This is
called trace conditioning. The term comes from the idea that the animal is learning an
association between a memory trace (of the buzzer) and the presentation of the food. Trace
conditioning is generally only effective if there is a very short delay between when the CS
goes off and the presentation of the UCS. If you wait longer than a few seconds, trace
conditioning generally does not work.
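The gradual, many-trial acquisition described above can be sketched in code. This is an illustrative simulation, not part of the lesson: the update rule (a simple Rescorla-Wagner-style rule), the learning rate, and the asymptote are all assumptions chosen just to reproduce the shape of a gradual acquisition curve.

```python
# Illustrative sketch (not from the lesson): gradual acquisition of a CR,
# modelled with a simple Rescorla-Wagner-style learning rule. The learning
# rate, asymptote, and trial count are arbitrary choices for illustration.

def acquire(n_trials, learning_rate=0.2, asymptote=1.0):
    """Return associative strength after each CS-UCS pairing."""
    strength = 0.0
    history = []
    for _ in range(n_trials):
        # On each paired trial, strength moves a fraction of the way
        # toward the asymptote (the maximum the UCS can support).
        strength += learning_rate * (asymptote - strength)
        history.append(strength)
    return history

curve = acquire(20)
# Early trials produce large gains; later trials add less and less,
# giving the gradual, negatively accelerated curve Pavlov observed.
```

Under these assumptions, conditioning is strong but still short of its maximum after roughly 20 pairings, which matches the "many trials" character of acquisition described above.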
Generalization – Would your dog be unable to anticipate being fed if you bought a different
kind of food? The conditioned cues for food would be slightly different: It would still be you
feeding her; the can opener, the dog dish and the time of day would be the same, but the can
is different. Would she exhibit the conditioned response (tail wagging; whining; general
eagerness and probably salivation)? Probably. This demonstrates generalization – the CS
doesn’t always have to be identical; similar CSs can also elicit the CR. For instance, you can
train a dog to salivate to a buzzer with a medium pitch (1000 Hz tone). The 1000 Hz tone is
the CS. If you lower the buzzer’s pitch slightly (e.g., to 900 Hz), the dog will still salivate so
the response has generalized. But the degree of salivation (i.e., generalization) depends on
the degree of similarity between the original CS and your test stimulus. For example, if your
tone has a very low pitch (e.g., 200 Hz tone), the dog will still salivate, but much less so. So
the degree to which the animal exhibits the conditioned response tells you how similar the
animal perceives the test stimulus and the original CS to be – this can be a useful method for
determining how animals perceive the world.
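The generalization gradient described above can be sketched as a simple function. This is an illustrative sketch, not part of the lesson: the Gaussian falloff and its width (200 Hz) are assumptions chosen only to show how response strength declines with distance from the original CS.

```python
import math

# Illustrative sketch (not from the lesson): a generalization gradient.
# Response strength falls off with the distance between a test tone and
# the original 1000 Hz CS. The Gaussian shape and 200 Hz width are
# arbitrary modelling assumptions.

def generalized_response(test_hz, cs_hz=1000.0, width=200.0):
    """Fraction of the full CR elicited by a test tone."""
    return math.exp(-((test_hz - cs_hz) ** 2) / (2 * width ** 2))

# A 900 Hz tone is close to the CS, so most of the CR survives;
# a 200 Hz tone is far away, so very little of the CR remains.
near = generalized_response(900)
far = generalized_response(200)
```

Reading the gradient backwards is the trick mentioned in the text: the amount of salivation to a test tone indexes how similar the animal perceives that tone to be to the original CS.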
Discrimination—Once a conditioned response is learned, one can further train the animal
to become more specific in its responses. For instance, the dog that was trained to a 1000
Hz tone but also salivates to a 900 Hz tone (i.e., shows generalization) can undergo further
training, discrimination training, which will reduce its responses to tones other than the
original CS. In discrimination training, you present several different tones, but only one tone
is paired with the unconditioned stimulus. So, every time a 1000 Hz tone is played, the dog
is presented with food. When you play a 900 Hz (or any other frequency), no food is
presented. Following this discrimination training, the dog will no longer salivate to 900 Hz
tones (or 1100 Hz tones). In fact, it will only salivate to tones that it cannot perceptually
distinguish from 1000 Hz. This type of training can be very useful if you are interested in
determining the perceptual capacities of non-verbal animals. Pavlov viewed discrimination
training as a very important element in classical conditioning – learning associations
between features in the environment and the lack of unconditioned stimuli – i.e., learning
that they are irrelevant – is just as crucial as learning that other features predict
unconditioned stimuli. This associative learning about the unimportance of stimuli resembles
habituation, which serves a similar function but is nonassociative in nature.
Habituation is one of the simplest forms of learning, found in many types of animals, and
involves diminished responding to a stimulus with repeated exposure. It is described in
your text on p. 197. (In turn, habituation might remind you of adaptation. Remember that
adaptation is quick and is due to fatigue in neurons. In contrast, habituation is a much
slower process, and fits the criteria described above for learning.)
Contingency vs Contiguity
We’ve already learned that classical conditioning requires that the CS and US occur close
together in time (contiguity). But is contiguity a necessary and sufficient condition for the
acquisition of a learned association? If contiguity is a necessary condition, then it must be
present for learning to happen (without contiguity, there is no learning). If contiguity is a
sufficient condition, then nothing else is required (if you have contiguity between two
stimuli, they will become associated).
Pavlov recognized the necessity of contiguity, and most if not all learning theorists would
still agree that it is a necessary condition. But is it sufficient?
You have learned that conditioning is most effective when the CS acts as a cue, informing the
animal about the imminent presentation of the US. The degree to which the CS predicts the
presentation of the US is called contingency. Contiguity doesn’t always lead to contingency.
Having two stimuli occur close together in time does not always mean that the first stimulus
predicts the occurrence of the second.
This was elegantly demonstrated by Robert Rescorla in a classic experiment (Rescorla,
1966). Rescorla used two enclosures with a low barrier between them. Dogs were trained to
jump the barrier to escape electric shock. They jumped back and forth between enclosures
as the shock was presented in one enclosure and then, after some time, in the other. Dogs
learned to jump back and forth pretty easily. Then, the dogs were divided into two groups.
In the Contingency group, a tone always sounded before the shock was presented, and
didn’t sound if shock was not imminent. The CS had good predictive value (the CS-US
contingency was high) and the animals quickly learned not to wait for the shock itself, but to
jump when they heard the tone. In the other group of dogs, the Random group, there was no
CS-US contingency. Half the time the CS sounded, it was followed by shock, but half the time
it wasn’t. Half the time there was a shock, it was preceded by the CS, but half the time it was
not. As far as the dog was concerned, shock was just as likely to occur in the absence of the
CS as it was to occur after a CS. The CS did not have any predictive value. The dogs in this
group did not learn to use the tone as a cue to avoid shock. This demonstrates that
contiguity and contingency are both necessary for conditioning to occur.
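Contingency in Rescorla's design can be expressed as simple arithmetic: the difference between the probability of the US given the CS and the probability of the US given no CS. The sketch below is illustrative only; the probabilities are taken from the description of the two groups above, not from Rescorla's actual data.

```python
# Illustrative sketch (not from the lesson): contingency as a difference
# of conditional probabilities, applied to Rescorla's two groups.

def contingency(p_us_given_cs, p_us_given_no_cs):
    """Positive values mean the CS predicts the US; zero means no contingency."""
    return p_us_given_cs - p_us_given_no_cs

# Contingency group: shock reliably follows the tone and never occurs
# without it, so the tone is a perfectly informative cue.
high = contingency(1.0, 0.0)   # 1.0

# Random group: shock is just as likely without the tone as with it,
# so the tone carries no information about the shock.
none = contingency(0.5, 0.5)   # 0.0
```

The dogs' behaviour tracked this quantity, not mere contiguity: only the group with a positive contingency learned to use the tone as a cue.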
Extinction—After you have conditioned a response, if you start presenting the CS
repeatedly without the accompanying UCS, the CR eventually starts to diminish. For
example, a dog salivates to a buzzer because it has been associated with food. Next you
sound the buzzer a number of times without the food. The buzzer is no longer predicting the
presence of food, and so gradually, the dog’s salivation to the buzzer will decrease until it is
gone. This is called extinction.
Spontaneous Recovery—Let's say you've extinguished salivation in a dog. You put the dog
back in its cage, come back the next day and then present the buzzer. Given that the CR
(salivation to the buzzer) was eliminated yesterday, the dog shouldn't salivate today. But,
the dog does salivate. This is referred to as spontaneous recovery; it occurs to a lesser and
lesser extent over successive extinction sessions. Spontaneous recovery is important because it
shows that the animal has not lost the response, but rather the response has been inhibited.
In addition, if a conditioned response has been acquired and then extinguished, the animal
is quicker to re-acquire the conditioned response when the CS and US are presented
together again (kind of like re-learning Swedish!).
The Beginnings of Behaviourism – If We Can’t See It, We Can’t Study It
At the beginning of the 1900s, psychologists in the U.S. began to reject the method of
introspection (p. 18 of your text) in their attempts to understand and explain human
behaviour. John Watson (p. 20-21 of your text) argued that for psychology to become a true
science (i.e., to use the scientific method) psychologists had to measure actual, observable
behaviour and to manipulate environmental factors that would change that behaviour. He
rejected the use of mentalistic concepts such as thoughts and emotions by psychologists in
explaining behaviour. He argued that the psychologists’ job was to determine the relationship
between observable stimuli in the environment and observable responses to them. Watson was
impressed with Pavlov’s demonstration of classical conditioning and applied Pavlov’s
experimental method to study the origins of fear in humans.
Watson conducted a famous study in which he conditioned fear in an 8- to 9-month-old child
called “Little Albert”. Watson presented Albert with a rat (an originally neutral stimulus—the
CS), which Albert was allowed to play with, and he showed no fear of it. Then, whenever
Albert would reach out and touch the rat, a loud sound was made right behind Albert (the
UCS) which caused him to cry and show fear (the UCR). After several pairings, Albert
showed distress and cried whenever the rat appeared in the room (the CR). He also cried
when presented with other animals such as a rabbit and a monkey (showing stimulus
generalization). Sadly, the story of Little Albert ends in tragedy, as the experiment was
ended before Watson was able to extinguish the conditioned fear response. Recently, a
group of researchers uncovered the identity of Little Albert (a pseudonym), and their
historical analysis revealed that the child, whose real name was Douglas Merritte, died of
hydrocephalus – commonly known as “water on the brain” – at age 6 (Beck et al., 2009).
Real-Life Examples of Classical Conditioning:
Example 1--Pizza Ads on TV—So, we now have a pretty good idea as to how to get a dog to
drool and a baby to cry. Does this really apply to anything else? Here are some examples of
classical conditioning in real life:
You've all seen those ads on TV where happy people are
eating yummy-looking pizza. Have you noticed how the company logo keeps coming up in
the middle of the scenes of pizza gorging? As well, the song/slogan is repeated over and
over. Ads like this can be interpreted as examples of classical conditioning. The CS is the
logo (or song/slogan), paired repeatedly with scenes of people eating. The scenes of people
eating act as a UCS, and cause us to feel hungry (UCR). Pairing the logo (CS) with people
eating (UCS) eventually produces a feeling of hunger when we see the logo alone (i.e., it
becomes a CS).
Example 2--Allergic Reactions to Fake Flowers—An allergic reaction is an inappropriate
and excessive immune system response. When we have a cold or flu, our immune system
generates histamines that cause us to sneeze. Many people show similar immune responses
to things that normally wouldn’t be expected to bother them, like the pollen of flowers.
Sneezing to pollen is an allergic reaction. Some researchers at the University of Winnipeg
asked for volunteers who had known allergies to flowers. These people sneezed when
presented with a flower, as expected. However, these people sometimes also sneezed when
they were presented with a silk flower, which obviously does not have any pollen. A few
even sneezed when presented with a picture of a flower. An allergic reaction is really to
pollen and not the flower, per se. However, the flower is often associated with the pollen. So,
the flower itself becomes a CS; it generates a CR (sneezing) even if the UCS (the pollen) is
not present. The fact that some sneezed to a picture represents an instance of
generalization. Your textbook discusses how classical conditioning can determine emotional
responses and phobias (p. 203). Another application of classical conditioning to note is the
work of Dr. Shep Siegel from McMaster University, who led
investigations on classically conditioned drug effects (described on p. 224 of your textbook).
This leads to a final, tragic, real-world example of classical conditioning.
Example 3 – Accidental Death from Drug Overdose
All too frequently, police are confronted with the sudden death of a drug addict from a drug
overdose. Often the addict is an experienced user and the dose taken, although high and
potentially fatal for an inexperienced user, is not larger than the dose they usually take.
Often, the death occurs in an unusual setting. Classical conditioning may play a role in such
deaths. The features of a familiar environment, in which drug taking is habitual, will become
associated with the UCS of the drug, and the UCR of the drug effects on the body.
Accordingly, they will serve as CSs for a CR which, instead of resembling the UCR, might
involve compensatory reactions. For example, heroin slows down a person’s breathing rate,
and the CR in an experienced heroin user involves breathing that is faster than normal, in
order to maintain homeostasis. If the CSs are not present, because the addict is staying in an
unfamiliar place, doesn’t have their usual drug taking paraphernalia or is not with their
usual companions, then the CR does not occur and the usual dose of drug becomes a fatal
overdose. These compensatory CRs also explain, in part, why drug users have to take
increasingly large doses of drug to achieve the same effect – the phenomenon of drug
tolerance. Finally, because it takes a while to extinguish the conditioned response, allowing
recovering addicts to spend time in familiar environments might lead to increased
withdrawal symptoms and craving, much like the pizza example above. The features of the
familiar environment elicit a CR which prepares your body for the drug – which doesn’t
come, like a hunger that is not satisfied. A way to treat this is to, in a controlled fashion,
repeatedly expose recovering addicts to the CSs (their familiar environment, drug-taking
paraphernalia, and companions) but in the absence of drug, in order to promote extinction,
so that the cues are no longer effective at eliciting a CR.
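The compensatory-CR account above amounts to simple arithmetic: the drug's net effect is its raw effect minus whatever compensation the conditioned cues have triggered. The sketch below is illustrative only; the numbers are made up and stand in for physiological magnitudes, not real doses.

```python
# Illustrative sketch (not from the lesson): how a conditioned compensatory
# response (CR) changes a drug's net effect. All numbers are arbitrary
# stand-ins for physiological magnitudes, not real doses.

def net_effect(dose_effect, compensatory_cr, cues_present):
    """Net effect = raw drug effect minus any cue-triggered compensation."""
    return dose_effect - (compensatory_cr if cues_present else 0.0)

usual_dose = 10.0
cr = 4.0  # compensation built up over repeated use in a familiar setting

familiar = net_effect(usual_dose, cr, cues_present=True)     # 6.0
unfamiliar = net_effect(usual_dose, cr, cues_present=False)  # 10.0
# In an unfamiliar setting the cues (CSs) are absent, the compensatory CR
# does not occur, and the same dose hits harder -- potentially fatally.
```

The same arithmetic captures tolerance: as the compensatory CR grows with repeated pairings, a larger raw dose is needed to produce the same net effect in the familiar setting.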
Operant (Instrumental) Conditioning – The Consequences of Behaviour
Pavlov developed a general model of learning involving reflexes. This model became very
popular in European psychology circles. In the United States, at about the same time that
Pavlov was doing his work on conditioning, Edward Thorndike (first introduced on p. 16 of
your text) was putting hungry cats into boxes in order to study a type of learning that was
not related to reflexes.
Thorndike wanted to see how a cat got out of a puzzle box to obtain food (reward). A picture
of one of the puzzle boxes Thorndike used can be seen on p. 206 of your text. A yummy dish
of minced fish was placed in front of the box, in view of the hungry cat. The cat had to
perform a particular action to open the door of the box to get out (e.g., pulling down on a
ring that opened the latch of the door). On the first few trials the cats usually exhibited
mostly random behaviours. They might meow, scratch the door, sniff, and/or move about in
an agitated fashion. Eventually they would pull the ring by chance, and the door would
open. With repeated testing, Thorndike’s cats became more efficient and their time to
escape became shorter. In this case, then, learning is defined operationally as time to escape
the puzzle box. The time to escape became shorter, but the learning curve was gradual and
haphazard (see Fig. 7.10 on p. 206 of your text). Based on these observations, Thorndike
proposed a general law of learning that he termed the Law of Effect.
Law of Effect
Thorndike's Law of Effect starts with the assumption that when an animal encounters a new
environment, it initially produces largely random behaviours (e.g., sniffing, scratching,
wandering about etc.). Eventually the animal (here, a cat) begins to pair (associate) some of
these behaviours with satisfying things (access to food). The Law of Effect states that
behaviours with favourable outcomes become "stamped in" and are more likely to occur
again. Other behaviours that have either no useful consequences or uncomfortable effects
are said to be "stamped out" and are less likely to occur again. Thorndike called this type of
learning instrumental learning. He suggested
that the animal acted in ways that were instrumental to obtain satisfaction.
B.F. Skinner and Operant Conditioning
One of the most prominent theorists who studied this type of learning was the
psychologist Burrhus Frederic (B.F.) Skinner. Early in his career, Skinner proposed that
instrumental learning and classical conditioning were two very different processes.
Whereas classical conditioning involved a biologically significant event (UCS - food)
associated with a neutral stimulus (CS - a buzzer), instrumental learning consisted of a
biologically significant event following a response (not a stimulus). Skinner called this
response an operant, and he renamed instrumental learning operant conditioning. These
days, we use the two terms interchangeably.
How Operant Responses Differ from Reflexes
Unlike a reflex, an operant (response) can be accomplished in several ways. Consider, for
example, the difference between a dog salivating in Pavlov's experiment and a cat pulling
a ring in Thorndike's experiment. A dog can drool in only a very limited number of ways.
In contrast, a cat can pull a ring in an almost unlimited number of ways -- it could pull with
its left or right front paw, or just bat it hard, or pull with its teeth, or climb on it... well, you
get the idea.
According to Skinner, the actual physical response involved in the act of pulling a ring (or
whatever the operant response is) was not the important thing. Rather, it was the
consequences of the response. In the case of Thorndike's cat, the consequence of pulling
the ring was gaining access to food.
Operant Behaviour is Determined by its Consequences
Skinner also decided that Thorndike's law of effect was a "circular" explanation and,
therefore, did not really explain how this learning occurred at all. Skinner's reasoning went
as follows: A cat pulls a ring to get fish. Why? Well, being able to eat fish is satisfying to
the cat. How do we know that fish satisfies the cat? Because the cat pulls the ring to get
fish. Rather than rely on what Skinner saw as an untestable notion of "satisfaction", Skinner
proposed a simpler explanation of this type of learning. He stated that operant behaviour is
determined by its consequences. He replaced the term “satisfaction” with the term
reinforcer—referring to the stimulus change that occurs after a response that increases the
frequency of that response.
Consequences of Behaviour
According to Skinner, there are four possible ways a behaviour can be changed as a result of its
consequences (see Carlson & Heth, p. 208). Behaviour change can result from
reinforcement or punishment. If a behaviour is reinforced, the probability of that
behaviour occurring again is increased, whereas if it is punished, the probability of that
behaviour occurring again is decreased.
Two types of reinforcement:
Positive Reinforcement -- A positive reinforcer is something that, when PRESENTED after
a behaviour, increases the probability of that behaviour (e.g., access to a fish is a positive
reinforcer for the cat’s paddle-pressing behaviour). Note, one does not have to assume that
this leads to satisfaction. One merely has to observe that the behaviour increases in order to
conclude that the organism has received a positive reinforcer. You probably have
encountered many examples of positive reinforcement. If you say to yourself: "if I just read the next chapter of my psych textbook, I will allow myself to watch a TV show", then you are
using positive reinforcement (TV watching) in an attempt to increase your reading
behaviour. The classic example of positive reinforcement comes from Skinner's studies
using a device he called an "operant chamber" (which became known as a "Skinner Box").
Figure 7.11 on p. 207 shows such a device. A hungry rat is placed in the cage that typically
has a lever and a tube through which a food pellet can be delivered. In a positive
reinforcement situation, if the rat presses the lever, a food pellet is given. Increased lever-
pressing behaviour indicates that the food pellet is a positive reinforcer.
Negative Reinforcement -- A negative reinforcer is something that, when REMOVED after
a behaviour, increases the probability of that behaviour. In a Skinner box, you might set
up a situation in which the rat receives a low-level shock until it presses the lever. If lever
pressing increases, then this is an indication that the removal of the shock is acting as a
reinforcer. The idea is that any behaviour that results in the removal of a negative stimulus
will increase in probability (i.e., the removal acts as a reinforcer). Conditioning using
negative reinforcement is sometimes called escape conditioning or avoidance conditioning.
How does the Dairy Queen picture reflect negative reinforcement?
There are also two types of punishment:
Positive Punishment--A positive punisher is something that, when PRESENTED following
a behaviour, decreases the probability of that behaviour. For example, applying a shock
every time a rat presses the lever in a Skinner box leads to a decline in lever-pressing behaviour.
Negative Punishment—A negative punisher is something that, when REMOVED following
a response, decreases the probability of that behaviour. The classic parenting tool, the
‘time out’, is an example of negative punishment. During the time out, the child isn’t allowed
to do the fun things he or she enjoys and would normally be doing.
See figure 7.13 on page 210 of Carlson & Heth for a summary of these four terms.
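The four consequence types reduce to a two-by-two classification: whether a stimulus is presented or removed, crossed with whether the behaviour becomes more or less likely. The sketch below is illustrative only; the function and its argument names are inventions for this example, not standard terminology beyond the four labels themselves.

```python
# Illustrative sketch (not from the lesson): Skinner's four consequence
# types as a two-by-two classification. "Positive"/"negative" track whether
# a stimulus is presented or removed; "reinforcement"/"punishment" track
# whether the behaviour becomes more or less likely.

def classify(stimulus, behaviour):
    """stimulus: 'presented' or 'removed'; behaviour: 'increases' or 'decreases'."""
    kind = "reinforcement" if behaviour == "increases" else "punishment"
    sign = "positive" if stimulus == "presented" else "negative"
    return f"{sign} {kind}"

# Food pellet delivered after a lever press, pressing increases:
classify("presented", "increases")   # "positive reinforcement"
# Shock turned off after a lever press, pressing increases:
classify("removed", "increases")     # "negative reinforcement"
# Shock delivered after a lever press, pressing decreases:
classify("presented", "decreases")   # "positive punishment"
# Time out (fun activities removed), behaviour decreases:
classify("removed", "decreases")     # "negative punishment"
```

Note that "positive" and "negative" here are arithmetic (add or subtract a stimulus), not evaluative, which is the most common point of confusion with these four terms.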
Factors in Operant Conditioning - Shaping
Often, a target operant response is quite elaborate and is not in an animal’s natural
repertoire of behaviours. How can you train an animal to produce such a response? Well,
you use the principle that, if a behaviour is
reinforced, the probability of that behaviour occurring again is increased, whereas if it is
punished, the probability of that behaviour occurring again is decreased. You reinforce
behaviours that are successively closer and closer to the desired operant response. This
type of acquisition is referred to as shaping, described on p. 210-211 of your text. Skinner
used this term because he likened the process to shaping a lump of clay into a piece of art.
For example, when a rat is first put in a Skinner box, it likely will not rush to the lever and
start pressing. An experimenter typically begins training by reinforcing the animal for
approaching the lever, then for contacting the lever, then for pushing on the lever, and so
on. Shaping, then, is the act of reinforcing behaviours that are increasingly similar to the
desired operant response.
Chaining is similar to shaping – it simply refers to the idea that animals can be trained to
produce many different operant responses one after another, like links in a chain. In