Intro to Cognitive Psych – Lecture 1
Scientists have a tendency to begin courses with the history of their discipline. This is understandable-we want to
show you where it all came from. Still, in some ways, history makes more sense when you know something about a
field already, so maybe it should be discussed at the end. Or maybe both. I've decided to go the standard way,
starting with a brief history to set the context for what follows. This decision is based on the old maxim-"what better
place to start than the beginning?" Also, this brief history will provide an introduction to many of the topics we will
be discussing in this course. I promise not to go on at great lengths, despite the many fascinating individuals and
ideas in the history of thinking about the operation of mind.
Representation and Process
Before heading back in time, there are a few terms and ideas we need to consider. It seems to me that there is one
central theme that runs through the history of memory and cognition from earliest philosophy to current psychology.
This concerns what I will call the two critical strands of cognition:
Representation: the knowledge we possess, or the information that is in our memory.
When you ask what someone knows, you are asking about the representational component of cognition.
Questions like this also lead to questions about the structure or architecture of the cognitive system. Over the years,
people have argued for two types of representational structure:
1. static: virtually never changing
2. dynamic: always changing
The earlier view was the static one, which seems quite intuitive. As one illustration, this provided the basis for the
idea of superior hypnotic recall (i.e., recalling information under hypnosis that could not be recalled in one's normal
state), because the original information had to be in memory. After all, you rarely see other structures spontaneously
change (e.g., buildings), so why should the contents of memory do so? Or, better yet, how?
Today, we prefer the dynamic view of cognition - that the system is constantly changing. Although a representation
can be dynamic in its own right, the primary way this dynamic quality is imparted to the system is through process.
Process: an operation on an external stimulus or on an internal representation.
A process can create a new memory representation or make use of an existing one. Of course, a process can also
update existing representations or reinterpret them, leading to the ever changing quality of cognition. Otherwise,
how would you learn your new phone number, or forget your old one?
We must be able to reorganize information and ideas, based on the different tasks we are faced with. This active
view of cognition is the very heart of today's cognitive psychology. Of course, the constant interaction of
representation and process means that it is virtually impossible to untangle these two threads of cognition, yet it may
help us to try to think of them as separate from time to time. Indeed, Paul Kolers argued that what is kept in memory
following an experience is not a record of the details of that experience but rather a record of the processes used
during that experience - in his view, the processes are the structure! I'll have more to say about that later in the course.
Definition of Cognition
Well, perhaps it is time to try to define cognition. Webster's says it is derived from the Latin gnosco, meaning "to
know"-the "co" part comes from "con", meaning "with"-and defines it as "knowledge from personal view or
experience; perception; a thing known." Clearly, this emphasizes representation. However, the word may also derive
from the Latin cogito, meaning "I think", which adds process to representation.
Let us try a more psychological definition. William James (1890) defined psychology as "the science of mental life,
both of its phenomena and their conditions." James' definition turned out to be far too narrow a definition of
psychology, but it does describe cognitive psychology. Ulric Neisser (1967; Cognitive Psychology) wrote in the first
text devoted to the study of cognitive psychology: "Cognition refers to all processes by which the sensory input is
transformed, reduced, elaborated, stored, recovered, and used". Note that Neisser's definition very much reflects the
process view of cognition. We can adopt a definition of cognitive psychology that tries to capture the ideas of James
and Neisser and current researchers in the field.
Cognitive psychology is the study of skills and knowledge-how they are acquired, stored, transformed, and used.
The term "cognition" emphasizes the symbolic, mental, and inferred processes of mind. Cognition refers to the set of
processes and representations involved in such activities as those listed on your syllabus-learning, remembering,
thinking, reasoning, communicating, deciding, and so on.
One thing that you should be aware of right away is that this is a very difficult field because the very thing we are
trying to study is the one thing we cannot observe directly. You can't see these activities; you can only see their
consequences. I can't see you remembering, for example, I can only tell from your behaviour (sometimes!) whether
you remember or not. This is the classic learning/performance distinction, perhaps better called the
cognition/performance distinction. In studying cognition, you can only study it indirectly, and then base your
inferences on those indirect data.
Consider the following two examples:
1. After a lovely romantic dinner, your partner asks you if you remember your first kiss (with him or her!). If you
do, you're safe and you say YES and can even describe it. If not, what do you say? Of course, you say, with a
dreamy look on your face, YES, reach for his or her hand, and then try to shift the subject. Performance suggests the
presence of knowledge, but the knowledge isn't there.
2. On the other side of the coin, your partner asks you if you remember agreeing to make dinner tonight. If you do,
and still feel like it, you can say YES. If you do, but no longer feel like doing it, you may try to get out of it by
saying NO. [I recommend suggesting dinner out.] Here, performance suggests the absence of knowledge, but the
knowledge really is there.
The point of these two examples is to demonstrate that performance and cognition do not always match, and this is
true not just in such contrived examples. For example, anyone who plays a sport (e.g., tennis or squash) can recall
excruciating instances where a shot they know how to make simply doesn't work out. The knowledge is there but the performance fails.
This is a problem we will be dealing with throughout the course. In a very real sense, more than we care to accept,
we do not know what is in our own minds, only what we think is there. But German philosopher Immanuel Kant in
the 1700s had an idea that is very relevant to all science, and certainly to cognitive psychology. He called it the
"transcendental method" and essentially it involved working backward from observable effects to infer their most likely underlying causes.
Now let's get back to the problem at hand, and briefly consider the history of the study of cognitive psychology.
The study of cognition begins with the Greek philosophers. Prior to Socrates, the emphasis was on perception and
perceptual processes, and memory and cognition were seen as outgrowths of perception. Memories were simply
stored literal perceptions or traces, and often there was a kind of primitive physiological basis in trying to locate
these traces. To understand mind, it was thought that physics was the critical science because the mind was just a copy of the physical world.
The first to actually deal with such problems as how information from the different senses was integrated was
Diogenes of Apollonia who conceived of the cognitive system as the integrator, and gave us the original meaning of
the term "common sense." However, he was still a bit off track. Of the primitive elements the Greeks recognized, he
chose air as the most likely candidate for the basic element of cognition based on his observation that it was the only
substance that regularly went in and out of the body.
But then along came Plato who sharply upgraded the thought on cognition. While still emphasizing the individual
senses, his avowed goal was to discover the object of mind-in our terms, the representational structure of cognition.
His key idea was that of universals underlying perception of particulars (e.g., "dog" for "Lassie"). This is still a
critical idea in modern cognitive theory, as we will see near the end of the course when we discuss concepts and categories.
However, Plato gave us a static view of cognition in his "wax tablet" analogy of memory. It is a mark of his impact
how persistent this view has been. Consider the following quotes:
"Imagine, then, for the sake of argument, that our minds contain a block of wax, which in this or that individual may
be larger or smaller, and composed of wax that is comparatively pure or muddy, and harder in some, softer in others,
and sometimes of just the right consistency." (Plato, approx 380 BC)
"Some minds are like wax under a seal-no impression, however disconnected with others, is wiped out. Others, like
a jelly, vibrate to every touch but under usual conditions retain no permanent mark." (James, 1890, p. 293)
Although Plato soon backed off from this view, because it created problems for his developing theory of universals,
it is still with us today in popular ideas.
But it was Aristotle (384-322 B.C.) who had the major influence and whose ideas are most relevant today. First,
he did not see universals as separate from particulars the way Plato did; to Aristotle, they formed part of the particulars themselves:
Plato: Dogness; Lassie, Fido, Benji
Aristotle: Lassie dogness; Fido dogness; Benji dogness
This suggested a certain kind of representation, and the only way to know a universal would be via active
processing, because they were not directly available to the senses. He didn't suggest how this might be done, but
then again we still have not figured it out!
Aristotle's principal contribution was his doctrine of association, which held that mental life could be explained in
terms of two basic components: ideas (the elements) and associations (the links) between them. He posited three
laws of association:
1. contiguity: same time or space
2. similarity: alike conceptually
3. contrast: opposites
These bases for connection still appear in theories today (cf. the importance of contiguity in classical conditioning;
or Murdock's (1972) distinction between item and associative information in memory).
We have to jump through the dark ages all the way to the 1700's, when the British Associationists, led by Hobbes
and Locke, reformulated the concept(s) of association. Although others had promoted the study of mental life in the
interim, it was only then that things cognitive began to get moving again. This is interesting because so much of the
resulting British Empiricism is just a complete rediscovery of Aristotle's ideas.
Through the late 1700's and early 1800's, all sciences blossomed, and in the early 1800's research on the physics of
sensory systems became particularly focal. A history of psychology could detail the development of psychophysics
under such giants as Helmholtz, Fechner, Weber, and others. These men were building a way to study the
unobservable world of the mind, a necessary precursor to the emergence of cognitive psychology.
Psychophysics: the systematic study of the relation between the physical characteristics of stimuli and the
sensations that they produce.
Franciscus Cornelis Donders (1868) was a Dutch physiologist who was the first to find a way to measure "thinking
time". He measured the time between a stimulus and different types of responses. The reaction or response time was
used to infer the duration of mental processing. For example, subjects were asked to press a key as quickly as they
could when they felt a touch on either foot (simple reaction time), or to press the key when they felt a touch only on
their right foot (choice reaction time). Donders compared simple reaction time with choice reaction time and the
difference was a measure of the decision time for the choice. We will see that measuring response time is still an
important tool in the study of cognitive psychology.
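Donders' subtraction logic can be sketched in a few lines of Python. The reaction times below are illustrative, not his actual data; the point is only that the decision stage's duration is inferred as a difference between two measured times.

```python
# Donders' subtraction method: the extra time on choice trials,
# relative to simple trials, is attributed to the decision stage.
simple_rt_ms = 200.0   # press on any touch (no decision needed)
choice_rt_ms = 285.0   # press only for a right-foot touch

decision_time_ms = choice_rt_ms - simple_rt_ms
print(decision_time_ms)  # 85.0
```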
Wundt (1879) usually receives credit for the creation of psychology, but the idea was in the air among the scientists
of the day. Interestingly, Wundt was initially anti-cognition. He divided the subject matter of psychology into the simple
psychical processes (reflex, sensation), which could be studied scientifically, and higher psychical processes "about
which nothing can be discovered in such experiments."
Yet Wundt, and later his students, led by Edward Titchener in the US, emphasized a method of study of psychical
processes that was, in its way, quite cognitive. He wanted to study conscious mental processes, and since only the
individual could know his or her own thoughts, the best way to study them was to look in on them. This technique
came to be called introspection, and people were trained to make reports on their own minds. This way of doing
research came to be called Structuralism, because the goal was to find the structural elements of mind. But it soon
had apparent disadvantages: you could not study unconscious events, and there is no way to avoid the "filter" of
an individual's experience, beliefs, biases, etc. Introspection was subjective, not objective, and we demand that
science be objective.
Wundt's impact was great enough that little empirical work on "higher" cognition was carried out in the 1800's with
the notable exception of Ebbinghaus's (1885) treatise called Memory, which really revolutionized experimental
psychology. Ebbinghaus estimated the forgetting curve, and developed an ingenious procedure to measure memory,
the method of savings. Ebbinghaus memorized lists of nonsense syllables and tested his own memory for these items
at different intervals. Even though he might not be able to recall any of the nonsense syllables after a time, he found
that it took less time (fewer trials) to relearn a previous list than it took when he originally learned the list. The
difference or advantage between relearning and original learning is the savings, and provides a sensitive measure of
memory. We'll talk about his work more during the course.
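The savings calculation itself is simple; the trial counts below are made up for illustration. Even when nothing can be recalled outright, fewer trials needed to relearn the list reveal a residual memory trace.

```python
# Ebbinghaus's method of savings, with illustrative trial counts.
original_trials = 20    # trials to learn the list the first time
relearning_trials = 12  # trials to relearn it after a delay

savings = (original_trials - relearning_trials) / original_trials
print(f"{savings:.0%} savings")  # 40% savings
```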
In the late 19th century, Functionalism was very cognitive, as set out by William James at Harvard. The goal here,
influenced by Ebbinghaus's success in experimenting on memory, was to develop experiments to test theoretical
ideas about how cognition worked. James also made the distinction between primary and secondary memory - we
know this today as the difference between short-term and long-term memory.
And then came the major movement of Psychology's history. In the 20th century, Behaviourism was incredibly
anti-cognitive, going so far as to suggest there was nothing at all in the black box of mind (cf. behaviourism under
Watson, and radical behaviourism under Skinner). Although many good empirical methods were developed in the
first half of the century, the climate was totally wrong for a science of thought. Only the Gestaltists in Germany
fought this tide, but their impact at the time was minimal, and largely constrained to the study of sensation and
perception. (In contrast to Structuralism, Gestaltists believed that sensations and perceptions could not be reduced to
their basic elements. Rather, they believed that the "whole" was greater than the "sum of its parts" and had to be
studied as such.) How did cognitive psychology finally emerge from the shadow of behaviourism?
Emergence of Cognitive Psychology
It is hard to imagine today believing, as the behaviourists did, that thought does not exist and that all psychology
could be explained in terms of stimulus-response links. Still, their view had the advantage of dealing entirely in
observables, and it was well defined. It was also a simple theory, worth a try to see how far they could run with it.
Yet they ignored so much of what went on before and during their reign. For instance, in his classic book Principles
of Psychology, William James (1890) discussed in quite modern terms attention, memory, imagery, and reasoning.
Kohler (1913) carried out studies of problem solving by apes on the Isle of Tenerife by observing how apes would
use sticks and boxes to obtain bananas that were hung out of their reach. (As an historical aside, it has been
suggested that during the course of his research, Kohler also served as a spy for his native Germany during World
War I.) It was Kohler who introduced the concept of insight. Sir Frederick Bartlett developed a very cognitive view
of memory in Remembering (1932), where he described memory as a reconstructive process. His book was ignored
at the height of behaviourism. Similarly, the importance of Duncker's (1945) work on thinking (in particular
functional fixedness - the limitation in our ability to see a novel use for an item that already has a purpose, which
interferes with creativity) was not fully appreciated until the fall of behaviourism.
What finally brought about the change? It was a confluence of factors in the mid 1950's and on. First and most
obvious, Behaviourism was failing. Some began to realize that mediation (i.e., internal thought) was a necessary
construct, totally at odds with behaviourism. Piaget's (1954) developmental studies were crucial here. Second,
information theory and communication theory grew up in the late 40's (cf. Shannon & Weaver, 1948), leading to a
new perspective on information and how it is processed and how it could be measured. Third, modern linguistic
theory developed new ways of looking at language and in particular grammatical structure (cf. Chomsky, 1957),
often at odds with behaviourism. [Indeed, Chomsky wrote a famous treatise that savaged Skinner's huge work on
language called "Verbal Behavior," and is often considered to be the death knell for behaviourism.] Fourth,
computers became quite available and with them a new perspective on processing and storage of information. All of
these led to a new desire to understand the more complex mental processes, and the birth of contemporary cognitive psychology.
The critical work of the period includes several papers and books that we will be dealing with in weeks to come. But
two rather general books on cognition stand out-Miller, Galanter, and Pribram's (1960) Plans and the Structure of
Behavior and, the first totally cognitive book, Neisser's (1967) Cognitive Psychology. By the mid 1960's, cognitive
psychology was clearly establishing itself as the dominant approach in the field of psychology, important enough, as Reisberg
notes, to be described as the cognitive revolution.
Lecture 2: Sensory Memory
Associated with each of our sensory systems is a memory that briefly holds the incoming information. Neisser
(1967) named the sensory memory associated with the visual system iconic memory ("icon" or visual image), and
the sensory memory associated with the auditory system echoic memory (the root word being "echo"). Today we are
going to focus on iconic memory, in part, because more research has been done on this sensory memory system.
As an historical footnote, iconic memory was first documented by Segner (1740), a German scientist. He attached a
glowing coal to the freely spinning wheel of a cart and gradually increased the rate of revolution until people
reported seeing a continuous circle. After calculating the time needed for a single revolution at this speed, he
determined that the duration of iconic memory must be 100 msec. We will see that this turned out to be a fairly accurate estimate.
Let us begin our examination of iconic memory with how we process a visual scene. We do not take in all of the
information available to us at once. Rather, we create an interpretation of the visual scene through a series of
fixations. Each fixation lasts approximately 200 milliseconds (msec) or 1/5 of a second. The movements of our eyes
from one fixation to another are called saccades. Saccadic (voluntary) movements take about 50 - 100 msec. Thus,
we have 3-4 fixations per second. The lecture slides show a typical scene, and a series of fixations numbered in
order from 1 to 30. Note how the fixations center on the information rich portions of the scene as they trace out the
parts of the picture. Part of the information in the picture is taken in during each fixation and these different parts are
assembled to form an interpretation of the entire picture. During saccades, little or no information is processed, as
the scene is a blur during these rapid movements. (You can determine for yourself that little information is taken in
during these movements. Try to see your own eyes move in a mirror. You can't!)
Baxt (1895), who had an interest in the process of reading, wanted to know how much information we can glean
from a single glance, or fixation. To answer this question he asked subjects to read a set of random letters. The
letters were covered with a solid wheel that had a segment cut out of it. When Baxt spun the wheel, the letters could
be briefly seen through the empty segment. Baxt found that subjects could report 4-5 letters, on average. This limit
became known as the perceptual span, or span of apprehension.
For many years, the perceptual span was interpreted as the limit on stimuli available for further processing. Then, in
1960, George Sperling did his PhD thesis on the subject and changed our ideas. In his first experiment, he redid
what Baxt had done. Sperling, however, had equipment (a tachistoscope) that allowed him to very precisely display
visual images. When Sperling displayed a random set of letters for exactly 50 msec (less than the duration of one
fixation), he observed, as Baxt had, that subjects could correctly report about 4 or 5 items. Was this the limit on the
amount perceived in a single fixation?
Subjects taking part in Sperling's experiment noted two things:
1. They claimed that they had actually seen the whole array, but that they "forgot" it while reporting. Was this a
perceptual problem or a memory problem?
2. They claimed that the array seemed to fade before their mind's eye, but that it was definitely available to examine
even after the display went off the screen. Were they right, and if so what did this mean?
Sperling proposed a model of visual processing that included a visual sensory store that briefly held information. The
information in this store is analyzed ("read") by a pattern recognition process that identifies (or "names") the stimuli,
and the analyzed information is then held in immediate or short-term memory. Sperling set out to answer the
question of what limits whole report to 4-5 items. Is the pattern recognition process too slow? Or, is it due to
limitations of short-term memory?
Sperling presented the letter display to subjects for 500 msec instead of 50 msec. If the pattern recognition process is
slow, then giving this process more time should increase the number of letters subjects can read. If it is a limitation
of short-term memory, then the extra time will have no effect. Sperling found that even when presenting the display
for 500 msec subjects still could only report 4-5 letters. He concluded that this limit was not because the pattern
recognition process was slow. Rather, the limitation had to be due to short-term memory.
To overcome the limitation of short-term memory, Sperling came up with the partial report procedure. For some
displays, he would ask subjects to report the whole display as had been done before (the whole report procedure).
However, on some other displays, he would signal subjects to report only one of the rows in the display. For
instance, he signaled to recall the top row with a high tone, the middle row with a medium tone, or the bottom row
with a low tone. Subjects would not know until the instant the display went off the screen whether to report all or
part, and if it was part, they would not know which part until the partial report cue was presented.
In this task, the letter array was on the screen for 50 msec, and the tone occurred just after the array disappeared.
Sperling could estimate how many letters were available or held in iconic memory by multiplying the average
number correct on the partial report trials by the number of rows. (This estimate is based on the logic that if subjects
could correctly recall, say 3 of 4, letters from one of three rows cued at random, then subjects must have 3 of 4
letters available from every row. So, 3 letters recalled times 3 rows would equal 9 letters available in iconic memory.)
Sperling found that the estimate of the number of letters available in iconic memory was greater than 4-5, the limit
of whole report. So, iconic memory can hold more information than can be held in immediate memory. Therefore,
perceptual span or the "span of apprehension" is really a limitation of memory and not of perception.
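The partial-report estimate above can be written out as a short calculation. The numbers are illustrative (a 3 x 4 display, about 3 correct per cued row), standing in for Sperling's actual data.

```python
# Sperling's partial-report logic: since the cued row is chosen at
# random, whatever fraction subjects report from it must have been
# available in every row at the moment of the cue.
rows = 3
mean_correct_per_cued_row = 3.0
whole_report_limit = 4.5  # the typical 4-5 letter limit

letters_available = mean_correct_per_cued_row * rows
print(letters_available)  # 9.0, well above the whole-report limit
```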
Sperling went on to answer three questions about the characteristics of iconic memory. We will consider each of
these questions in turn.
1. What information is represented in iconic memory?
To answer this question, Sperling varied the type of cue that signaled which part of the stimulus display to report in
the partial report procedure. Remember that Sperling first used an auditory tone to cue subjects as to which row to
report. The fact that performance (the estimate of the number of letters available) was better in the partial report
procedure than in the whole report procedure means that the partial report cue was effective. So we already know
that spatial information, or location, is preserved in iconic memory because subjects could report by row.
Sperling then presented letters of different sizes and cued subjects to report only the large, or only the small letters.
This was also an effective partial report cue, so information about size is also preserved in iconic memory. Similarly,
Sperling presented letters in different colours and cued subjects to report only the red or only the green letters.
Again, this was an effective partial report cue, so colour is also represented in iconic memory.
Finally, Sperling presented letters and digits, and cued subjects to only report letters or only report digits. This cue
was not an effective partial report cue. Why wasn't it? Note that all of the effective partial report cues are based on
physical properties (location, size, colour). The difference between letters and digits is not a physical difference, it is
a semantic difference. In other words, to know whether a symbol is a letter or a digit, one must interpret (or "read")
the symbol. So, because letters versus digits was not an effective partial report cue, Sperling knew that the
information held in iconic memory is precategorical. That is, the information has not yet been processed for
meaning. Thus, iconic memory preserves only the physical features of the stimuli.
2. What is the duration of iconic memory?
How long are the physical features of the stimuli held in iconic memory? To answer this question, Sperling varied
the delay between the termination of the letter display and the partial report cue. He predicted that at very long
delays after the array, the tone signal should not help because the iconic image will have decayed away. Because
subjects will not know which row to report until it is too late, they should get about 4 to 5 right, the perceptual span
(or whole report limit). With no delay, subjects should do very well. The key question is what would happen with intermediate delays.
Two outcomes seemed plausible. If information is only available while the display is actually visible-if there is no
iconic image-then any delay should cause performance to drop to the span value. If there is an icon, then fading
should result in a gradual decline in performance with time as the image fades. As you may have guessed, the fading
view was supported, and Sperling showed that the second introspection of subjects was correct, too. This image
lasted somewhere in the vicinity of 250 msec to a second, depending on perceptual features such as brightness, etc.
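The two candidate predictions can be sketched as functions of cue delay. All numbers here are assumptions for illustration (a 4.5-letter span, 9 letters at zero delay, and an exponential decay constant of 300 msec), not Sperling's fitted values.

```python
import math

span = 4.5      # whole-report limit (letters)
initial = 9.0   # letters available at zero delay
tau_ms = 300.0  # assumed decay constant of the fading icon

def no_icon(delay_ms):
    # No icon: any delay at all drops performance to the span.
    return initial if delay_ms == 0 else span

def fading_icon(delay_ms):
    # Fading icon: performance declines gradually toward the span.
    return span + (initial - span) * math.exp(-delay_ms / tau_ms)

# Only the fading view predicts intermediate performance at
# intermediate delays, which is the pattern Sperling observed.
assert span < fading_icon(300) < initial
assert no_icon(300) == span
```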
Eriksen and Collins (1967) estimated the duration of iconic memory using a different type of procedure. They
constructed pairs of dot patterns such that each one of the pairs appeared as a random set of dots. But when the pair
of dot patterns was presented together, letters could be seen. Eriksen and Collins presented the two patterns either
simultaneously, or with a variable delay between the two. When the patterns were presented together, or when the
delay between the presentation of the first and second patterns was less than a second, subjects could report the
letters. This shows that the first dot pattern was still in iconic memory when the second pattern was presented, as
subjects could see the letters. But when the delay between the first and second patterns was too long (greater than a
second), the first dot pattern had faded by the time the second pattern was presented, and subjects could not see the letters.
Eriksen and Collins' experiment provides converging evidence that supports Sperling's estimate of the duration of
iconic memory being less than a second. (In cognitive psychology, as in other sciences, the more different ways we
can demonstrate a finding, the more confidence we can have in that finding. This is the logic of converging
operations.) Note as well that Eriksen and Collins' experiment also demonstrates that the capacity of iconic memory
must be quite large, as iconic memory must be able to hold all of the dots from both of the patterns that they showed.
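A toy version of such stimuli can be built in a few lines: split a letter's dots at random between two patterns, so that neither half is very informative alone, yet superimposing them (as iconic memory does at short delays) restores the whole letter. The 5x5 grid and the letter shape are invented for illustration, not taken from Eriksen and Collins' materials.

```python
import random

# A crude 5x5 dot pattern for the letter "T" (1 = dot present).
letter_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Assign each dot of the letter to one of two half-patterns at random.
random.seed(0)
half_a = [[0] * 5 for _ in range(5)]
half_b = [[0] * 5 for _ in range(5)]
for r in range(5):
    for c in range(5):
        if letter_T[r][c]:
            (half_a if random.random() < 0.5 else half_b)[r][c] = 1

# Superimposing the two halves recovers the original pattern exactly.
merged = [[half_a[r][c] | half_b[r][c] for c in range(5)] for r in range(5)]
assert merged == letter_T
```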
3. How is information lost from iconic memory?
We have already seen that information is lost from iconic memory through a decay process that occurs over time.
Later, researchers demonstrated one other way that information could be lost from iconic memory - pattern masking.
When researchers presented a set of letters for subjects to report, and then a pattern mask (visual noise or random
patterns), the pattern mask interfered with subjects' ability to report the letters. Just as the dot patterns used by
Eriksen and Collins merged together in iconic memory when the delay between the two patterns was not too long,
the letters and the pattern mask also merge together at short delays. Because the pattern mask is noise, it makes the
letters difficult or impossible to read when they are combined. When the pattern mask is presented just before the
letters, it is called forward masking (because it interferes with a display that is forward in time), and when the
pattern mask is presented just after the letters, it is called backward masking (because it interferes with a display that
is backward in time). The greater the delay between the pattern mask and the letters, the less effect the mask will have.
So, information in iconic memory is lost due to a rapid decay process. And information in iconic memory can be lost
due to the interference of other stimuli (masking).
Sperling's research has shown (1) that a lot is perceived in a single fixation, (2) that an iconic image persists after the
display disappears, and (3) that this image decays and is lost very rapidly. Of course, he also showed that whole
report or perceptual span is not a good measure of what is perceived.
Further research has shown that there are sensory memories associated with our other sensory systems as well.
QUESTION: How might you do an auditory experiment analogous to Sperling's experiments on iconic memory?
ANSWER: Use stereo presentation with two to four apparent directional sources (e.g., left, centre, right), and cue subjects to report from only one source.
The major difference between the sensory stores for visual and auditory information is that echoic memory lasts
longer-on the order of 1 to 4 seconds. This has survival value-vision is simultaneous, with a 250 msec "window".
Iconic memory helps us maintain a stable interpretation of the visual world (remember that we are essentially blind
during the saccadic movements between fixations that last about 50-100 msec). Auditory information such as
speech, on the other hand, is sequential and spread out over time. Thus a longer-lasting echoic memory enables us to
process signals such as speech over time.
So, we know that there are sensory stores, that their capacity is quite large, and that forgetting occurs rapidly via
decay and due to masking. We also know that the information held in sensory memories is precategorical; that it
represents the physical features of the stimulus and has not yet been analyzed for meaning. In the next two lectures
we will examine how this information is processed for meaning. First we will consider pattern recognition in vision,
and then we will examine speech recognition in audition.
Lecture 3 – Pattern Recognition
To begin, we can define pattern recognition as how people identify objects in their environment. It is a general set of
processes whereby the continuous stream of stimulation around you is segmented into discrete, labelable units based
on experience. This occurs at many different levels, from recognizing that a light is on to being able to play successful
chess and recognizing vast numbers of moves. Although the process is very general and applies all over the
information processing system, we will focus on it where it first begins to matter-in transferring information from
the sensory store to short term memory.
Essentially, pattern recognition takes perceptual elements and transforms them into symbols and then into concepts
that have meaning. This is done by relating the information in sensory store to information already known, and stored in long term memory. As is so often the case in studying cognition, pattern recognition is a very fluent and
seemingly effortless skill, which makes it hard to study. Yet there are ways, and we will consider some of these as
we try to understand how people recognize patterns.
Template theories maintain that patterns are treated as unanalyzed wholes, and that comparison of what is perceived
to what is already known is accomplished by some kind of measure of overlap or similarity. That is how the number
system at the bottom of bank cheques works-each of the digits always looks exactly the same and there are only 10
of them, so computer matching is a very straightforward process. The striped Universal Product Codes (barcodes) on products in stores
that are scanned work the same way. But what about more complex situations where pattern recognition is required?
Imagine how hard it is to recognize handwriting using such a system. Everyone writes differently, using different
angles, different sizes, and even different shapes. Thus, we would need an almost infinite number of templates to be
able to recognize all the variations of a letter. Or, we would need a way to adjust the pattern to-be-recognized (in
terms of orientation, size, etc.) before comparing the new pattern to our templates in memory. But how would the
pattern recognition system know how to adjust the new pattern before it has been recognized? And what is the
template for a potato, which we can easily recognize, but each potato is not identical? Also, a template model would
not describe how two patterns differ, only that they do differ. Nor would it explain how the same pattern can have
two different interpretations, as the symbol "O" (a zero and a letter) does. Thus, template theories seem unlikely to
handle the complex problem of human pattern recognition at a general level, although they may actually be used in
some circumstances. Such an instance might be in the early stages of learning a new alphabet or set of symbols.
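The core of a template theory - comparison by overall similarity - can be sketched in a few lines of Python. This is a minimal illustration, not a real recognition system: the patterns are tiny binary grids invented for the example, and similarity is just the fraction of matching cells.

```python
# Template matching sketch: patterns and templates are 3x3 binary grids
# (1 = ink, 0 = blank). Similarity is the proportion of matching cells,
# and the best-matching template wins. Grids are invented for illustration.

def overlap(pattern, template):
    cells = [p == t for row_p, row_t in zip(pattern, template)
                    for p, t in zip(row_p, row_t)]
    return sum(cells) / len(cells)

TEMPLATES = {
    "I": [(0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)],
    "L": [(1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)],
}

def recognize(pattern):
    return max(TEMPLATES, key=lambda name: overlap(pattern, TEMPLATES[name]))

# A slightly degraded "L" still matches the L template (8/9 cells)
# better than the I template (4/9 cells).
noisy_L = [(1, 0, 0),
           (1, 0, 0),
           (1, 1, 0)]
print(recognize(noisy_L))  # -> L
```

Note how quickly the scheme breaks down: any change in size, orientation, or style of the input requires a whole new template, which is exactly the objection raised above.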
Another way to explain pattern recognition is via a feature theory. A feature theory can be defined as a system that
allows us to describe a pattern by listing the elements of that pattern. Patterns consist of elementary attributes which,
when put together and interpreted, can be seen as a meaningful concept. One of the advantages of a feature theory is
that it ties in well with what we know about how people identify concepts (such as "furniture"), a problem we will
be considering later in the course.
There are three lines of evidence that provide compelling support for the view that pattern recognition is based on
the analysis of the component features. Let's consider each type of evidence in turn.
1. Visual Confusions
The more similar two items are in terms of their features, the greater the chance they will be confused. Let's consider
the letters of the English language. The different features of each letter can be distinguished, as in the table in the
lectures. When these letters are presented rapidly (i.e., under conditions that make it difficult to always correctly
identify the letters), confusions among similar letters increase. The more features two letters have in common, the
more likely they are to be confused. Visual confusions based on the number of shared components between items are
one line of evidence indicating the importance of feature analysis in pattern recognition.
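The prediction here is simple enough to state in code. The feature sets below are illustrative stand-ins for the feature table shown in lecture, not the actual feature analysis:

```python
# Feature-theory sketch: each letter is a set of elementary features,
# and confusability is predicted by the number of shared features.
# These feature sets are invented placeholders for the lecture's table.

FEATURES = {
    "E": {"horizontal", "vertical"},
    "F": {"horizontal", "vertical"},
    "O": {"closed-curve"},
    "Q": {"closed-curve", "diagonal"},
}

def shared_features(a, b):
    return len(FEATURES[a] & FEATURES[b])

# E and F share more features than E and O, so a feature theory predicts
# E/F confusions should be more common than E/O confusions under rapid
# presentation - which is what the visual confusion data show.
assert shared_features("E", "F") > shared_features("E", "O")
```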
2. Visual Search Studies
In visual search studies, subjects are asked to search a display for a particular target. Neisser (1967) used this task to
demonstrate the importance of features. He found that subjects are faster to find a letter like "Z" when it was in a list
of letters with rounded features (e.g., O, R, B, etc.) than when it was in a list of letters composed mostly
of horizontal and vertical lines (e.g., T, M, V, X). In other words, searching for a target is easier when the non-target
letters have dissimilar features than when they share many of the same features as the target.
Neisser also asked subjects to find target words in lists of different words (e.g., is the word "sand" in the list?), or to
search for words with a particular meaning (e.g., is there an animal in the list?). He found that subjects were much faster in finding specific words (like "sand") than searching for words based on meaning. This is because it is easier
to base one's search on physical features rather than having to read each word, interpret its meaning, and base the
search decision on the meaning of the word.
Anne Treisman carried out a series of visual search studies that also demonstrated the important role of features in
pattern recognition. In one study, she found that subjects were faster to search for a target that was an incomplete
circle amidst non-targets that were complete circles, compared to searching for a complete circle among incomplete
circles. This result indicates that the visual system treats "gap" as a feature, but does not treat "no gap" as a feature.
Treisman also showed that subjects were very fast at searching for target letters such as a "green T" when none of the
non-target letters were green. In contrast, when subjects looked for a "white T" amidst white and black letters, they
were much slower to find the target. Searches that can be based on a single feature (such as "green") are much faster
than searches based on a combination or conjunction of features (such as "white" and "T"). Indeed, in a single
feature search the target seems to "pop out" at you, but there is no pop out for conjunction feature searches.
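The characteristic reaction-time pattern from these studies can be caricatured in a few lines. This is only an illustrative model of the result, with invented timing constants, not Treisman's data:

```python
# Illustrative reaction-time pattern for visual search (constants invented):
# a single-feature target "pops out" in roughly constant time regardless of
# display size, whereas a conjunction target requires inspecting items one
# by one, so search time grows with the number of items.

POP_OUT_MS = 50           # flat cost of a parallel, single-feature check
SERIAL_MS_PER_ITEM = 40   # per-item cost when two features must be combined

def search_time_ms(display_size, conjunction):
    if not conjunction:
        return POP_OUT_MS  # e.g., find the one green letter
    return POP_OUT_MS + SERIAL_MS_PER_ITEM * display_size  # e.g., find the white T

# Display size does not matter for pop-out, but it does for conjunctions.
assert search_time_ms(4, conjunction=False) == search_time_ms(32, conjunction=False)
assert search_time_ms(32, conjunction=True) > search_time_ms(4, conjunction=True)
```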
3. Physiological Evidence of Feature Detectors
Visual confusions and visual search experiments provide behavioural evidence for feature detection in pattern
recognition. There is also physiological evidence for feature detectors. Information collected from the
photoreceptors in the retina is sent via the optic nerve to the lateral geniculate nucleus or LGN, a part of the
thalamus. (It is interesting to note that information from all of our senses except one is relayed through the
thalamus. The exception is our sense of smell, which is one of our most primitive senses. Information about odour
goes from the receptors in the nasal cavity directly to the part of the brain called the olfactory bulb.) From the LGN,
the visual signal is projected to the visual cortex. There
are different types of cells, arranged in columns, in the visual cortex. Studies that have measured the responses of
these cells to different types of visual stimuli have shown that the different types of cells are tuned to different types
of features. For example, simple cortical cells are most sensitive to lines of a particular orientation, whereas
hypercomplex cells are most responsive to specific combinations of features such as corners or angles. You needn't
worry about which cells do what. The important point to note is that different cells in the visual cortex are specialized
to different visual features. These cells are the feature detectors that provide the basis for pattern recognition.
Structural Theories of Pattern Recognition
Structural theories take feature theories as a starting point, then try to define the relations among features once the
set of features is specified. Thus, structural theories extend theories of feature analysis, and emphasize the relations
among features, or how features fit together in pattern recognition. This additional complexity is necessary for a
successful pattern recognition theory because specifying the relations among features provides greater precision.
Unfortunately, these theories have
not been taken very far as yet.
For our present purposes, I just want to provide one experimental example of the importance of structural theories of
pattern recognition. Biederman (1985) took line drawings of common objects and removed 65% of the lines. He
then asked subjects to try to identify the objects. The slides in the lectures provide examples of these incomplete
objects - is it easier to identify the pictures on the left or the right? Subjects were far better at identifying the objects
when the missing lines were at midsegments rather than at vertices. In other words, when corners or angles were
preserved it was easier to identify the pictures, because more information about how the features fit together was
still available. So, both features (feature theories) and how features go together (structural theories) are important in
pattern recognition.
Bottom-Up Processing
So far we have been talking about a model of pattern recognition that begins with the sensory input and ends with an
abstract, meaningful interpretation of the input. This is a "bottom-up" view of the pattern recognition process. In
information processing terms, the sensory information is the bottom and the representation is the top. Consider that
part of the system that is trying to recognize the letters in the word "birthday". A bottom up process to accomplish
this would go through a series of successive steps, with the output of each step serving as the input to the next step. Gradually, we would move from light vs. dark splotches on a page to the separation of figure and ground, to the
identification of the features, identification of the letters, and finally to the recognition of the word.
A bottom up process begins with the sensory input and ends with its representation, with a series of orderly steps
from bottom to top in between. The defining property of a strictly bottom up process is that the outcome of a lower
step is never affected by a higher step in the process.
Read the following sentence out loud as quickly as you can:
Paris in the
the spring
Did you notice there are two "the"s in the sentence? If reading (and all pattern recognition, for that matter) were
strictly a bottom-up process, you would have seen the repetition. You would also be reading much more slowly than
you do. Bottom-up processes are complemented by our knowledge and past experience. You read the above
sentence very quickly, in part, because it is familiar to you, and you can anticipate the end of the sentence from the
beginning. This is the contribution of top-down processes to pattern recognition.
We use knowledge based on past experience, and the context, to guide pattern recognition. This is top-down
processing. Both bottom-up and top-down processing operate together in pattern recognition. This is called the
interactive model of processing.
The lecture slides provide some examples of how context influences pattern recognition. For example in the two
handwritten sentences, the identical pattern is recognized as "went" in one sentence and "event" in the other
sentence. The use of context to make the most appropriate interpretation of an ambiguous pattern demonstrates the
role of top-down processing.
Look at the slide in the lectures that shows black shapes on a white background. If you have never seen this picture
before, you might have trouble seeing what it is. It is, initially, difficult to discriminate the figure from the
background, and identify the relevant features of the figure. If you have never seen this picture before, you must rely
only on bottom-up processes to identify the pattern. (The picture is a Dalmatian dog sniffing leaves on the ground.)
But the next time you see this picture, you will have no trouble seeing the dog. You can use past experience, or top-
down processing, to help interpret a familiar picture. Similarly, you may initially have trouble seeing the deer's mate
(outlined in the trees) or seeing the hidden tiger the first time you look at these pictures. But, the next time you see
these pictures the hidden mate and the hidden tiger will jump out at you, because of top-down processing based on
prior knowledge. (The hidden tiger is in the letters defined by the tiger's stripes.)
Biederman, Glass and Stacy (1973) illustrated the contribution of top-down processing in their visual search study.
They gave subjects pictures and asked them to search for a particular target (like a bicycle, or a hydrant). Some of
the pictures were normal, and others were normal pictures that were "jumbled up". Note that both types of pictures
contain exactly the same visual information. Not surprisingly, the subjects were much faster in finding the target in
the normal picture than in the jumbled picture. This is because subjects can use context and prior knowledge
(hydrants are on the ground), or top-down processing, to help guide their visual search for normal pictures. When
the picture is jumbled, the context is lost and top-down processing cannot play much of a role.
Interactive Model of Reading
In a study involving letter and word identification, Reicher (1969) presented subjects with a briefly presented
stimulus display that was either a four letter word, a single letter, or a nonword (a four letter word where the letters
were scrambled). The study display was followed by a test display that contained a pattern mask (##'s) and two
letters. (The pattern mask was to eliminate the iconic image of the stimuli to make reading the stimuli more difficult.) The subjects were asked to indicate which of the two letters in the test display had been shown in the
study display. In which of the three conditions (word, single letter, nonsense word) do you think it would be easiest
to identify the letters? It seems obvious that the nonsense word would be the most difficult condition because it
involves an unfamiliar combination of letters. And that would be right. But what is not so obvious is that subjects
were more accurate in identifying the letter at test when it was presented in the context of a word than when the
letter was presented by itself. This finding has been called the:
Word Superiority Effect: A letter is identified more accurately in the context of a word than when it is presented by itself.
When a random letter is presented by itself, without any context, we can only use bottom-up processes to identify
the letter. When a letter is presented in the context of a word, we can supplement bottom-up processes with top-
down processes (information about what letters go together in words), which helps to speed-up the letter
identification process. (When a letter is presented in the context of a nonword, the top-down processes can hinder
letter identification because a nonword involves irregular or unfamiliar letter groupings.)
The word superiority effect provides us with a basis for McClelland and Rumelhart's (1981) interactive model of
reading. (The diagram of this model is a simplified version of the actual model, but it serves our purpose here.) In
the model, bottom-up processes operate on the visual input to identify first features, then letters, and finally words.
Note that top-down processes at the word level can provide information that can help speed-up identification at the
letter level. It is this interaction between top-down and bottom-up processes that makes pattern recognition such a
fluent and efficient process.
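The core idea - word-level activation feeding back to support letter-level identification - can be sketched in a few lines. This is a drastically simplified toy, not McClelland and Rumelhart's actual model; the word list, evidence values, and feedback weight are all invented for illustration:

```python
# Toy interactive-activation sketch: bottom-up evidence activates letters;
# letters activate words containing them; word activation then feeds back
# to boost letters consistent with the active words (top-down support).

WORDS = ["WORK", "WORD", "WEAK"]

def letter_support(bottom_up, feedback_weight=0.5):
    """bottom_up maps (position, letter) -> evidence strength in [0, 1]."""
    # Word activation = mean bottom-up evidence for that word's letters.
    word_act = {w: sum(bottom_up.get((i, ch), 0.0) for i, ch in enumerate(w)) / len(w)
                for w in WORDS}
    # Top-down feedback: each word passes activation back to its letters.
    support = dict(bottom_up)
    for w, act in word_act.items():
        for i, ch in enumerate(w):
            support[(i, ch)] = support.get((i, ch), 0.0) + feedback_weight * act
    return support

# Suppose the final letter is degraded: weak evidence for K and D, but
# strong evidence for W, O, R. Feedback from "WORK" and "WORD" boosts
# the final letter beyond what bottom-up evidence alone provided -
# a letter in a word is easier to identify than a letter alone.
evidence = {(0, "W"): 1.0, (1, "O"): 1.0, (2, "R"): 1.0,
            (3, "K"): 0.2, (3, "D"): 0.2}
support = letter_support(evidence)
assert support[(3, "K")] > evidence[(3, "K")]
```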
Lecture 4 – Speech Perception
Many Intro Cognitive texts discuss speech perception as a precursor to language. I, however, like to consider speech
perception as part of our discussion of pattern recognition because speech perception is, after all, just another form
of analyzing and identifying patterns. In speech the patterns to be recognized are auditory signals spread out over
time.
The smallest element of speech is the phonetic segment or phone. The phonetic alphabet is a culture free system of
describing all sounds used in any language. Any given language only uses a subset of all the possible phones. The
phoneme is the smallest element of speech that makes a meaningful or semantic difference in a specific language.
For example, consider the two phones /k/ and /q/ (as in the words keep and cool). (To convince yourself that these
two "ku" sounds are different, say "keep" and "cool" and notice that your mouth makes different initial movements
in saying these words.) In Arabic these two phones are also phonemes, as they distinguish between words. But in
English, we treat these two phones as equivalent and they do not make a meaningful difference in distinguishing
words.
There are two ways to describe the characteristics of speech sounds. First, speech sounds can be thought of in terms
of how they are produced, or articulated. As an example, say the phonemes /b/, /d/, and /g/ out loud (pronounce
these as sounds, not as letter names - the //'s indicate that it is a phoneme rather than a letter of the alphabet).
Notice how the position of both your lips and tongue are different when you say these three phonemes. Place of
articulation is one type of articulatory feature of speech sounds. For /b/, the force of the sound is at the front of the
mouth with the lips initially closed. This place of articulation is called bilabial. In saying /d/, the force of the sound
is from the middle of the mouth; your tongue touches the roof of your mouth. This articulatory feature is called
apical. Finally, for /g/ the force of the sound is further back still, and this feature is called velar. A second type of
articulatory feature is voicing. Some phonemes, like /b/, are produced with vibration of the vocal cords (voiced),
whereas other phonemes such as /p/ are produced without vocal cord vibration (unvoiced). There are several other
types of articulatory features that
we need not worry about. The main point here is that speech sounds can be described in terms of how our speech
apparatus (the lips, tongue, and larynx) actually produce or articulate phonemes.
The second way to distinguish between phonemes is in terms of their physical or acoustic features. A speech
spectrogram shows what sound frequencies are present and how the frequencies vary over time in speech.
Spectrograms for phonemes show three dominant frequency bands that rise or fall to a steady-state over time. These are called formants (numbered one to three from low to high frequencies). The formants (especially the first and
second) provide the necessary auditory information to distinguish one phoneme from another.
Just like we saw for visual pattern recognition, speech perception involves the operations of bottom-up and top-
down processes to identify features, phonemes, and words. I am going to discuss three phenomena of speech
perception - phoneme restoration, segmentation, and co-articulation (or parallel transmission) - that show the
importance of top-down processes in speech perception. We will end with the phenomenon of categorical perception
that provides evidence for feature detection. The analysis of features provides the basis for bottom-up, or data-
driven, processes in speech perception.
1. Phoneme Restoration
Warren (1970) asked subjects to listen to tape-recorded sentences. Warren physically removed a phoneme from a
word in a sentence and replaced the missing phoneme with the sound of a cough. After listening to each sentence, he
asked his subjects if they heard anything unusual and whether anything was missing. Subjects heard the cough, but
most subjects did not notice that a phoneme was missing. Warren called this phenomenon "phoneme restoration".
Based on the context of the other words in the sentence, top-down processes filled in or restored the missing
information. This is done in such a fluent manner that we usually do not even perceive that something was missing.
In a second, and more dramatic study, Warren and Warren (1970) also omitted a phoneme from a word in a
sentence. An example of one of their sentences is:
It was found that the *eel was on the orange.
Note that the word with the missing phoneme could be one of several different words (meal, wheel, peel, steal, deal,
heel, etc., that all sound alike except for the first phoneme). Also note that there is only a single word in the sentence
that provides the context to interpret or understand the incomplete word (orange), and this word occurs at the end of
the sentence. Even so, Warren and Warren still found that subjects rarely noticed the missing phoneme. Thus, top-
down processes do not need very much context to operate, and top-down processes can influence the processing of
preceding as well as subsequent information in a sentence.
Think back to when you heard a foreign language that you do not speak or comprehend. What did you hear? A
language we do not understand sounds like a fairly continuous stream of sounds. This is, actually, how it should
sound. A spectrogram of ongoing speech shows that there is a continuous stream of sounds.
Now, what do you hear when you listen to a language that you do understand? Our perception is that we hear a
series of distinct words with pauses between the words. These pauses, though, are an illusion created by our analysis
and comprehension of the words in the sentence. The gaps between words in spoken speech are not physically
present in the speech stream.
But if the speech stream is continuous, how does the pattern recognition system know how to partition the phonemes
into the right words? This is the segmentation problem. And the solution to this problem is the contribution of top-
down processes that use context to help guide the analysis of speech. Let me try to illustrate this idea with an
example. Suppose you simply said to a friend out of the blue (i.e., with no context) "more on". Would your friend
hear "more on" or "moron"? How would they know whether you said one word or two? They wouldn't. They would
need context in order to interpret what you said (unless, of course, they jumped to a hasty conclusion!). One can
think of many such examples of expressions that are acoustically ambiguous in the absence of context. Here are just
a few: "intense" or "in tents", "real of" or "real love", and one of my favourites, "fun guy" or "fungi". I hope the point here is clear. Top-down processing based on context plays an important role in solving the segmentation
problem in continuous speech. Without context, it can be difficult to know how to interpret speech.
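The segmentation problem can be made concrete with a small sketch. Here the "speech stream" is written in a rough, invented phonemic spelling, and a tiny lexicon maps phonemic chunks to words; the point is just that the same stream admits more than one legal parse, so sound alone cannot decide:

```python
# Segmentation sketch: carve a continuous "phonemic" stream into every
# possible sequence of lexicon words. The phonemic spellings and the
# lexicon are invented for illustration.

LEXICON = {
    "mor": "more",
    "on": "on",
    "moron": "moron",
}

def segmentations(stream, words=()):
    """Return every way to carve `stream` into lexicon words."""
    if not stream:
        return [" ".join(words)]
    results = []
    for i in range(1, len(stream) + 1):
        chunk = stream[:i]
        if chunk in LEXICON:
            results += segmentations(stream[i:], words + (LEXICON[chunk],))
    return results

# The same stream parses two ways; only context can pick between them.
print(segmentations("moron"))  # -> ['more on', 'moron']
```

Top-down context is what resolves the tie: hearing the stream in a sentence about camping favours "in tents" over "intense", and a sentence about mushrooms favours "fungi" over "fun guy".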