PERCEIVING AND RECOGNIZING OBJECTS
WHAT AND WHERE PATHWAYS
Extrastriate cortex: the region of cortex bordering the primary visual cortex and containing
multiple areas involved in visual processing.
These areas are named V2, V3, and so on
In the extrastriate regions just beyond V1 (like V2), receptive fields begin to show an interest in
properties that will be important for object perception.
From the extrastriate regions of the occipital lobe of the brain, visual information moves out
along two main pathways.
o One pathway heads up into the parietal lobe. Visual areas in this pathway are important
for processing information relating to location of objects in space and the actions
required to interact with them (moving hands, eyes). This pathway is sometimes called
the where pathway.
o The other pathway heads down into the temporal lobe (what pathway). This pathway is
the locus for explicit acts of object recognition. As we move down into temporal lobe,
receptive fields get much bigger.
o But, some basic object information is represented in both pathways and some where
information is encoded in the temporal lobe what pathway.
Early evidence for a link between the temporal lobe and object recognition came from studying
monkeys with damaged (lesioned) temporal lobes.
The monkeys could see objects but didn’t know what they were seeing. A similar deficit is seen
in stroke victims and is called agnosia.
The part of temporal lobe responsible is inferotemporal (IT) cortex
Neurons in striate cortex are activated by simple stimuli and respond only if their preferred
stimuli are presented in restricted portions of the visual field. In contrast, cells in the IT cortex
have receptive fields that can spread over half or more of the monkey’s field of view.
Types of stimuli that activated IT cells well were not the usual spots and lines, but a monkey’s
hand for some cells, monkey faces for other cells.
This led Barlow to suggest a hierarchical model of visual perception in which small receptive
fields and simple features of visual cortex combine with greater complexity as you move from
striate cortex to IT cortex.
Grandmother cell is a term that refers to any cell that seems to be selectively responsive to one
specific object.
IT cortex maintains close connections with parts of the brain involved in memory formation
(the hippocampus). This is important because IT cells need to learn their receptive field properties.
Logothetis demonstrated that cells in IT cortex have plasticity. After monkeys were trained to
recognize novel objects, IT neurons were found that responded with high firing rates to those
objects, but only when the objects were seen from viewpoints similar to those from which
they’d been learned.
Homologous regions: Brain regions that appear to have the same function in different species.
Like monkeys, humans with lesions in temporal lobe show symptoms of agnosia.
Prosopagnosia is an inability to recognize faces.
Other types of agnosia include the ability to recognize animate objects (animals) but not inanimate
objects.
If we flash a picture of an animal to an observer, the observer differentiates animal from
nonanimal scenes within 150 ms, meaning there can’t be a lot of feedback from higher visual or
This suggests it must be possible to do rough object recognition on the first wave of activity as it
moves from retina to striate cortex to extrastriate cortex and beyond
Feed-forward process: a process that carries out a computation (e.g. object recognition) one
neural step after another, without need for feedback from a later stage to an earlier stage.
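As a loose illustration of the feed-forward idea (not a model of real neurons), the sketch below chains three stages so that each one receives only the output of the stage before it. The stage names and their toy computations are assumptions made for this example.

```python
from functools import reduce

def retina(image):
    # Toy stage: pass brightness values along unchanged.
    return list(image)

def striate(signal):
    # Toy stage: differences between neighbors stand in for edge detection.
    return [b - a for a, b in zip(signal, signal[1:])]

def extrastriate(edges):
    # Toy stage: summarize the edges as one "edge energy" number.
    return sum(abs(e) for e in edges)

def feed_forward(stages, stimulus):
    # Each stage sees only the previous stage's output; nothing flows backward.
    return reduce(lambda data, stage: stage(data), stages, stimulus)

result = feed_forward([retina, striate, extrastriate], [0, 0, 5, 5, 0])
```

Because no stage ever consults a later one, the whole computation finishes in a single pass, which is the property the 150 ms finding points to.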
THE PROBLEMS OF PERCEIVING AND RECOGNIZING OBJECTS
Middle (midlevel) vision: a loosely defined stage of visual processing that comes after basic
features have been extracted from the image (low level, or early vision) and before object
recognition and scene understanding (high level vision).
Midlevel vision is a process that combines features into objects.
The following part takes feature combinations given to us by middle level vision and asks how
we come to know what the object is (recognizing it by matching what we perceive now to a
memory of something perceived in the past).
Goal of middle vision is to organize the elements of a visual scene into groups that we can
recognize as objects.
The visual system knows that gaps are accidents of the lighting and fills in the contour
In the early stages of processing, the human visual system assembles evidence for the presence
of significant edges from cells with receptive fields at different scales.
o All these different bits of information are then combined to make the system’s best
guess about the presence of a contour.
Illusory contour: contour that’s perceived even though nothing changes from one side of it to
the other in an image.
RULES OF EVIDENCE
Structuralism: belief that complex objects or perceptions could be understood by analysis of the
components. Wilhelm Wundt and Edward Titchener argued that perceptions are the sum of atoms of
sensation (bits of color, orientation etc)
In a structuralist view, perception is built up of local sensations the way a crystal is built up of an
array of atoms.
Gestalt: the idea that the perceptual whole could be greater than the sum of the parts.
Gestalt grouping rules: describe which elements in an image will appear to group together.
The rule is that we tend to see similarly oriented lines as part of the same contour. Such lines
“support” each other in that 2 visible bits of an edge make it easier to perceive a third, collinear
segment that lies between them even if the middle segment is absent.
Good continuation: gestalt grouping rule stating that 2 elements will tend to group together if
they seem to lie on the same contour.
A host of rules, principles and good guesses contribute to our organized perception of the world.
These operate on a committee model.
Everyone gets together and tells how the stimulus ought to be understood. Eventually a
consensus view emerges and we settle on a single interpretation of the visual scene.
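The committee metaphor can be sketched as a simple vote. In this minimal example, each hypothetical committee member assigns a support score to each candidate interpretation, and the interpretation with the highest total wins. The member names, the scores, and the use of plain summation are all assumptions for illustration; the notes do not specify how the consensus is actually reached.

```python
def consensus(opinions):
    """Sum each member's support per interpretation and return the winner."""
    totals = {}
    for member_scores in opinions.values():
        for interpretation, support in member_scores.items():
            totals[interpretation] = totals.get(interpretation, 0.0) + support
    winner = max(totals, key=totals.get)
    return winner, totals

# Hypothetical members and scores, invented for this example.
opinions = {
    "good_continuation": {"one occluded contour": 0.8, "two separate lines": 0.2},
    "similar_orientation": {"one occluded contour": 0.6, "two separate lines": 0.4},
    "occlusion_guess": {"one occluded contour": 0.3, "two separate lines": 0.7},
}
winner, totals = consensus(opinions)
```

Here one member disagrees with the other two, yet a single interpretation still emerges, which is the point of the committee model.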
If an edge suddenly stops in an image, why does it stop? One reason is that something gets in
the way.
For the Kanizsa figure (4.11) example, the visual system thinks that there is another contour
occluding the vertical line, with the occluding edge oriented perpendicularly to the occluded
This combined with a guess that the notches in the circles represent contours that can be
extended leads to the inference of an illusory contour.
TEXTURE SEGMENTATION AND GROUPING
Texture segmentation: carving an image into regions of common texture properties.
The visual system can determine the average of the features in a region without knowing much
about the individual features.
Texture segmentation is closely related to the Gestalt grouping principles.
Two of the strongest principles are similarity and proximity.
Similarity: tendency of 2 features to group together will increase as similarity between them
increases.
Proximity: tendency of 2 features to group will increase as distance between them decreases.
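The proximity principle can be made concrete with a toy grouping routine. The sketch below places elements along a line and starts a new group whenever the gap to the previous element exceeds a threshold. The function name and the `max_gap` parameter are hypothetical; the visual system is not claimed to use a fixed cutoff.

```python
def group_by_proximity(positions, max_gap):
    """Group 1-D positions: neighbors closer than max_gap share a group."""
    groups = []
    for x in sorted(positions):
        if groups and x - groups[-1][-1] <= max_gap:
            # Close enough to the last element: join its group.
            groups[-1].append(x)
        else:
            # Gap too large: start a new group.
            groups.append([x])
    return groups

groups = group_by_proximity([0, 1, 2, 6, 7, 12], max_gap=2)
```

With these positions the elements fall into three clusters, matching the intuition that nearby items are seen as belonging together.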
Texture grouping can be based on similarity in a limited number of features:
o Color, size, orientation, and aspects of form.
Combinations (“conjunctions”) of features don’t work well as a basis for grouping, so texture
segmentation based on them isn’t clear.
Two weaker grouping principles are parallelism and symmetry.
Parallelism: parallel contours are likely to belong to the same figure
Symmetry: symmetrical regions are more likely to be seen as figure.
The art of camouflage is the art of getting your features to group with the features of the
environment to persuade an observer that your features don’t form a perceptual group of their own.
PERCEPTUAL COMMITTEES REVISITED
Low-level visual processes deliver straightforward bits of information about line and color.
Middle vision behaves like a collection of specialists, each with a specific area of expertise and
individual “opinions” about what the input might mean. The goal is to have a single answer
emerge out of this diversity of opinions.
COMMITTEE RULES: HONOR PHYSICS AND AVOID ACCIDENTS
Ambiguous figure: a visual stimulus that gives rise to 2 or more interpretations of its identity or
Necker cube: outline that is perceptually bi-stable. Unlike the situation with most stimuli, two
interpretations continually battle for perceptual dominance.
In theory, EVERY image is ambiguous but the perceptual committees almost always agree on a
Accidental viewpoint: viewing position that produces some regularity in the visual image that
isn’t present in the world (e.g. sides of 2 independent objects lining up perfectly).
Perceptual committees know about accidental viewpoints and are reluctant to assume them, or
other unlikely accounts of perceptual inputs.
A second set of assumptions made by the visual system involves an implicit understanding of
some aspects of the physics of the world.
o Example: in figure 4.11, we infer the arrow-shaped object because of our implicit
understanding that solid objects block light.
An image is meaningless until middle and high-level processes dig into it. One committee uses
knowledge that opaque objects occlude other objects behind them to generate plausible
interpretations of image elements. Another committee considers all possibilities and devalues
any that involve accidental viewpoints, reducing what is initially an unsolvable problem to a
FIGURE AND GROUND
The ability to distinguish figures (objects in foreground) from ground (surfaces or objects lying
behind figures) is a critical step on the path from image to object recognition.
Figure-ground assignment: process of determining that some regions of an image belong to a
foreground object (figure) and other regions are part of the background (ground).
What principles assign regions to figure or ground?
o Surroundedness: if one region is surrounded by another, it’s likely that the surrounded
region is the figure.
o Size: the smaller region is likely to be figure.
o Symmetry: symmetrical regions are more likely to be seen as figure
o Parallelism: regions with parallel contours are more likely to be seen as figure
o External edges: see figure 4.28
o Relative motion: how surface details move relative to an edge can determine which
portion of a display is the foreground figure and which is the background
DEALING WITH OCCLUSION
Relatability: degree to which 2 line segments appear to be part of the same contour.
In figure 4.30b, the visual system is unwilling to propose an elaborate relationship, so it
concludes that the lines aren’t related at all (not part of the same object).
Like the figure-ground rules, this heuristic (mental shortcut) isn’t infallible (because some
objects do have S shaped contours).
But the occlusion committee is willing to accept a few missed completions in order to reduce
the vast number of possible completions we would have to consider if we tried to connect every
pair of occluded edges.
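One way to make this heuristic concrete is a toy geometric test. The sketch below is an assumption for illustration, not the definition used in the text: it treats two occluded edges as relatable when their straight extensions meet behind the occluder and joining them needs a bend of no more than 90 degrees. Offset parallel edges, which would need an S-shaped connection, fail the test. The function name, the 90-degree limit, and the coordinate convention are all hypothetical.

```python
def cross(a, b):
    # 2-D cross product (z component).
    return a[0] * b[1] - a[1] * b[0]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def relatable(p1, d1, p2, d2, eps=1e-9):
    """Toy relatability test for two occluded edges.

    p1, p2: points where each edge disappears behind the occluder.
    d1, d2: directions in which each edge heads into the occluder.
    """
    w = (p2[0] - p1[0], p2[1] - p1[1])
    denom = cross(d1, d2)
    if abs(denom) < eps:
        # Parallel extensions: relatable only if collinear and facing each other.
        return abs(cross(w, d1)) < eps and dot(d1, d2) < 0 and dot(w, d1) > 0
    # Solve p1 + t*d1 == p2 + s*d2 for the meeting point of the extensions.
    t = cross(w, d2) / denom
    s = cross(w, d1) / denom
    meets_behind_occluder = t > 0 and s > 0
    bend_at_most_90 = dot(d1, d2) <= 0
    return meets_behind_occluder and bend_at_most_90

collinear = relatable((0, 0), (1, 0), (2, 0), (-1, 0))
s_shaped = relatable((0, 0), (1, 0), (2, 1), (-1, 0))
corner = relatable((0, 0), (1, 0), (2, 2), (0, -1))
```

Like the heuristic it imitates, this test rejects some completions that do occur in the world in exchange for considering far fewer candidate pairs.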
Additional heuristics emerge when we move from 2 dimensions to 3.
T junctions occur when one surface occludes another; Y