Exam 4 Questions 04/03/2014
1) Describe Biederman's theory of object recognition by components. What
evidence does he provide to support this theory?
This structural description model makes use of properties that can distinguish most objects from one
another, yet remain relatively stable over changes in view.
Objects are defined as configurations of qualitatively distinct parts called geons.
Geons are defined by configurations of nonaccidental properties. (Nonaccidental properties are properties
of an image such as collinearity, termination or parallelism that seldom occur by accident within optical
The nonaccidental properties that distinguish geons are their number of straight and curved edges, which
edges are parallel to each other, the number of vertices of each type and the presence of symmetries.
For example, there are nonaccidental differences between a brick and cylinder. Bricks have parallel edges,
inner Y vertices and outer arrow vertices, while cylinders have parallel edges, curved edges and tangent Y vertices.
However, this model is not “implementable” because it throws away a lot of information. Automated
computer algorithms fail in transforming complex representations into line drawings.
On average, observers require approximately three geons to reliably recognize an object.
2) Describe the model of edge labeling proposed by Malik (1987). What are its
Malik proposed four main types of contour, each bounded on both ends by informative vertices. These edges and vertices
constrain how a figure can be interpreted as a 3D shape.
The contour types are smooth occlusions, edge occlusions, convex edges and concave edges. The vertex types
are L, curved L, 3-tangent, arrow, Y and T. L vertices are the least informative and 3-tangent vertices are
the most informative.
However, it is difficult to select a possible interpretation for each vertex so that every contour has a
consistent labeling from both of its vertices. Consistent interpretations are not always unique, and may
sometimes be impossible to achieve. Depending on the edges and vertices, the same image can be
interpreted in different ways (e.g. a concave edge at the bottom = attached to the ground, a concave edge on the side =
attached to a wall, no concave edges = floating in air). Additionally, impossible figures do not allow a consistent
interpretation of their edges and therefore cannot be accurately interpreted using only edge labeling.
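The consistency requirement described above (every contour must receive the same label from both of its vertices) can be sketched as a small brute-force search. This is only an illustration: the three undirected labels, the vertex catalogues `SAME_KIND` and the two toy figures are hypothetical and are not Malik's actual junction catalogue.

```python
from itertools import product

# Simplified, undirected label set (an assumption made for this sketch).
LABELS = ("convex", "concave", "occluding")

def consistent_labelings(edges, vertices):
    """Return every assignment of one label per edge that all vertices allow.

    vertices: list of (edge_names, allowed), where allowed is a set of label
    tuples giving one label per edge in edge_names order.
    """
    solutions = []
    for combo in product(LABELS, repeat=len(edges)):
        labeling = dict(zip(edges, combo))
        if all(tuple(labeling[e] for e in names) in allowed
               for names, allowed in vertices):
            solutions.append(labeling)
    return solutions

# Hypothetical catalogue for a two-edge vertex: both edges share one label.
SAME_KIND = {("convex", "convex"), ("occluding", "occluding")}

edges = ["a", "b", "c"]

# Toy figure 1: three vertices joined in a ring; more than one labeling fits.
ring = [(("a", "b"), SAME_KIND), (("b", "c"), SAME_KIND), (("c", "a"), SAME_KIND)]
ambiguous = consistent_labelings(edges, ring)

# Toy figure 2: edge "b" must be convex at one end and concave at the other,
# so no consistent labeling exists (the "impossible figure" case).
contradictory = [
    (("a", "b"), {("convex", "convex")}),
    (("b", "c"), {("concave", "concave")}),
    (("c", "a"), SAME_KIND),
]
impossible = consistent_labelings(edges, contradictory)

print(len(ambiguous), len(impossible))
```

The first figure has two consistent interpretations and the second has none, which mirrors the two limitations noted above: consistent labelings need not be unique and may not exist at all.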
3) What can people remember from briefly presented scenes, and what can they
not remember? Provide two examples.
We can remember the gist of a scene, such as a large-scale classification (buildings, highways, etc.), but not
the details or the objects within it.
We can identify scenes in about 125 ms!!
People can remember up to 2500 and even 10000 pictures at a rate of one image every 2 seconds.
We can understand scenes very quickly: after spending about one second on each picture, observers are tested on
which pictures are old and which are new.
Recognition is pretty good: a high-capacity visual memory for scenes.
4) What are the Relational Violations proposed by Biederman et al. (1982) that can
slow down object or scene processing?
Five Relational Violations that can slow down object or scene processing according to Biederman et al. (1982):
Support: The object does not appear to be resting on a surface.
Interposition: The background appears to pass through the object.
Probability: The object is unlikely to appear in the scene.
Position: The object is likely to occur in that scene but is unlikely to be in that particular position.
Size: The object appears too large or too small relative to other objects in the scene.
5) What are structural and transformational invariants? Describe two experiments
that demonstrate the concept of transformational invariance in human perception.
Structural Invariants – Properties of an object that remain constant under different types of change.
Transformational Invariants – Properties of a pattern of change that remain constant over different objects.
This is what allows us to identify events.
A trajectory-based analysis: movement of an umbrella with fixed angles and distances while it moves, versus angles that
change as the movement happens.
Todd, 1982 – Fixed axis of rotation and moving axis of rotation
Trajectory Constraints for Rigid Motion about a Fixed Axis under Parallel Projection
All points move in an elliptical trajectory.
All trajectories have the same eccentricity (i.e. the ratio of the major and minor axes).
The minor axes of all trajectories must be collinear.
All points must traverse their trajectories at the same frequency.
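The constraints listed above can be checked numerically with a minimal sketch. It rotates a few points about a fixed axis, drops the depth coordinate (parallel projection), and measures each image trajectory. The 30 degree slant, the three points and all function names are assumptions chosen for illustration, and the axis ratio is reported as minor/major.

```python
import math

def rotate_about_axis(p, axis, theta):
    """Rodrigues' formula: rotate 3-D point p by theta about a unit axis through the origin."""
    ax, ay, az = axis
    px, py, pz = p
    c, s = math.cos(theta), math.sin(theta)
    dot = ax * px + ay * py + az * pz
    cross = (ay * pz - az * py, az * px - ax * pz, ax * py - ay * px)  # axis x p
    return tuple(c * pc + s * cc + (1 - c) * dot * ac
                 for pc, cc, ac in zip(p, cross, axis))

def image_trajectory(p, axis, n_frames=360):
    """Parallel projection (drop z) of one full revolution in equal angle steps."""
    return [rotate_about_axis(p, axis, 2 * math.pi * k / n_frames)[:2]
            for k in range(n_frames)]

def trajectory_centre(traj):
    n = len(traj)
    return (sum(x for x, _ in traj) / n, sum(y for _, y in traj) / n)

def axis_ratio(traj):
    """Minor/major axis ratio, from the eigenvalues of the 2x2 covariance matrix."""
    n = len(traj)
    mx, my = trajectory_centre(traj)
    sxx = sum((x - mx) ** 2 for x, _ in traj) / n
    syy = sum((y - my) ** 2 for _, y in traj) / n
    sxy = sum((x - mx) * (y - my) for x, y in traj) / n
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return math.sqrt(max(half_trace - root, 0.0) / (half_trace + root))

# Assumed setup: the axis lies in the y-z plane, tilted 30 degrees out of the image plane.
slant = math.radians(30)
axis = (0.0, math.cos(slant), math.sin(slant))
points = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.5), (-0.5, 2.0, 1.0)]

trajectories = [image_trajectory(p, axis) for p in points]
ratios = [axis_ratio(t) for t in trajectories]
centres = [trajectory_centre(t) for t in trajectories]
print(ratios)   # every ratio is close to sin(30 degrees) = 0.5
print(centres)  # every centre lies on the line x = 0, the projected axis
```

In this setup the minor axis of each ellipse runs along the projected rotation axis, so centres that all lie on that line imply collinear minor axes. The equal-frequency constraint holds by construction here, since every point is rotated by the same angle on each frame.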
Degrees of Oscillation
At low degrees of oscillation (0–90°) the percent correct is low for all three conditions:
Low Frame Rate, Rotation
High Frame Rate, Rotation
High Frame Rate, Precession
The slope of each curve (a positive relationship) begins to flatten as the degrees of oscillation increase past 90 to
180 and then 270.
The Low Frame Rate, Rotation and High Frame Rate, Rotation curves almost overlap (same percent correct
relative to degrees of oscillation); however, the High Frame Rate, Precession curve is considerably below the other
two in percent correct at all degrees.
6) Describe two models that have been proposed for discriminating rigid and
nonrigid motions. Describe an experiment that investigated the perception of
rigidity in human observers, and evaluate the results with respect to current
Ullman (1977) – Uses 2-frame motion sequences. Cannot detect certain types of nonrigid motions.
It involves comparing corresponding points in two images of a motion sequence. If a motion sequence has
a rigid interpretation, then it is possible to create a set of parallel image trajectories by rotating one image
with respect to the other about the line of sight. However, while some nonrigid motions like variable
orientation are easily identified and detectable with 2 views, others, like variable x-offset, are not.
Todd (1982) – Uses image plane trajectories, and requires a large number of frames. Does not work for
moving axes of rotation, unless the effects of those motions are removed.
Hogervorst et al. (1996) – Uses phase space trajectories, and requires a large number of frames. It also
requires that any rotations in the image plane be removed. The horizontal positions of two points with
parallel image trajectories are measured over a sequence of successive frames relative to a third point.
These are then plotted in phase space, in such a way that their relative positions are represented along
orthogonal axes. If the trajectory in this space is anything but an ellipse (or a line) centered on the origin,
then the object has no possible rigid interpretation.
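The phase-space test described above can be sketched as follows. The sketch fits an origin-centered conic A*u^2 + B*u*v + C*v^2 = 1 to three frames and checks whether every other frame falls on it. The radii, phases, frame choices and function names are hypothetical, and the degenerate straight-line case is not handled.

```python
import math

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_centered_conic(samples):
    """Solve A*u^2 + B*u*v + C*v^2 = 1 from three (u, v) samples by Cramer's rule."""
    rows = [(u * u, u * v, v * v) for u, v in samples]
    d = det3(rows)
    coeffs = []
    for col in range(3):
        m = [list(r) for r in rows]
        for r in m:
            r[col] = 1.0  # substitute the right-hand side into this column
        coeffs.append(det3(m) / d)
    return tuple(coeffs)

def phase_space_track(freqs, n_frames=60):
    """Horizontal positions of points 0 and 1 relative to point 2, frame by frame.

    Assumed model: each point turns about a vertical axis, so its image
    position is x = r * cos(freq * theta + phase). Equal freqs means rigid.
    """
    radii, phases = (1.0, 0.8, 0.5), (0.0, 1.2, 2.5)  # hypothetical values
    track = []
    for k in range(n_frames):
        theta = 2 * math.pi * k / n_frames
        x = [r * math.cos(f * theta + ph)
             for r, f, ph in zip(radii, freqs, phases)]
        track.append((x[0] - x[2], x[1] - x[2]))
    return track

def rigidity_check(track, fit_frames=(0, 10, 25)):
    """Return (largest deviation from the fitted conic, whether the conic is an ellipse)."""
    A, B, C = fit_centered_conic([track[k] for k in fit_frames])
    residual = max(abs(A * u * u + B * u * v + C * v * v - 1.0) for u, v in track)
    return residual, B * B - 4 * A * C < 0

rigid_residual, rigid_is_ellipse = rigidity_check(phase_space_track((1.0, 1.0, 1.0)))
nonrigid_residual, _ = rigidity_check(phase_space_track((1.0, 1.5, 1.0)))
print(rigid_residual, rigid_is_ellipse, nonrigid_residual)
```

With equal frequencies the residual is at floating-point noise level and the fitted conic is an ellipse. Giving one point a different frequency (a variable frequency display) leaves a large residual, so no rigid interpretation is possible under this test.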
Results suggest we detect rigidity in multiple ways.
Note that the variable orientation conditions are the only ones that can be identified as nonrigid by an
analysis of 2-frame motion sequences, and these are also the easiest ones to identify as nonrigid by human
observers. Observers can distinguish the other types of nonrigid motion from rigid motions, but only when
presented with a large extent of rotation.
Variable orientation is detectable from just 2 views.
Participants were shown a variety of examples of rigid and nonrigid motion and asked to distinguish
between the two. The participants were fairly accurate (close to 100%) in all of the conditions (rigid, variable
orientation, variable frequency, variable x-offset) except for the variable eccentricity condition.
7) What does the Heider-Simmel film tell us about human event perception?
Describe an experiment that was designed to reveal some of the relevant
information in this film. What were its conclusions?
The Heider-Simmel film is an example of event recognition based on patterns of motion, and it supports
attribution theory. Attribution theory is the concept that people apply personal perspectives about motivation
to the interpretation of actions. The film shows both animate and inanimate motions, and it reveals the
intentions and emotions of the animate characters. Despite its complexity, the interpretations of this film are
remarkably consistent among different observers in different cultures, even for young children.
Heider-Simmel film: levels of perception, example of event recognition
Attribution theory: motivation/intent connected to actions, personal perspective
Predator/Prey Experiments: One target moves in different directions relative to the other targets, e.g. at 0 degrees
(directly at the other targets) and at deviations increasing in 30 degree increments. Detection accuracy decreases with
increasing angle. Accuracy drops to about 50% at 120 degrees, so the range between 60 degrees and 120
degrees is the most useful for “hunting” because the motion is not direct enough to be detected by prey, but not
random enough to be ineffective.
Gao, Newman, and Scholl, 2009: identify predator/prey based on patterns of motion, i.e. direct path, “heat
seeking”, 0 degrees; more random, 30 degrees; difficult to detect, 90 degrees; detection at chance at ~120
degrees → predators should vary their path to maximize stealth
Sheep/Wolf Task (Scholl’s group): participant controls motion of sheep, computer controls wolf → obvious,
dangerous, incompetent wolves
Varying shape/orientation of object affects detectability of motion patterns (Scholl’s group): oriented >
misoriented (45 degrees) > perpendicular
The Wolfpack Effect: random motion will appear intentional based on orientation of objects (arrows/circle)
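The angular manipulation in the predator/prey displays described above can be sketched as a heading rule for the "wolf". This sketch assumes the heading is drawn uniformly from a window of plus or minus the stated angle around the direct bearing to the "sheep". That reading, the function names and the example coordinates are assumptions, not a reproduction of the original stimulus code.

```python
import math
import random

def direct_bearing(wolf, sheep):
    """Angle (radians) of the straight line from the wolf to the sheep."""
    return math.atan2(sheep[1] - wolf[1], sheep[0] - wolf[0])

def wolf_heading(wolf, sheep, subtlety_deg, rng):
    """Direct bearing plus a random deviation of at most subtlety_deg to either side."""
    deviation = math.radians(rng.uniform(-subtlety_deg, subtlety_deg))
    return direct_bearing(wolf, sheep) + deviation

def step(position, heading, speed=1.0):
    """Move one frame along the given heading."""
    return (position[0] + speed * math.cos(heading),
            position[1] + speed * math.sin(heading))

def angular_error(wolf, sheep, heading):
    """Absolute angle (degrees) between a heading and the direct bearing."""
    diff = heading - direct_bearing(wolf, sheep)
    diff = (diff + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
    return abs(math.degrees(diff))

rng = random.Random(0)          # seeded so the sketch is repeatable
wolf, sheep = (0.0, 0.0), (3.0, 4.0)

# 0 degrees: perfectly "heat seeking", so one step of length 5 lands on the sheep.
heat_seeking = step(wolf, wolf_heading(wolf, sheep, 0, rng), speed=5.0)

# 90 degrees: headings stay within 90 degrees of the direct bearing but vary widely.
errors_90 = [angular_error(wolf, sheep, wolf_heading(wolf, sheep, 90, rng))
             for _ in range(1000)]
print(heat_seeking, max(errors_90))
```

Larger windows make the wolf's approach less direct on each frame, which is the property the notes above link to lower detection accuracy.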
8) What hypotheses have been proposed for how observers are able to factor out
the effects of their own eye movements in the perceptual analysis of optical flow.
Describe two experiments that have been designed to test these hypotheses.
Optic flow is the visual motion in the optic array caused by self-motion. Factoring out the effects of
an individual’s eye movements is referred to as decoupling and can be achieved by optical flow decomposition.
As you walk toward