Chapter 10: Depth Perception
• The visual system can make use of multiple visual cues regarding the depth of objects within view, and
nonvisual cues from the oculomotor muscles controlling eye position and accommodative state.
• Inverse Problem: We want to know the distal
stimulus, the 3D object in the real world;
however, all we have is the proximal stimulus, the
2D representation of the object on our retina.
• To accomplish this, we need top-down
processing in addition to bottom-up: using
knowledge, memories, and expectations to make
assumptions about our proximal stimulus.
• Metric Depth Cue: Quantitative depth cue, in which the cue value varies continuously and proportionately with depth.
• Ordinal Depth Cue: Cue values vary in discrete ordinal steps, such as “near” versus “farther”.
• Monocular Depth Cues: A depth cue available when only one eye is used, as well as with two eyes.
• Pictorial Depth Cue: Cue derives from the image itself – all are monocular; non-pictorial cues are
movement-related, oculomotor, and stereopsis (disparity).
• Size-Depth Ambiguity: Size of the image on the retina depends on visual angle, which depends on
the size of the object and the distance of the object from the observer.
• Ames Room: Room constructed so that from a
particular point of view, lines of windows line up and make
it look like the two people are at the same
distance – in fact the smaller subject is standing much farther away.
• Ponzo Illusion: Lines are of equal length, but since background lines converge in linear perspective,
we believe the top line is farther away – and must be bigger in actuality.
• Leaning Tower Illusion: Images of the leaning tower are
identical, but image on right seems to lean more to the right. This is
due to adjustment for expected linear perspective, where building
tops should converge in the distance.
• Moon Illusion: Moon close to horizon seems much larger than
moon in the sky. We see objects in a horizontal direction as being
more distant and therefore as being larger than objects that are
seen in a vertical direction.
• Plank Illusion: Two people stand on a level plank. The person
on the right appears larger, but when they switch places the new person on the right also appears larger.
This is due to the angle of the camera, with the person on the right being closer.
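The size-depth ambiguity behind these illusions can be made concrete with a small sketch. It assumes simple face-on viewing geometry; the function name and the 10° example angle are illustrative choices, not values from these notes. For one fixed visual angle, many combinations of object size and distance fit the same retinal image.

```python
import math

def size_for_visual_angle(angle_deg, distance_m):
    """Physical size (m) an object must have to subtend angle_deg
    when viewed face-on from distance_m (assumed simple geometry)."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)

# One proximal stimulus (a 10 degree visual angle, illustrative value)...
angle = 10.0

# ...is consistent with many distal stimuli: the size grows in step with distance.
candidates = [(d, size_for_visual_angle(angle, d)) for d in (1.0, 2.0, 4.0)]
for distance, size in candidates:
    print(f"distance {distance:.1f} m -> size {size:.3f} m")
```

Without some other cue to distance, the retinal image alone cannot pick one of these candidates, which is why a misleading distance cue (as in the Ames room) produces a misjudged size.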
Retinal Image Size
• A nonlinear function relates the size of an object’s retinal image to its distance from the observer: as
distance doubles, retinal size halves.
o Once the object reaches a distance of 2m, 90% of its range of variation in size has occurred.
• The decrease in the area of an object’s retinal image with increasing distance follows a similar function,
although the decrease occurs faster – little variation in retinal area occurs beyond 3m.
• If an object is familiar, its real size will be known, and this relationship can be used to estimate its distance.
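As a rough illustration of the retinal size relationship, the sketch below assumes retinal image size is proportional to the visual angle of an object viewed face-on. The function names and the 0.3 m object are hypothetical. It shows the angle roughly halving each time distance doubles, and how a known (familiar) size lets the relationship be inverted to recover distance.

```python
import math

def retinal_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of size_m at distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

def distance_from_familiar_size(known_size_m, angle_deg):
    """Invert the relationship: distance implied by a known size and its visual angle."""
    return known_size_m / (2 * math.tan(math.radians(angle_deg) / 2))

size = 0.3  # hypothetical familiar object, 30 cm across
for d in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{d:>4} m -> {retinal_angle_deg(size, d):6.2f} deg")

# Familiar size: seeing the object subtend this angle implies it is about 3 m away.
estimated = distance_from_familiar_size(size, retinal_angle_deg(size, 3.0))
print(f"estimated distance: {estimated:.2f} m")
```

Most of the change in angle happens at near distances, so the cue is most informative close to the observer.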
Height in the Visual Field (HVF)
• Standing on level ground (ground plane), fixating the distant
horizon (0°), distant objects are higher in the visual field, closer to 0°,
than those at our feet (90°, perpendicular to line of sight).
• The height in the visual field (HVF) of an object can be used as a
cue to depth because it varies with the tangent of visual field
position; i.e., distance from the observer.
• The HVF of a point in the scene can be derived from:
o Visual information in the retinal image
o Nonvisual info from changes in eye position as fixation
moves between objects at different distances
• HVF is also a function of the observer’s height. At distances less than 100 cm, HVF varies more rapidly
for children and cats than for adults. The HVF of a scene point at any distance is closer to a child’s line of
sight than to an adult’s.
• The utility of HVF is limited to objects in contact with a level, horizontal, ground plane – HVF is
unreliable for objects in the air, or on a slanted surface.
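The HVF geometry can be sketched for a point on a level ground plane, using the convention above (horizon at 0°, the observer's feet at 90°). The eye heights of 1.6 m and 1.0 m are assumed illustrative values for an adult and a child, and the function names are hypothetical.

```python
import math

def hvf_angle_deg(eye_height_m, distance_m):
    """Angle below the horizon (degrees) of a ground point at distance_m.
    0 degrees is the horizon, 90 degrees is straight down at the feet."""
    return math.degrees(math.atan2(eye_height_m, distance_m))

def ground_distance_m(eye_height_m, angle_deg):
    """Invert the relationship: ground distance implied by an HVF angle."""
    return eye_height_m / math.tan(math.radians(angle_deg))

adult, child = 1.6, 1.0  # assumed eye heights in metres

for d in (0.5, 1.0, 2.0, 10.0):
    print(f"{d:>5} m  adult {hvf_angle_deg(adult, d):5.1f} deg"
          f"  child {hvf_angle_deg(child, d):5.1f} deg")

# Round trip: an adult seeing a ground point 20 degrees below the horizon.
print(f"{ground_distance_m(adult, 20.0):.2f} m")
```

At every distance the child's angle is smaller, so the same scene point lies closer to the child's line of sight to the horizon, in line with the observer-height point above.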
• Uniform texture on a surface slanted away from the observer
(texture gradient) has three image qualities that vary systematically
with depth, and can be used to estimate distance:
• 1) Width (Perspective Gradient): Separation of elements
perpendicular to the surface slant, decreases with increasing distance.
• Linear perspective is a special example of this type of gradient,
where “elements” are lines that converge on the vanishing point.
• 2) Height (Compression Gradient): Separation of elements in the
direction of surface slant, decreases with increasing distance.
• 3) Density: Number of elements per unit area, increases with distance.
• All three texture cues vary with distance according to a power law.
• Compression of height occurs more than compression of width –
squares in the distance look wider than high with change in aspect ratio.
• The steepness of the power functions that define the variation of texture cues with distance depends on:
o The surface slant of the texture: texture gradients are only useful when the surface slant is in
excess of 50° from vertical (40° from horizontal)
o The observer’s height: Texture cues vary more gradually for taller observers
o Texture gradients are only reliable depth cues when elements of similar size, shape, and
spacing repeat in the scene
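The width and height gradients can be sketched for the simplest case: identical square tiles on a level ground plane, viewed from an assumed eye height. The tile size, eye height, and function name are illustrative assumptions. In this geometry, once distance is large relative to eye height, image width falls off roughly as 1/d and image height roughly as 1/d², so distant squares look wider than they are high.

```python
import math

def tile_image_size(eye_height_m, distance_m, tile_m=1.0):
    """Angular width and height (radians) of a square ground tile whose
    near edge is distance_m from the observer's feet (level ground assumed)."""
    line_of_sight = math.hypot(eye_height_m, distance_m)
    width = 2 * math.atan(tile_m / (2 * line_of_sight))
    # Height: difference in direction between the tile's far and near edges.
    height = (math.atan2(distance_m + tile_m, eye_height_m)
              - math.atan2(distance_m, eye_height_m))
    return width, height

eye_height = 1.6  # assumed adult eye height in metres
for d in (2.0, 5.0, 10.0, 20.0):
    w, h = tile_image_size(eye_height, d)
    print(f"{d:>5} m  width {math.degrees(w):5.2f} deg"
          f"  height {math.degrees(h):5.2f} deg  aspect {w / h:5.2f}")
```

Raising `eye_height` makes the height values shrink more slowly with distance, which matches the point that texture cues vary more gradually for taller observers.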
• The depth of field of an optical system is the distance around the point of focus in which the image
remains sharply focused.
• Beyond the depth of field, the amount of blur in the image is lawfully related to its distance from the
fixation point. Objects which are less blurred seem closer.
• Image blur is not due to decreasing acuity in the visual periphery.
• As the distance from fixation increases, areas closer to fixation blur more rapidly than those further away.
• Short distances blur more easily, as a bigger change in lens shape is needed for 0.5m to 1.5m, vs. for
2m to 4m.
• Pupil: A smaller aperture sharpens the image, increasing depth of field
• Image blur is limited to coarse ordinal depth judgments because:
o Depth of field is also determined by pupil size. Increasing pupil size reduces depth of field,
consequently increasing the extent of blur at any given distance.
o Humans cannot detect small differences in blur.
• Nevertheless, blur can be quite effective in influencing apparent depth and distance, as shown in tilt-
shift miniaturization with a very narrow depth of field: we see it as miniaturization because this type of
blur only happens when we are very close to the object, so the scene must be very small for this to be
possible.
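The earlier comparison of 0.5m to 1.5m against 2m to 4m can be checked numerically. The sketch assumes the standard optics convention, not stated in these notes, that focusing demand in dioptres is the reciprocal of viewing distance in metres. The function names are hypothetical.

```python
def accommodative_demand(distance_m):
    """Focusing demand in dioptres (assumed convention: 1 / distance in metres)."""
    return 1.0 / distance_m

def demand_change(from_m, to_m):
    """Change in focusing demand when fixation moves between two distances."""
    return abs(accommodative_demand(from_m) - accommodative_demand(to_m))

near_change = demand_change(0.5, 1.5)  # 2.0 D down to about 0.67 D
far_change = demand_change(2.0, 4.0)   # 0.5 D down to 0.25 D

print(f"0.5 m -> 1.5 m: {near_change:.2f} D")
print(f"2.0 m -> 4.0 m: {far_change:.2f} D")
```

The near step covers one metre yet needs over five times the change in focus of the two-metre far step, which is why blur changes so steeply at short distances and why strong blur signals a very close, small scene.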
• When large distances are considered, the contrast of an object can be used as a cue to depth
because it is lawfully related to the object’s distance from the observer.
• Contrast is attenuated with increasing distance because atmospheric
particles scatter light. The extent of scattering depends on the atmospheric
attenuation coefficient for the given conditions.
• Atmospheric perspective is limited to coarse ordinal depth judgments:
o Large distances are required to produce perceptible differences in contrast.
o Image contrast also varies with atmospheric conditions.
• In dull, wet weather, image contrast declines rapidly with distance.
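One way to picture the contrast relationship is an exponential falloff with distance. This form is an assumption here, since the notes say only that contrast is lawfully related to distance through an attenuation coefficient. The coefficient values and function name below are made up for illustration.

```python
import math

def apparent_contrast(inherent_contrast, attenuation_per_km, distance_km):
    """Contrast remaining after light travels distance_km through the atmosphere,
    assuming exponential attenuation (an illustrative model)."""
    return inherent_contrast * math.exp(-attenuation_per_km * distance_km)

# Hypothetical attenuation coefficients (per km) for two weather conditions.
conditions = {"clear": 0.05, "dull and wet": 0.5}

for name, beta in conditions.items():
    values = [apparent_contrast(1.0, beta, d) for d in (0.1, 1.0, 5.0, 20.0)]
    print(name, [f"{v:.2f}" for v in values])
```

Under these assumed values, contrast barely changes over the first 100 m in either condition, and the same object at the same distance has very different contrast in different weather, which illustrates both limits listed above.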
• The ciliary muscles, which circle the lens and control its accommodation, provide nonvisual information
about the absolute distance to fixation.
• For distant focus, the ciliary muscles relax into a wide ring, allowing the tension of
the suspensory ligaments to thin the lens for distance vision.
• For near focus, the ciliary muscles contract into a small ring, releasing the
suspensory ligaments, and allowing the lens to become thicker.
• Perceptual evidence suggests that accommodation can be a coarse ordinal depth cue, but only at
distances in the range of 15 to 300cm from the observer (near point is 15cm, no accommodation occurs
beyond 300cm).
• Efference Copy: A copy of the efferent signal sent to the ciliary muscles is also sent to the part of the
brain that computes depth.
• Somatosensation: The tension of the muscle is measured directly, such as with spindle proprioceptive
• For an observer moving through the world, the velocity of an image point across the retina is lawfully
related to its depth in the scene. As with texture, this also requires that elements be of equal size and lie on the
same plane.
• Optic flow refers to the retinal velocity gradient created by an observer moving through the scene
towards the horizon at constant speed.
• When fixation is on the horizon, the function that describes optic flow for the ground plane is identical to