Page 1 of 8
Chapter 5: Perception of Sound
• From the stream of impulses generated by the peripheral auditory system, the brain builds a
representation of perceptual attributes for each sound source. What links nerve activity and perception?
• Loudness is the perceptual attribute of a sound that corresponds most closely to its intensity
• Loudness Matching: The subject adjusts the intensity of a
sound (comparison stimulus) until it sounds as loud as a standard
stimulus with a fixed intensity.
• Used to investigate the frequency dependence of loudness when
the comparison and standard stimuli are not the same frequency.
E.g. Standard tone is 1kHz and 40dB SPL, subject adjusts 2kHz
tone until it appears equally loud.
• This is repeated for a range of comparison frequencies to
produce an equal-loudness contour. Each contour represents a
different standard SPL; e.g. the 60 contour lies at 60dB SPL at 1kHz, but closer to 50dB SPL at 3kHz.
• A sound of x phons is as loud as a 1kHz tone at x dB SPL, e.g. a 20-phon sound is as loud as a 1kHz tone at 20dB SPL.
• The lowest curve represents the absolute hearing threshold. Equal-loudness contours tend to follow
this curve at lower standard intensities, but flatten out at high intensities.
• Low-frequency sounds must be set to a higher intensity to appear equal in loudness to higher-frequency
sounds; audio reproduction equipment has a “bass boost” to compensate.
• Loudness Scaling: Subjects are asked to assign numbers to sounds of different
intensities. If a standard tone is assigned 100, and a second tone is
perceived to be twice as loud, it should be rated as 200 (magnitude estimation).
• This tells us how rapidly loudness grows with intensity. Loudness does not
increase linearly with intensity; it follows Stevens’ Power Law with an exponent of about 0.3.
• When the sensation magnitude of loudness doubles, sound intensity has increased
by 10dB (a factor of 10), and sound pressure by sqrt(10) ≈ 3.2 times.
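The arithmetic behind these figures can be checked with a short sketch. It assumes loudness follows a power law in intensity with exponent 0.3, as stated above; the function names are illustrative only.

```python
def intensity_ratio(delta_db):
    """Intensity ratio corresponding to a level change in decibels."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db):
    """Sound-pressure ratio; intensity is proportional to pressure squared."""
    return 10 ** (delta_db / 20)


def loudness_ratio(delta_db, exponent=0.3):
    """Loudness ratio under a power law: loudness ~ intensity ** exponent."""
    return intensity_ratio(delta_db) ** exponent


if __name__ == "__main__":
    for delta_db in (10, 20, 30):
        print(
            delta_db,
            round(intensity_ratio(delta_db)),
            round(pressure_ratio(delta_db), 2),
            round(loudness_ratio(delta_db), 2),
        )
```

A 10dB step gives a loudness ratio of about 2 and a pressure ratio of about 3.2, which is the relationship described in the last bullet.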
Models of Loudness Perception
• Excitation Pattern Model: The loudness of a sound is proportional to the summed neural activity
evoked in the auditory nerve
• Auditory nerve fibres have a relatively narrow dynamic range compared to the dynamic range of
loudness perception (60dB vs. 120dB). This is solved by different auditory nerve fibres covering different
ranges of intensity: low-threshold fibres and high-threshold fibres.
• Pitch: Perceptual attribute of sound that corresponds most closely to frequency. Pitch is related to tone frequency in
pure tones, and to the fundamental in complex tones.
• Ohm’s Law: Helmholtz believed that the auditory system constructs a separate representation for each
frequency component of a complex sound.
• Animals have very different ranges of audible frequencies. Bats emit very high frequencies for
echolocation, which are absorbed and reflected by objects rather than diffracted.
• Intensity decreases with the square of distance; the intensity of an echo falls off more steeply still, because the sound must travel to the object and back.
• Some blind humans use echolocation to walk, ride bikes, etc.! Brain imaging shows they use their
visual cortex to form a representation of space.
• Masking: A subject’s ability to detect a sound signal is impaired in the presence of noise. In an
experimental context, the noise is band-pass noise: a sound stimulus containing equal energy within a
certain band above and below its centre frequency. E.g. Center frequency 2000Hz, bandwidth of 400Hz
contains energies at all frequencies between 1800 and 2200Hz, but none outside that range.
• The ability to detect the signal worsens as the noise bandwidth increases, but above a certain bandwidth it remains
constant. The auditory system likely uses band-pass filters to detect signals, so only noise frequencies
within the band of the filter affect the ability to detect the signal. Thus the critical bandwidth is the
bandwidth at which detectability flattens off, and is an estimate of
the auditory filter’s bandwidth.
• Critical-Band Masking: Masking of a signal that occurs only
when the center frequency of the noise falls within a certain band
around the signal frequency.
• Masking is most effective when the mask centre frequency is
equal to the signal frequency; as the centre frequency moves
away, higher mask dB SPLs are required to mask the signal.
• Hearing is served by a bank of overlapping band-pass filters,
corresponding to the tuning curves around the characteristic frequencies of
auditory nerve fibres.
• If the mask and the signal activate different filters, there will be no change in detectability. If they excite
the same filter, detection of the signal will be impaired.
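The critical-band reasoning above can be illustrated with a minimal sketch. It assumes an idealised rectangular auditory filter centred on the signal, and noise with constant power per hertz centred on the same frequency; the 300Hz critical bandwidth and the function names are hypothetical, chosen only for illustration.

```python
def band_edges(centre_hz, bandwidth_hz):
    """Lower and upper edge of a noise band centred on centre_hz."""
    half = bandwidth_hz / 2
    return centre_hz - half, centre_hz + half


def effective_masker_power(noise_bw_hz, critical_bw_hz, density=1.0):
    """Noise power passing an idealised rectangular auditory filter.

    Assumes the noise band and the filter are both centred on the signal
    frequency, and that the noise has constant power per hertz (density).
    Only the part of the noise inside the filter contributes to masking.
    """
    return density * min(noise_bw_hz, critical_bw_hz)


if __name__ == "__main__":
    print(band_edges(2000, 400))  # the 2000Hz / 400Hz example from the notes
    critical_bw = 300  # hypothetical critical bandwidth in Hz
    for noise_bw in (50, 100, 200, 400, 800):
        print(noise_bw, effective_masker_power(noise_bw, critical_bw))
```

In this sketch the effective masker power grows with noise bandwidth up to the assumed critical bandwidth and then stays constant, which mirrors the flattening of detectability used to estimate the filter bandwidth.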
• Frequency Discrimination: Measures a listener’s ability to detect small
changes in frequency. For example, listeners hear two tones with slightly
different frequencies and report whether the first or the second tone had a
higher pitch (two-alternative forced choice).
• A differential threshold is taken as the change in frequency required for
the listener to achieve 75% correct response (50% is just chance).
• Frequency discrimination is remarkably good at low frequencies (change
of 1Hz detectable below 1000Hz), but deteriorates at high frequencies
(100Hz for 10,000Hz).
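The two thresholds quoted above can be compared on a relative scale. The sketch below expresses each as a fraction of the base frequency and as a musical interval in cents (1200 cents per octave); the function names are illustrative only.

```python
import math


def weber_fraction(frequency_hz, delta_hz):
    """Just-detectable change as a fraction of the base frequency."""
    return delta_hz / frequency_hz


def interval_in_cents(frequency_hz, delta_hz):
    """Size of the just-detectable change in cents (1200 cents = 1 octave)."""
    return 1200 * math.log2((frequency_hz + delta_hz) / frequency_hz)


if __name__ == "__main__":
    # (base frequency, threshold) pairs taken from the notes
    for frequency_hz, delta_hz in ((1000, 1), (10000, 100)):
        print(
            frequency_hz,
            weber_fraction(frequency_hz, delta_hz),
            round(interval_in_cents(frequency_hz, delta_hz), 2),
        )
```

On these figures the threshold is about 0.1% of the frequency at 1000Hz and 1% at 10,000Hz, so discrimination is roughly ten times poorer in relative terms as well as in absolute terms.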
Frequency, Pitch, Tone Height & Music
• Musical pitch and frequency are related logarithmically.
The increase in pitch of a tone is determined by the
factor by which its frequency is multiplied, not by the amount
added. E.g. C to C’ is 130 to 260Hz, C’ to C’’ is 260 to
520Hz. One octave = × 2
• Tone Chroma: The colour of the tone, which seems to repeat
each octave. Tone Height: Frequency of tones ↑ with
octaves.
• Relative Pitch Perception: Identifying the ratio of two frequencies in a pair of tones, e.g. a perfect fifth
has ratio 3/2. This is made possible by training.
• Perfect Pitch Perception: Identifying the absolute pitch of a single tone, e.g. “that is 425Hz”. This cannot be
learned, and only about 1 in 10,000 people can do it.
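The logarithmic relation between frequency and musical pitch can be made concrete with a short sketch. The semitone function assumes twelve equal steps per octave (equal temperament), which the notes do not state; the function names are illustrative.

```python
import math


def octaves_between(f_low_hz, f_high_hz):
    """Pitch distance in octaves; depends only on the frequency ratio."""
    return math.log2(f_high_hz / f_low_hz)


def semitones_between(f_low_hz, f_high_hz):
    """Pitch distance in semitones, assuming 12 equal steps per octave."""
    return 12 * octaves_between(f_low_hz, f_high_hz)


if __name__ == "__main__":
    # Adding 260Hz to 260Hz is a full octave; adding 260Hz to 520Hz is not.
    print(octaves_between(260, 520))
    print(round(octaves_between(520, 780), 3))
    # A 3/2 frequency ratio (a fifth) is close to 7 equal-tempered semitones.
    print(round(semitones_between(200, 300), 2))
```

The same 260Hz increment spans one octave starting from 260Hz but well under an octave starting from 520Hz, which is why the multiplying factor rather than the added amount determines the pitch step.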
Theories of Pitch Perception
• Neurons in the peripheral auditory system are frequency-tuned, each responding to a range of frequencies
centred on its characteristic frequency.
• Place Theory (Helmholtz): Pitch is determined by the place of maximum excitation on the basilar
membrane, so each tone will activate only certain nerve fibres. Different excitation of fibres leads to
sensation of pitch.
• Timing Theory: Pitch is determined by the timing of neural impulses in the auditory nerve, as nerves
are phase-locked to the frequency of a sound wave below 4-5kHz.
• Evidence suggests that both theories are correct: pitch of low-frequency pure tones is conveyed by
timing, and the pitch of high-frequency pure tones is conveyed by the place theory.
• Complex tones contain harmonics in addition to the fundamental frequency.
• Missing Fundamental: A problem for place theory: if the fundamental frequency of a complex tone
is removed, its pitch is still heard. E.g. a complex tone with 800, 1000, and 1200Hz components appears to have
a pitch of 200Hz.
• 1) Temporal Theory: Pitch is encoded in responses synchronized to the beat frequency, which
corresponds to the fundamental frequency. The beat is present even when the fundamental is missing,
because it is produced by the periodic variations of the higher harmonics, which repeat at the
fundamental frequency.
• Nerve firing thus follows the beat frequency, and pitch of the fundamental is perceived.
• Residue pitch is the pitch sensation produced by beats in higher unresolved harmonics.
• 2) Pattern Recognition Theory (Goldstein): The auditory system tries to find a series of harmonically
related frequencies that fit the resolved components as closely as possible; pitch is determined by the
fundamental of the best-fitting harmonic series.
• This may be done by a single cell, e.g. in the spiral ganglion, which samples multiple hair cells in the cochlea whose
frequencies form a harmonic series, e.g. 200, 400, and 600Hz cells.
• Explains the missing fundamental, since the pitch of the fundamental is defined by the harmonics
present in the stimulus, even when the fundamental is not present.
• 3) Autocorrelation Theory: A temporal theory based on both resolved harmonics and beats among unresolved harmonics.
The neural spikes generated by a complex sound contain interspike intervals that correspond to the
repetition period of the sound.
• Autocorrelation compares a signal with a delayed copy of itself, to find the delay which
matches the repetition period of the sound. This is achieved by having a fast line and a slow line to
coincidence-detector neurons, such that the signal will be delayed in the slow line.
• A separate autocorrelation is computed for each frequency channel, and the correlations are averaged
to provide a summary function used to deduce pitch.
• Place cues from the place of maximum basilar membrane displacement, timing cues from the temporal pattern
of responses to individual frequencies or beats, and pattern cues from the pattern of response across fibres are all used.
No single theory gives a complete account of pitch perception.
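Two of the accounts above can be sketched on the missing-fundamental example of 800, 1000, and 1200Hz. The first function is a simplified stand-in for pattern recognition: it looks for the highest candidate fundamental of which every component is a whole-number multiple. The second applies autocorrelation directly to the summed waveform rather than per frequency channel, which is a deliberate simplification of the model described. The sample rate, search range, and function names are assumptions made for illustration.

```python
import math

COMPONENTS_HZ = (800, 1000, 1200)  # example from the notes; no 200Hz component


def best_fitting_fundamental(components_hz, candidates_hz=range(50, 501),
                             tolerance=1e-9):
    """Highest candidate f0 of which every component is a harmonic."""
    best = None
    for f0 in candidates_hz:
        # Distance of each component from the nearest whole multiple of f0.
        misfit = sum(abs(f / f0 - round(f / f0)) for f in components_hz)
        if misfit <= tolerance:
            best = f0  # candidates ascend, so the last exact fit is the highest
    return best


def autocorrelation_pitch(components_hz, sample_rate=16000, n_samples=800,
                          min_f0=50, max_f0=500):
    """Pitch estimate from the lag that maximises the autocorrelation."""
    signal = [
        sum(math.cos(2 * math.pi * f * n / sample_rate) for f in components_hz)
        for n in range(n_samples)
    ]
    best_lag, best_value = None, float("-inf")
    for lag in range(sample_rate // max_f0, sample_rate // min_f0 + 1):
        # Compare the signal with a copy of itself delayed by `lag` samples.
        value = sum(signal[n] * signal[n + lag] for n in range(n_samples - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


if __name__ == "__main__":
    print(best_fitting_fundamental(COMPONENTS_HZ))
    print(autocorrelation_pitch(COMPONENTS_HZ))
```

Both functions arrive at 200Hz even though the stimulus contains no energy at 200Hz, which is the behaviour a place-only account cannot explain.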
Auditory Localization
• Auditory localization refers to a listener’s ability to judge the direction and distance of a sound source,
to orient attention towards it. It can also aid in the segregation of individual sound sources from the
• Azimuth: Direction in the horizontal plane, left to right. 0° is directly
ahead, 90° is at left or right ear.
• Elevation: Direction in the median plane, up or down. 0° is at level of
ears, 90° is from above the head.
Localization in the Horizontal Plane
• Minimum Audible Angle (MAA): Smallest change in the azimuth
angle of a sound source that can be detected. This differs for different
frequencies: below 1000 Hz, 1° can be detected, whereas detection is
much poorer between 1500 and 1800Hz.
• MAA is much smaller at an azimuth of 0° than at 75°, farther out in
the lateral auditory field. Why?
• At azimuth 0°, there is no ITD; at azimuth 90°, ITD is highest. One
can thus generate a curve of ITD vs. azimuth angle, which looks like a
• The smallest ITD we can resolve is 15μs. At 0°, this corresponds to a 2° MAA;
closer to 90°, it corresponds to a 7° MAA.
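The link between a fixed ITD threshold and an azimuth-dependent MAA can be sketched numerically. The sketch assumes a simple sine model of ITD against azimuth and a maximum ITD of 650μs at 90°; both are assumptions made for illustration and are not taken from the notes, while the 15μs threshold is.

```python
import math

MAX_ITD_US = 650.0        # assumed ITD at 90 degrees azimuth (hypothetical)
ITD_THRESHOLD_US = 15.0   # smallest resolvable ITD, from the notes


def itd_us(azimuth_deg):
    """ITD under a sine model: zero straight ahead, maximal at 90 degrees."""
    return MAX_ITD_US * math.sin(math.radians(azimuth_deg))


def minimum_audible_angle(azimuth_deg):
    """Smallest outward shift in azimuth that adds one threshold step of ITD."""
    target = itd_us(azimuth_deg) + ITD_THRESHOLD_US
    if target > MAX_ITD_US:
        return None  # no outward shift can add a full threshold step
    return math.degrees(math.asin(target / MAX_ITD_US)) - azimuth_deg


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 75):
        print(azimuth, round(minimum_audible_angle(azimuth), 2))
```

With these assumed values the sketch gives an MAA of roughly 1.3° straight ahead and roughly 6.5° at 75°, the same trend as the figures above: the ITD changes fastest with angle near 0° and more slowly towards 90°, so a larger shift is needed there to produce a detectable ITD change.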
• Duplex Theory: Low-frequency sounds are localized using
interaural time differences (ITDs) and high-frequency sounds are
localized using interaural le