Psych 1XX3 – Music Perception Notes – Mar 18, 2010
When you hear music, you perceive it as an organized whole. This organized
whole can form an acoustic pattern that is so salient that you can hum the tune
after hearing it only once.
The acoustic pattern is easily recognizable, even if it's played in a different key or
with different instruments.
This suggests that what's important to the perception of this pattern is the relation
between the notes, and not the individual notes themselves.
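The idea that a tune is defined by the relations between its notes can be illustrated with a small sketch. It assumes notes are written as MIDI note numbers (one unit per semitone), which is a convention brought in for illustration and not part of the lecture. The function names and the example melody are hypothetical.

```python
def intervals(notes):
    """Return the semitone steps between consecutive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]


def transpose(notes, semitones):
    """Shift every note by the same number of semitones (a key change)."""
    return [n + semitones for n in notes]


# Illustrative melody (C C G G A A G) as MIDI note numbers; an assumption.
melody_c = [60, 60, 67, 67, 69, 69, 67]
melody_g = transpose(melody_c, 7)  # same tune, played 7 semitones higher

print(intervals(melody_c))                          # [0, 7, 0, 2, 0, -2]
print(melody_c == melody_g)                         # False: every note differs
print(intervals(melody_c) == intervals(melody_g))   # True: the pattern is the same
```

Every individual note changes after the key change, but the list of intervals does not, which is the property that lets a listener recognize the tune.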
Auditory Scene Analysis:
Gestalt Principles and Auditory Analysis:
How are we able to organize our auditory world so easily? The same Gestalt
principles used to organize a visual scene also apply to organizing an auditory scene.
Figure Ground Principle:
For example, an incoming stream of sounds is separated into figure and ground.
We can consider the “ground” (or background) to be whatever sounds you’re not
focusing on, such as the random sounds of the subway station itself, and the
“figure” to be the sound of a particular arriving train, or a specific voice on the
subway platform that you are paying attention to.
Keep in mind, however, that the sounds that make up the figure and ground are
not permanent, and will change as you focus your attention.
The principle of proximity organizes sounds that occur close together in time or
If you played a series of high (A) and low (B) tones, both spaced apart in time, you
would perceive two separate tones.
However, if you played the tones closer together in time, you would hear a single
The principle of similarity allows you to group together auditory input that is
similar, such as sounds that are of a similar frequency or timbre.
This would allow you to pick out and group a series of sounds as all belonging to
one particular voice among many voices.
Continuity is the principle that you would use to follow along with one song,
even if you simultaneously heard another song playing with the same instruments.
Closure is the principle that would allow you to understand a conversation, even
if every other sound was muffled or missing.
Pitch Perception:
Recall from our coverage of audition that frequency is measured in Hz, and that
the lowest frequency we can hear is about 20 Hz, and the highest is about 20,000 Hz.
Also, recall that sound waves enter the ear canal and vibrate the eardrum. These
vibrations are further amplified by the ossicles, which cause a wave in the fluid in
the cochlea.
This movement of the fluid causes the hair cells along the basilar membrane to
move, sending a signal down the auditory nerve to key regions in the
brain. (See image below.)
There are two theories required to explain pitch perception along the entire
frequency range that we can hear.
Frequency theory is so named because it was thought that the entire basilar
membrane vibrates at the frequency of the incoming sound wave.
This causes impulses of the corresponding frequency to travel up the auditory
nerve, effectively allowing the brain to decipher frequency by counting the
number of neural impulses.
In accordance with the predictions of the frequency theory, physiological
evidence indicates that the hair cells on the basilar membrane do indeed vibrate
The frequency theory made perfect sense, until it was learned that axons are
incapable of firing more than 1000 impulses per second.
This would work fine if all of the sounds that were important to our reproduction
and survival were below 1000 Hz. However, we learned that humans can hear
sounds with frequencies as high as 20,000 Hz.
Although a single axon cannot fire more than 1000 impulses per second, groups
of auditory nerve fibres can fire a series of impulses that, as a team, can signal to
the brain the frequency of sound waves up to 5000 Hz.
This is called the volley principle, and it extended the range of frequencies that
the frequency theory can explain up to 5000 Hz. But that is still not enough to cover
our entire audible range, which reaches 20,000 Hz.
So the frequency theory of pitch perception cannot explain how we perceive
pitches between 5000 and 20,000 Hz. (See image below.)
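The volley principle can be sketched as a round-robin scheme. This is a deliberately idealized illustration, not a physiological model: it assumes each fibre fires exactly once on every n-th cycle of the sound wave, and the function name `volley_spike_times` is hypothetical. It also does not model why the principle stops working above 5000 Hz.

```python
import math

MAX_RATE_HZ = 1000  # per-axon firing limit stated in the notes


def volley_spike_times(freq_hz, duration_s=0.01):
    """Share the cycles of a tone among fibres in round-robin order.

    Assumption: each fibre fires once on every n-th cycle, where n is the
    smallest number of fibres that keeps each one at or under MAX_RATE_HZ.
    """
    n_fibres = math.ceil(freq_hz / MAX_RATE_HZ)
    n_cycles = round(freq_hz * duration_s)
    fibres = [[] for _ in range(n_fibres)]
    for cycle in range(n_cycles):
        fibres[cycle % n_fibres].append(cycle / freq_hz)  # spike time in seconds
    return fibres


duration_s = 0.01
fibres = volley_spike_times(4000, duration_s)
per_fibre_rate = len(fibres[0]) / duration_s
pooled_rate = sum(len(spikes) for spikes in fibres) / duration_s
print(len(fibres), round(per_fibre_rate), round(pooled_rate))  # 4 1000 4000
```

For a 4000 Hz tone, four fibres each stay at the 1000 impulses per second limit, yet their pooled impulses arrive at 4000 per second, so the group as a whole carries the frequency that no single fibre could.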
Although the hair cells along the basilar membrane move together as the
frequency theory predicts, they in fact move as a travelling wave that forms a peak
at a particular place along the basilar membrane.
And so, the place theory of pitch perception states that the brain can decipher the
frequency of the sound wave by being tuned to the specific place of the peak of its
travelling wave along the basilar membrane. (See image below.)
Each inner hair cell has roughly 20 direct links with the brain, which would allow
the region of each inner hair cell on the basilar membrane to be represented very
specifically in the auditory cortex.
When a sound causes a wave in the basilar membrane, high frequency sounds
maximally displace the hair cells closest to the oval window, where sound
initially enters the cochlea.
On the other hand, low frequency sounds produce a wave that peaks at the
opposite end of the cochlea.
Tonotopic Representation of Pitch in A1:
This results in a tonotopic representation of pitch, and this organization is
maintained all the way to the primary auditory cortex, with neighbouring
regions of the cortex responding maximally to neighbouring frequencies.
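One way to make the place-to-frequency mapping concrete is the Greenwood function, a published approximation that was not covered in this lecture and is brought in here only as an illustration. The constants below are the commonly quoted human values and should be treated as assumptions. Position is expressed as a fraction of the basilar membrane's length, with 1.0 at the oval window and 0.0 at the opposite end.

```python
import math

# Greenwood constants commonly quoted for the human cochlea.
# Assumed for illustration; these values do not come from the notes.
A = 165.4
ALPHA = 2.1
K = 0.88


def place_to_frequency(position):
    """Approximate best frequency (Hz) at a position along the membrane.

    position: 0.0 = end farthest from the oval window, 1.0 = oval window.
    """
    return A * (10 ** (ALPHA * position) - K)


def frequency_to_place(freq_hz):
    """Inverse mapping: position (0.0 to 1.0) where freq_hz peaks."""
    return math.log10(freq_hz / A + K) / ALPHA


for freq in (200, 1000, 5000, 15000):
    print(freq, round(frequency_to_place(freq), 2))
```

Under these assumed constants, the two ends of the membrane map to roughly 20 Hz and 20,000 Hz, and higher frequencies land closer to the oval window, in line with the description above.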
Hair Cells Respond Maximally to One Frequency:
Although each hair cell is maximally responsive to a specific frequency, it will
still respond to a range of frequencies.
Direct evidence for tonotopic coding of pitch, and support for the place theory,
comes from animal studies which have used drugs that can damage the hair cells.
In one experiment, Stebbins and colleagues administered such a drug to monkeys and
then tested the monkeys' ability to perceive different frequencies of sound.
When the cochleae were later examined, they found that even brief exposure to the
drug damaged hair cells near the entrance to the cochlea at the oval window; with
longer exposure to the drug, damage to the hair cells extended toward the other
end of the basilar membrane. (See image below.)
The behavioural tests showed that monkeys with damage to the hair cells near the
oval window were unable to perceive high frequency sounds; more damage along
the basilar membrane translated into a growing inability to hear progressively
lower frequency sounds.
Taken together, these results demonstrate that different frequencies are represented
at specific places along the basilar membrane, with high frequencies at the
entrance of the cochlea and lower frequencies at the opposite end of the cochlea.
Problems w/ Place Theory:
These trends are generally true, but a problem with the place theory is that as the
frequency of the sound gets lower the location of the peak of the wave along the
basilar membrane gets more and more variable.
For very low frequencies under 50 Hz, the peak actually disappears completely.
So the place theory alone also cannot account for the full audible range;
it turns out that both the frequency and place theories are needed to
explain the full range of hearing.
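The way the two theories divide the audible range can be summarized in a short sketch. It uses only the boundaries given in these notes (20 Hz, 1000 Hz, 5000 Hz and 20,000 Hz). The function name `pitch_mechanisms` is hypothetical, and treating the boundaries as sharp cut-offs is a simplification.

```python
LOWEST_AUDIBLE_HZ = 20
HIGHEST_AUDIBLE_HZ = 20000
SINGLE_AXON_LIMIT_HZ = 1000   # above this, one axon cannot keep up
VOLLEY_LIMIT_HZ = 5000        # above this, the volley principle fails


def pitch_mechanisms(freq_hz):
    """Return which theory (or theories) the notes credit for this frequency."""
    if not LOWEST_AUDIBLE_HZ <= freq_hz <= HIGHEST_AUDIBLE_HZ:
        return []  # outside the range of human hearing
    if freq_hz < SINGLE_AXON_LIMIT_HZ:
        return ["frequency"]
    if freq_hz <= VOLLEY_LIMIT_HZ:
        return ["frequency", "place"]
    return ["place"]


for freq in (30, 440, 3000, 12000):
    print(freq, pitch_mechanisms(freq))
```

The overlap between 1000 and 5000 Hz is where both mechanisms are available at once.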
Frequency theory is useful to explain how we hear low frequencies that are below
1000 Hz, and place theory explains how we perceive high frequencies above 5000 Hz.
Both mechanisms are theoretically used for frequencies between 1000 and 5000 Hz,
which is coincidentally the range of frequencies that we discriminate most
effectively.
Bird Song:
Def’n of Bird Song: the music-like vocalizations that are made primarily by the
male of a species during the breeding season in order to attract a female or defend
his territory from other males.
High Vocal Centre and Robust Nucleus:
Songbirds have evolved two key brain regions to deal with the complexities of
producing song: the high vocal centre (HVC) and the robust nucl