PSYC 212 Lecture Notes - Lecture 15: Inverse-Square Law, Azimuth, Trachea
Auditory distance perception
- Problem: ITDs and ILDs can't tell us how far away an object is at a given azimuth.
  - Ex.: for an azimuth of -60 degrees, the ITD will always be -480 µs, even though in absolute terms a sound takes more time to arrive from a farther source.
    - 10 meters: left ear = 400 µs, right ear = 880 µs
    - 100 meters: left ear = 4400 µs, right ear = 4880 µs
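The arithmetic of the example can be checked directly (the arrival times are the lecture's illustrative values; ITDs are on the order of microseconds in practice):

```python
# Illustrative arrival times from the lecture's example. The absolute
# times grow with distance, but the interaural difference stays the
# same, so the ITD carries azimuth information but no distance
# information.
near = {"left": 400, "right": 880}    # source at 10 m
far = {"left": 4400, "right": 4880}   # source at 100 m

itd_near = near["left"] - near["right"]
itd_far = far["left"] - far["right"]
print(itd_near, itd_far)  # -480 -480: identical at both distances
```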
- Inverse-square law: the intensity of a sound decreases as a function of the inverse of the square of the distance (i.e., "one divided by" the distance squared).
  - Intensity (at distance d) = Intensity (at source) / d²
  - Sound intensity decreases with distance.
  - It's harder to tell small differences in distance between two objects when they are far away than when they are close.
  - At longer distances, intensity decreases more slowly per unit of distance.
  - Intensity is therefore most useful as a distance cue at short distances; at longer distances it won't be precise.
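A minimal sketch of the inverse-square law, showing why intensity differences shrink at long range (the source intensity and distances are arbitrary values for illustration):

```python
def intensity(source_intensity, distance):
    """Inverse-square law: I(d) = I(source) / d**2."""
    return source_intensity / distance ** 2

I0 = 1000.0  # arbitrary source intensity

# A 1 m separation at short range produces a large intensity difference
near_diff = intensity(I0, 1) - intensity(I0, 2)     # 1000 - 250 = 750
# The same 1 m separation at long range is nearly undetectable
far_diff = intensity(I0, 100) - intensity(I0, 101)  # ~0.002

print(near_diff, round(far_diff, 4))
```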
  - Problem: how can I tell whether a source is loud but far away vs. quiet but close?
  - Solution: the spectral composition of sounds changes with distance.
    - In general, long wavelengths are more resistant to obstacles.
    - Sources that are far away are likely to have encountered more obstacles.
    - Air also has "sound-absorbing" qualities.
    - Therefore the intensity of higher frequencies decreases as a function of distance (note that this only starts to have a real impact for distances greater than 1000 meters).
    - Another cue: distal sounds contain proportionally more reverberated energy than direct energy; reverberated energy is detected through its timing and its change in spectral composition.
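The high-frequency loss can be sketched with a toy absorption model (the coefficient and frequencies are invented for illustration; real atmospheric absorption depends on humidity, temperature, and frequency in a more complex way):

```python
# Toy model of frequency-dependent air absorption: extra loss beyond
# the inverse-square law, growing with both frequency and distance.
def attenuation_db(freq_hz, distance_m, db_per_khz_per_km=1.0):
    """Attenuation in dB for a given frequency and distance."""
    return db_per_khz_per_km * (freq_hz / 1000.0) * (distance_m / 1000.0)

for d in (100, 1000, 5000):
    low = attenuation_db(500, d)    # low-frequency component
    high = attenuation_db(8000, d)  # high-frequency component
    print(d, low, high)
# High frequencies lose far more energy than low ones as the source
# recedes, which is why distant sounds are "muffled".
```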
Cones of confusion
- Problem: the ITD for a sound coming from two mirror azimuths is exactly the same.
- The ILD for the same azimuths should also be very similar.
  - There will be small differences because the head is not perfectly round, but they are negligible.
- If you consider all three dimensions, these ambiguous points form a "cone".
- Solution #1: moving your head changes the "cones of confusion".
  - The only point that retains its ITD and ILD across head movements is the "real" source.
  - By moving our heads regularly, the only intersection of the successive cones has to be the true sound source.
  - It's unlikely that the same ambiguity will occur again after a movement.
  - What stays constant is what is probably the true sound source.
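Solution #1 can be sketched with a simple sine model of ITD (ITD proportional to the sine of the azimuth, an approximation assumed here, not given in the lecture): a front source at 60° and its mirror at 120° produce identical ITDs, but a small head turn breaks the tie.

```python
import math

MAX_ITD_US = 640  # rough peak ITD in microseconds (toy model)

def itd(azimuth_deg):
    """Toy ITD model: proportional to the sine of the azimuth."""
    return MAX_ITD_US * math.sin(math.radians(azimuth_deg))

front, back = 60, 120  # mirror azimuths on the same cone of confusion
print(abs(itd(front) - itd(back)) < 1e-9)  # True: identical ITDs

# Turn the head 10 degrees to the right: both sources (fixed in
# space) shift by -10 degrees relative to the head.
print(round(itd(front - 10)), round(itd(back - 10)))
# The ITDs now differ, so only the true source keeps a consistent cue.
```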
- Solution #2: the pinna (and also the ear canal, head, and torso) slightly distorts the amplitude of certain frequencies as a function of elevation.
  - Ex.: sounds coming from an elevation of -50 degrees lose a lot of intensity (dB) between 8000 and 11000 Hz.
  - Directional transfer function (DTF): a measure that describes how the pinna, ear canal, head, and torso change the intensity of sounds with different frequencies that arrive at each ear from different elevations.
  - Our ears continue to change as we grow, so we have to continuously adapt to these changes.
  - We constantly learn to update this function.
Auditory stream segregation
- The perceptual organization of a complex acoustic signal into separate auditory events, in which each stream is heard as a separate event.
- Segregation/grouping can be based on several acoustic cues, such as:
  - The location of sounds (see the previous section).
  - The frequency (pitch) of the sounds: tones that have similar frequencies tend to be grouped together.
  - The timing of the sounds: tones that are close together in time tend to be grouped together; when the gap between tones increases, they sound like different sounds, breaking the grouping.
  - The timbre of the sounds: tones that have similar timbre tend to be grouped together.
  - The onset of the sounds: when sounds begin at different times, they appear to come from different sound sources.
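The frequency-proximity cue can be sketched as a toy grouping rule (the 200 Hz threshold is an arbitrary assumption): each tone joins a stream whose most recent frequency is close to it, otherwise it starts a new stream.

```python
def segregate(tones_hz, max_jump_hz=200):
    """Toy stream segregation by frequency proximity: assign each
    tone to the first stream whose last frequency is within the
    jump threshold, otherwise start a new stream."""
    streams = []
    for freq in tones_hz:
        for stream in streams:
            if abs(freq - stream[-1]) <= max_jump_hz:
                stream.append(freq)
                break
        else:
            streams.append([freq])
    return streams

# Alternating tones far apart in frequency split into two streams
print(segregate([400, 1000, 400, 1000, 400, 1000]))
# Alternating tones close in frequency fuse into a single stream
print(segregate([400, 500, 400, 500, 400, 500]))
```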
- Rule of "good continuation" (continuity effects): in spite of interruptions, one can still "hear" a continuous sound if the gaps are filled with noise; the sound is then perceived as continuing behind the noise. If the gaps aren't filled with noise, the sound is perceived as separate chunks.
  - As the white noise gets louder, you begin to hear the tone behind the white noise.
  - This occurs even if it's a tone with a "tune".
- Higher-order information (restoration effects): in spite of interruptions, one can still "hear" a sentence if the gaps are filled with noise; in this case, higher-order semantic/syntactic knowledge is used to "fill in the blanks". As with continuity effects, the effect vanishes if the gaps are not filled with noise.
Speech
The sound of speech: phonemes
- Phoneme: a unit of sound that distinguishes one word from another in a particular language (e.g., kill vs. kiss).
  - To get around confusing differences between sound and spelling, we use the International Phonetic Alphabet (IPA).
  - About 5000 languages are spoken today, utilizing over 850 different speech sounds.
  - The English language uses approximately 40 phonemes.
Speech production
1. Respiration: the diaphragm pushes air out of the lungs, through the trachea, up to the larynx.
2. Phonation: the process through which the vocal folds are made to vibrate as air is pushed out of the lungs.
   - At the larynx, the air must pass through two vocal folds (aka vocal cords).
   - More tension produces higher-pitched sounds.
     - Smaller vocal folds produce higher-pitched voices: children < women < men (in vocal fold size).
     - The sound passing through the vocal folds has a harmonic spectrum (a harmonic structure signals a single source).
3. Articulation: the act or manner of producing a speech sound using the vocal tract.
   - The vocal tract is the area above the larynx (oral + nasal tracts).
   - Humans can change the shape of their vocal tract by manipulating their jaws, lips, tongue body, tongue tip, and velum (soft palate) - this is what we call "articulation".
   - Changing the size and shape of the vocal tract creates resonance characteristics that increase or decrease energy at different frequencies.
   - Peaks in the speech spectrum are called formants.
   - Formants are labelled by number, from lowest to highest (F1, F2, F3).
   - Formants have higher frequencies for people with shorter vocal tracts; it is the relationship between the formants that counts.
   - Most of the time, the first three formants are sufficient to identify a phoneme.
   - The spectrum of speech sounds changes over time.
   - Spectrograms represent that third dimension (time):
     - X: time
     - Y: frequency
     - Colour: energy (amplitude)
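The spectrogram's three dimensions can be illustrated with a minimal NumPy sketch (frame size, hop, and the test signal are arbitrary choices for illustration):

```python
import numpy as np

fs = 8000                      # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
# A tone whose frequency rises over time, like a formant transition
x = np.sin(2 * np.pi * (300 + 400 * t) * t)

# Minimal spectrogram: magnitude spectrum of successive short frames
frame, hop = 256, 128
frames = [x[i:i + frame] for i in range(0, len(x) - frame, hop)]
spec = np.array([np.abs(np.fft.rfft(f * np.hanning(frame)))
                 for f in frames]).T

# spec rows = frequency (Y axis), columns = time (X axis),
# cell values = energy (the colour dimension of a spectrogram)
print(spec.shape)  # (frequency bins, time frames)
```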
Lecture 15
Thursday, March 1, 2018
1:01 PM