PSYC 212 Lecture Notes - Lecture 13: Cochlea, Signify, Basilar Membrane

20 Jul 2016
Speech Perception:
Auditory Perception: mechanical energy from air-pressure changes is
transduced by the structures of the ear. The tympanic membrane passes the
vibration to the ossicles, which push on the oval window, creating a very
special travelling wave that mechanically peaks at pretty much one position
on the basilar membrane. This is a frequency decomposition: each position
on the basilar membrane ends up representing one frequency (a place code).
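The place code above is often modeled with Greenwood's place-frequency function, a standard empirical fit (not part of the lecture itself) mapping basilar-membrane position to the frequency that position represents. A minimal sketch:

```python
def greenwood_frequency(x):
    """Greenwood's place-frequency map for the human cochlea.

    x: fractional distance along the basilar membrane, 0.0 at the
    apex (low frequencies) to 1.0 at the base (high frequencies).
    A, a, k are Greenwood's published fits for humans.
    """
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x) - k)

for x in (0.0, 0.5, 1.0):
    print(f"x = {x:.1f} -> {greenwood_frequency(x):8.1f} Hz")
```

The mapped range spans roughly 20 Hz at the apex to about 20 kHz at the base, matching the human hearing range.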
-The cochlea couples two codes: at lower frequencies, the fidelity of the
mechanical (place) code alone is not that great, so the cochlea also uses a
temporal code, exploiting phase-locking of the signal to help you figure out
what the sound frequencies are and distinguish the sounds.
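Phase-locking can be illustrated with a toy simulation (my sketch, not from the lecture; all parameters are illustrative): spikes that fire near a fixed phase of a low-frequency tone give a vector strength near 1, so the tone's frequency is recoverable from spike timing even though individual neurons skip cycles.

```python
import numpy as np

rng = np.random.default_rng(0)
freq = 250.0                      # low-frequency tone (Hz), within phase-locking range
period = 1.0 / freq

# Idealized phase-locked firing: at most one spike per cycle, near the
# same phase each time, with slight timing jitter; many cycles skipped.
cycles = np.arange(200)
fired = rng.random(cycles.size) < 0.7
spikes = cycles[fired] * period + rng.normal(0.0, 1e-4, fired.sum())

def vector_strength(times, f):
    """1.0 = spikes perfectly locked to phase of f; ~0 = unrelated."""
    return np.abs(np.mean(np.exp(2j * np.pi * f * times)))

locked = vector_strength(spikes, freq)
random_spikes = rng.uniform(0.0, cycles.size * period, fired.sum())
unlocked = vector_strength(random_spikes, freq)
print(f"locked: {locked:.3f}, random: {unlocked:.3f}")
```

The locked spike train scores near 1 while random spiking scores near 0, which is the sense in which spike timing carries frequency information the place code alone cannot.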
-It is challenging, from just the vibration of the basilar membrane, to
figure out what kind of stream you are listening to.
The environment has many sound sources, so the key first step is to separate
out the different streams. Then we can tune into one stream, potentially
disregard the others, and know that the incoming information comes from one
source, e.g. listening to a friend without being distracted by other voices
around you with similar frequencies and rhythms.
Today: speech perception and what a challenge it is.
After the rules for stream segregation, we now go into one stream, a speech
stream, and look at some of the ways your brain breaks the audio signal of a
speech stream down into words and phonemes, and how it processes them.
Understanding speech:
People pronounce the exact same word differently; even if you said the same
word twice, you would not say it the same way.
Different people have different voices, i.e. different frequency content;
females tend to have higher-pitched voices.
There is some noise in your production of words that makes it different each
time.
Regardless of voice and accent, we can understand; people also intonate
words to convey meaning.
All of this adds up to highly complex frequency relations.
The frequency signals differ across voices, and yet we have no problem
categorizing words in a language. Something really challenging is going on.
How are words segmented from a stream of speech? Look at an example.
Example of one word event being spoken in isolation.
-Then you have a stream in which the word occurs somewhere; from the
spectrogram alone, can you see where the word is?
1. At the beginning.
This word is synthesized: a speech synthesizer (a computer) generated it.
You would have no problem recognizing this word if you heard it.
Here a human speaker says the word.
The spectrograms are somewhat similar but not the same, because the
person's voice is not the same as the computer's: the bands of frequency
energy over time fall in different positions. The hint is the contour, the
shape that is faintly expressed in both.
Really difficult. The real point of this exercise was to show how difficult
it is to see what is being said from just the frequency and time information
we get.
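To make the spectrogram idea concrete, here is a minimal short-time Fourier transform over a synthetic frequency glide (a stand-in for a formant transition; the signal and every parameter here are illustrative, not from the lecture):

```python
import numpy as np

fs = 16000                                      # sample rate (Hz)
t = np.arange(int(0.5 * fs)) / fs
sig = np.sin(2 * np.pi * (300 + 400 * t) * t)   # tone gliding upward in frequency

# Short-time Fourier transform: overlapping windows, FFT of each.
win, hop = 512, 128
frames = np.lib.stride_tricks.sliding_window_view(sig, win)[::hop]
spec = np.abs(np.fft.rfft(frames * np.hanning(win), axis=1))

print(spec.shape)   # (time frames, frequency bins)
```

Each row is one time slice; the frequency bin with the largest magnitude rises from frame to frame, tracing exactly the kind of time-frequency contour the lecture's spectrograms display.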
Recognizing words from the basilar membrane:
-Very difficult.
-The relationship between multiple frequency components is important, but
a lot of variability exists.
-Our brain overcomes this.
From basilar membrane activity/vibration alone, it is difficult to recognize
words.
The relationships among the multiple frequencies of a given spoken phrase
differ from one speaker to another, and yet our brain overcomes this:
perceptual constancy for the word.
Which of the following is a natural voice recording?
-Which is a synthesized recording and which is a natural recording?
This shows the variability of a single individual saying a very similar
thing, compared to the computer.
A is a natural speaker.
There is variability for one natural speaker over time saying pretty much
the same thing, with several words being repeated.
The computer is designed to play out the exact same waveform, so it looks
the same each time.
The human speaker is slightly different on each repetition.
Key differences between the two spectrograms (natural and synthesized):
Repeatability of the shape.
Less noise (synthesized).
The boundaries between words are clearer for the synthesized speech: a
sharper boundary. Another feature: the synthesized speech has pauses in the
word generation, which natural speech does not. This highlights how
challenging speech is to segment, particularly for people learning a new
language.
Every language has breaks between words; native speakers of the language
learn to chop up the words and overcome this, but you will not see it in the
spectrogram: there are no pauses in the spectrogram.
When does this word event occur?
B. (#2)
It is taken out of context: you have to recognize that this bit of sound,
which runs on into something else and blends with the word next to it, is
the word being isolated here.
The difference between a native speaker, who could segment out this word,
and a non-native speaker, who would have a problem separating that bit out
from the word that follows.
It is hard because, in this example, native speakers can tell where the word
actually ends even though the frequency content continues into the next
word; non-native speakers cannot. This is a visual example of the difference
between a native and a non-native speaker in segmenting words.
Segmentation problem:
-The frequency content of speech over time carries very little information
for segmenting the sound fluctuations into words.
-In normal speech, the sounds rapidly follow one another.
-The components of speech, called phonemes, are defined by frequency
relations that are not invariant across people or even within a speaker.
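A quick sketch of why pauses cannot be the segmentation cue (a toy example of my own, with pure tones standing in for two words; sampling rate, frequencies, and frame size are all illustrative): run two "words" together with no silence and the frame-by-frame energy stays flat right across the boundary.

```python
import numpy as np

fs = 8000
t = np.arange(int(0.3 * fs)) / fs
word1 = np.sin(2 * np.pi * 220 * t)   # stand-ins for two spoken words,
word2 = np.sin(2 * np.pi * 330 * t)   # concatenated with no pause between them
stream = np.concatenate([word1, word2])

# Frame-by-frame RMS energy: a pause between words would show as a dip.
frame = 200
rms = np.sqrt((stream.reshape(-1, frame) ** 2).mean(axis=1))
print(rms.round(3))   # nearly flat: no acoustic gap marks the word boundary
```

The energy trace gives no hint of where one "word" ends and the next begins; the boundary has to come from knowledge of the words themselves, which is the native-speaker advantage the notes describe.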
Natives segment the word correctly; non-natives have a really hard time
segmenting the word.
To understand speech, we need to break the stream up into words: find where
the changes in sound frequency and volume signify that another word is being
spoken, and that the rest of the sound is not related to the word you just
heard.
This is a serious challenge for your auditory system: the speech sounds
follow one another really rapidly, and your brain has to figure out which is
connected to what. Assuming stream segregation has already picked out the
stream, you now have to separate the individual words within it.