PSYC20007 Lecture Notes - Lecture 11: Speech Recognition, Phonemic Orthography, Phonemic Awareness

14 Jun 2018
Lecture 11 - Tuesday 10 October 2017
PSYC20007 - COGNITIVE PSYCHOLOGY
LECTURE 11
PRINCIPLES OF WORD RECOGNITION
TODAY
1. How we recognise words in speech
2. How we recognise words in print
WORD RECOGNITION: WORDS AS WINDOWS TO MEANING
Language comprehension seems to take place automatically and effortlessly
We have no awareness of having to “do” anything
This effortlessness is based on the recognition of individual words in the speech stream, or in
print.
Words are physical objects (auditory or visual symbols) that signify the meanings of their referents
Word recognition (recognition of the physical forms of words) provides the window through
which we access meaning.
The focus of this lecture is on the recognition of individual words, either in the speech stream
(part 1) or when reading (part 2). The recognition of individual words is essentially a form of
categorisation – we recognise individual tokens (instances/exemplars of a word) produced in
speech or text as (iconic) instances of a category of similar previous experiences. The features that
distinguish one word from another are actually based on rather minimal perceptual distinctions
between speech sounds (in speech) and letters (in text). For example, the distinction between the
spoken form of the English words “laid” and “lathe” is conveyed by a subtle difference in the
articulation of the final phoneme, a distinction that may not be made in another language (e.g.,
French does not make this distinction). One of the core problems of word recognition we will
consider today is how we resolve the potential ambiguity between words that arises when such
close distinctions must be made when listening to speech (and reading). We will consider some
basic computational models of spoken and visual word recognition that implement particular
mechanisms for helping resolve such ambiguity in the input.
LEXICAL MEMORY AND LEXICAL ACCESS
Lexicon - mental dictionary
Linguistic Modalities
Auditory – Spoken – Phonology
Visual – Written – Orthography
Visuo-spatial – Sign languages
These linguistic modalities are for perceptual word forms (arbitrary signs) that map onto concepts
(semantics).
Lexical access – The process whereby the memory for a specific word form is located, “opened”,
or “activated”.
Words are perceptual objects. This makes them powerful tools for thinking with: we can
actually attend to the physical forms of words to help guide our thought processes, and we can
manipulate words as physical objects as a means for “turning ideas around” in our minds.
However, during the comprehension of spoken and written language we are often blissfully
unaware of the physical forms of words; instead, skilled language users ‘see through’ the physical
forms of words to access the meanings that they reference – in this sense, words are like windows
through which we “see” meaning (semantics). In this lecture though, we will be considering the
processes that underpin the recognition of the physical forms of words – in speech we refer to the
physical forms of words as their “phonological form”; in writing, we refer to the “orthographic
form” (in sign languages there is a gestural, visuo-spatial form).
1. SPOKEN WORD RECOGNITION
LEARNING OBJECTIVES
Pre-lexical cues to word boundaries
Prosodic cues
Transitional probabilities
Lexical cues
The TRACE model of spoken word recognition
Lexical feedback
Parallel activation
Lateral inhibition
SPOKEN WORD RECOGNITION
There are no gaps between words when spoken fluently.
Segmenting the speech stream is a non-trivial task.
The speech waveform has no simple correspondence to the white space between printed words.
Nevertheless, spoken words are perceived as coherent, discrete, events – auditory objects.
Words in sentences are not separated from one another by clear pauses. Instead, speakers run
words together, making it difficult to tell where one word ends and another begins. Indeed, the
tendency of speakers to run words together has proven to be a formidable hurdle for the
development of speech recognition technology. Because listeners cannot count on pauses between
words, they must rely on other information in the speech signal to indicate how word boundaries
are marked. Learning which features mark word boundaries in a particular language seems to
involve discovering the way that sounds within words are typically ordered—phonetically and
prosodically. These patterns differ from one language to another. Consequently, learners must
discover the sound properties that are the most useful cues to word boundaries in their native
language. We will define and discuss prosodic and phonetic cues to word boundaries over the next
few slides.
The ability to segment the speech stream despite “noisy” (i.e., imperfect, variable) input and great
variability between speakers begins early in development.
Without any obvious boundaries between spoken words, how can an infant discover where one
word ends and another begins?
No two instances of the same word sound exactly alike.
PROSODIC CUES TO WORD BOUNDARIES
Infants tune in to regularities in the stress patterns of their native language.
The stress pattern (metrical rhythm) of a language is one element of the prosody of the
language.
About 90% of English multi-syllabic words stress the first syllable.
Pencil, stapler, trampoline
This strong–weak (trochaic) pattern is the opposite to that used in languages such as Polish, in
which a weak–strong (iambic) pattern predominates.
English includes some iambic words (e.g., guitar, disgust)
All languages contain words of both kinds, but one pattern typically predominates.
The dominant stress pattern (i.e., metrical pattern, or rhythm) within a language provides a
source of information (a statistical regularity) in the speech signal that assists listeners to find the
boundaries between words. However, there is a potential paradox with an account such as this.
How can a learner identify the predominant stress pattern of words in the native language without
already having some ability to segment fluent speech? One possible explanation is that learners
develop a bias for the predominant stress patterns of native language words on the basis of words
that they hear frequently spoken in isolation. For example, names in English are most likely to
begin with strong syllables, and those which do not often have nickname forms which do. English
learners have been shown to have some ability to recognize the sound patterns of their own names
by 4.5 months of age. In addition, diminutive terms that are used frequently in addressing infants
often have strong/weak stress patterns (e.g., “mommy,” “daddy,” “baby,” “doggie,” “kitty,”
“birdie,” etc.). If this account is correct, then in other languages, names and diminutives may also
model the predominant word patterns.
Stress provides a prosodic cue to help infants identify potential words within the speech stream
At 7.5 months of age, English-learning infants can segment words from speech that reflect the
strong–weak pattern, but not the weak–strong pattern
For example, when infants hear a phrase like ‘guitar is’ they perceive ‘taris’ as the word-like
unit
This is because ‘guitar’ begins with an unstressed syllable and ‘is’ is a weak syllable, so the strong syllable ‘tar’ is treated as the word onset.
Results based on measures of preferential looking time.
Jusczyk, et al., 1999
From the abstract of Jusczyk et al., 1999. “A series of 15 experiments was conducted to explore
English-learning infants’ capacities to segment bisyllabic words from fluent speech. The studies in
Part I focused on 7.5 month olds’ abilities to segment words with strong/weak stress patterns
from fluent speech. The infants demonstrated an ability to detect strong/weak target words in
sentential contexts. Moreover, the findings indicated that the infants were responding to the whole
words and not to just their strong syllables. In Part II, a parallel series of studies was conducted
examining 7.5 month olds’ abilities to segment words with weak/strong stress patterns. In
contrast with the results for strong/weak words, 7.5 month olds appeared to missegment
weak/strong words. They demonstrated a tendency to treat strong syllables as markers of word onsets.
In addition, when weak/strong words co-occurred with a particular following weak syllable (e.g.,
‘guitar is’), 7.5 month olds appeared to misperceive these as strong/weak words (e.g., ‘taris’).
The studies in Part III examined the abilities of 10.5 month olds to segment weak/strong words
from fluent speech. These older infants were able to segment weak/strong words correctly from
the various contexts in which they appeared. Overall, the findings suggest that English learners
may rely heavily on stress cues when they begin to segment words from fluent speech. However,
within a few months, infants learn to integrate multiple sources of information about the likely
boundaries of words in fluent speech.”
Experiment 11 (page 188 of the paper) describes the “guitar is” experiment. The infants were
familiarised with passages containing weak/strong bisyllabic words, such as guitar, that were
always followed by the same weak monosyllabic word (e.g., the word “is” in the “guitar is”
passage). The passage for “guitar” went like this: “Your guitar is really a fine instrument. But she
says that the old guitar is great. In the attic, her guitar is hidden away. My pink guitar is not
nearly as special. I think that a red guitar is better looking. But if you want to play, a plain guitar
is fine.”
After being familiarised with passages like this, the infants tended to treat the
strong/weak combination “taris”, rather than the weak/strong “guitar”, as the familiar word. This was determined
from the infants’ preferential looking time, that is, the time that the infants spent looking towards a
speaker that played both the weak/strong words (guitar) and the strong/weak nonwords (taris).
Subsequent experiments showed that older infants (10.5 months) were able to overcome this
tendency and make the correct segmentations (preferring guitar to taris).
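The missegmentation described above can be illustrated with a toy sketch. It assumes a deliberately simplified heuristic in which every strong syllable is treated as the start of a new word-like unit. The function name, the syllable spellings, and the strong/weak labels are illustrative assumptions, not material from Jusczyk et al. (1999), and the sketch is not a model of what infants actually compute.

```python
from typing import List, Tuple

# A syllable is (text, is_strong). Spellings and stress labels are
# hand-assigned for illustration only.
Syllable = Tuple[str, bool]


def segment_at_strong_syllables(syllables: List[Syllable]) -> List[str]:
    """Start a new word-like unit at every strong syllable.

    Weak syllables are attached to whatever unit is currently open.
    """
    units: List[str] = []
    current = ""
    for text, is_strong in syllables:
        if is_strong and current:
            # A strong syllable closes the open unit and begins a new one.
            units.append(current)
            current = ""
        current += text
    if current:
        units.append(current)
    return units


# Strong/weak (trochaic) words are segmented correctly by the heuristic.
trochaic_units = segment_at_strong_syllables(
    [("sta", True), ("pler", False), ("pen", True), ("cil", False)]
)

# The weak/strong word "guitar" followed by weak "is" is missegmented:
# the strong syllable "tar" is taken as a word onset.
iambic_units = segment_at_strong_syllables(
    [("gui", False), ("tar", True), ("is", False)]
)

print(trochaic_units)
print(iambic_units)
```

Under this heuristic “stapler pencil” splits into its two words, whereas “guitar is” yields “gui” and “taris”, mirroring the pattern reported for 7.5 month olds. Overcoming the error, as the 10.5 month olds do, would require combining stress with other cues.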
TRANSITIONAL PROBABILITIES
The likelihood (probability) that any given syllable follows another differs within words, and
across word boundaries.
Consider the phrase ‘pretty baby’.
Among English words, the probability that ‘TY’ will follow ‘PRE’ is higher than the
probability that ‘BAY’ will follow ‘TY’
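The idea can be made concrete with a small sketch. It assumes the common definition of the transitional probability of syllable Y given syllable X as the number of times X is immediately followed by Y, divided by the number of times X is followed by any syllable. The four-utterance corpus and its syllable spellings are invented for illustration and are not data from the lecture.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transitional_probabilities(
    utterances: List[List[str]],
) -> Dict[Tuple[str, str], float]:
    """Estimate P(next syllable | current syllable) from syllable sequences."""
    pair_counts: Counter = Counter()
    first_counts: Counter = Counter()
    for syllables in utterances:
        # Count each adjacent pair; utterance-final syllables have no
        # successor, so they do not add to the denominator.
        for first, second in zip(syllables, syllables[1:]):
            pair_counts[(first, second)] += 1
            first_counts[first] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


# Hypothetical toy corpus: each utterance is a list of syllables.
corpus = [
    ["pre", "ty", "bay", "by"],   # "pretty baby"
    ["pre", "ty", "dog", "gie"],  # "pretty doggie"
    ["pre", "ty", "kit", "ty"],   # "pretty kitty"
    ["naugh", "ty", "bay", "by"], # "naughty baby"
]

tp = transitional_probabilities(corpus)
print(tp[("pre", "ty")])   # within-word transition
print(tp[("ty", "bay")])   # transition across a word boundary
```

In this toy corpus ‘ty’ always follows ‘pre’, while ‘bay’ follows ‘ty’ only half the time, so the dip in transitional probability marks a likely word boundary between ‘ty’ and ‘bay’.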