Chapter 4

READING NOTES
Chapter 4: PERCEPTION OF LANGUAGE

Main Points:
• The study of speech sounds is called phonetics
  ◦ Articulatory phonetics is the study of how speech sounds are produced
  ◦ Acoustic phonetics is the study of the physical properties of the resulting speech sounds
• Speech exhibits characteristics not found in other forms of auditory perception
• The phenomenon of categorical perception suggests that speech is a special mode of perception
• Perception of speech is influenced by the contexts in which it appears
  ◦ We use top-down processing to identify some sounds in context
• Visual perception of language is achieved through a succession of processing levels
  ◦ Letters are perceived better in a word context than in isolation or in unrelated letter strings
• Recent models of language perception assume that we process information at multiple levels in an interactive way
  ◦ These models can account for several findings in speech perception and visual word perception

The Structure of Speech
• The listener's task of categorizing the sounds they hear into one of many classes of speech sounds is complex for two reasons:
  ◦ First, the environmental context often interferes with the speech signal
    ▪ Under normal listening conditions, the speech we hear competes with other stimuli for our limited processing capacity
    ▪ e.g.) a conversation across the room, someone sneezing
  ◦ Second, the speech signal itself is highly variable
    ▪ There is no one-to-one correspondence between the characteristics of the acoustic stimulus and the speech sound we hear
    ▪ Several factors influence or distort the acoustic stimulus that reaches our ears
    ▪ e.g.) the voice of the speaker, the rate at which the speaker is producing speech, and the phonetic context

Prosodic Factors
  ◦ Ferreira has defined prosody as “a general term that refers to the aspects of an utterance's sound that are not specific to the words themselves”
  ◦ Prosodic factors influence the overall meaning of an utterance
  ◦ We can take a given word or utterance, change its stress or intonational pattern, and create an entirely different meaning
  ◦ Prosodic factors are sometimes called suprasegmentals
    ▪ These aspects of speech lie over speech segments (phones), providing a kind of musical accompaniment to speech
  ◦ Prosodic factors:
    ▪ Stress refers to the emphasis given to syllables in a sentence
      • Stress corresponds closely to loudness
    ▪ Intonation refers to the use of pitch to signify different meanings
      • The pitch pattern of a sentence is called its intonational contour
      • In English, intonation rises at the end of yes/no questions (Are you coming?) but not at the end of wh- questions (questions beginning with who, what, when, where, or why)
    ▪ Rate refers to the speed at which speech is articulated

Articulatory Phonetics
  ◦ The study of speech sounds is called phonetics, and the more specific study of how speech sounds are produced is called articulatory phonetics
  ◦ Speech sounds differ principally in whether the airflow is obstructed
    ▪ Vowels are produced by letting air flow from the lungs without obstruction
    ▪ Consonants are produced by impeding the airflow at some point
  ◦ Place of Articulation
    ▪ Some consonants, such as [b] and [p], are articulated at the lips and are called bilabial consonants
    ▪ Others, such as [d] and [t], are formed by placing the tongue against the alveolar ridge; these are called alveolar consonants
    ▪ Those produced at the back of the mouth, such as [g] and [k], are called velar consonants because the tongue is placed against the velum (soft palate)
  ◦ Manner of Articulation
    ▪ Stop consonants obstruct the airflow completely for a period of time, then release it
    ▪ Fricatives are produced by obstructing the airflow without completely stopping it, as in [f] or [s]
      • The passage in the mouth through which air must travel becomes narrower, and this narrowing causes some turbulence
    ▪ An affricate is produced by a stoplike closure followed by the slow release characteristic of fricatives
      • The first sounds in church and judge (transcribed [tʃ] and [dʒ] in the IPA) are affricates
  ◦ Voicing
    ▪ The opening between the vocal cords is called the glottis
      • If the cords are held together, the air stream must force its way through the glottis, causing the vocal cords to vibrate
      • The resulting sound is called voiced, as in [b]
    ▪ If the cords are apart, the air passes through unobstructed and the resulting sound is called voiceless, as in [p]

Acoustic Phonetics
  ◦ Spectrograms
    ▪ One of the most common ways of describing the acoustic energy of speech sounds is the sound spectrogram
      • It is produced by presenting a sample of speech to a device known as a sound spectrograph, which consists of a set of filters that analyze the sound and then project it onto a moving belt of phosphor, producing the spectrogram
      • Frequency is represented on the vertical axis, time on the horizontal axis, and intensity by the darkness of the trace
        ◦ The dark bands at various frequency levels are called formants
          ▪ Formant transitions are the large rises or drops in formant frequency that occur over short durations of time
          ▪ A formant's steady state is the period during which its frequency is relatively stable
  ◦ Parallel Transmission
    ▪ Parallel transmission refers to the fact that different phonemes of the same syllable are encoded into the speech signal simultaneously
      • There is no sharp physical break between adjacent sounds in a syllable
      • e.g.) the [t] in tool runs into the [u], which runs into the [l]
  ◦ Context-Conditioned Variation
    ▪ Context-conditioned variation describes the fact that the exact spectrographic appearance of a given phone is related to (or conditioned by) the speech context in which it occurs
    ▪ Context-conditioned variation is closely related to the way syllables are articulated
    ▪ The phenomenon of producing more than one speech sound at a given time is called coarticulation
      • It reveals an important point: production, like the physical signal that results from it, tends to vary with the phonetic context

Summary
• Speech may be described in terms of the articulatory movements needed to produce a speech sound and the acoustic properties of the sound
• The acoustic structure of speech sounds is revealed by spectrographic analyses of formants, their steady states, and formant transitions
• The spectrographic pattern associated with a consonant is influenced by its vowel context, a consequence of the coarticulated manner in which syllables are produced
• Prosodic factors such as stress, intonation, and speech rate also contribute to the variability inherent in the speech signal

Perception of Isolated Speech Segments

Levels of Speech Processing
• We may roughly divide speech perception into three levels:
  ◦ At the auditory level, the signal is represented in terms of its frequency, intensity, and temporal attributes, as with any auditory stimulus
  ◦ At the phonetic level, we identify individual phones by a combination of acoustic cues, such as formant transitions
  ◦ At the phonological level, the phonetic segment is converted into a phoneme, and phonological rules are applied to the sound sequence

Speech as a Modular System
• Fodor (1983): A cognitive system is modular if it:
  ◦ (1) Is domain specific (e.g., a speech module is dedicated to speech processing and not to, say, vision)
  ◦ (2) Operates on a mandatory basis
  ◦ (3) Is fast
  ◦ (4) Is unaffected by feedback
• Lack of Invariance
  ◦ The fact that there is no one-to-one correspondence between acoustic cues and perceptual events has been termed the lack of invariance
  ◦ The absence of such an invariant relationship suggests that the perception of speech segments must occur through a process that is different from, and presumably more complex than, ordinary “auditory” perception
    ▪ This is taken as evidence that speech is a special mode of perception
  ◦ It appears that speech percepts are based on both invariant and context-conditioned cues
    ▪ e.g.) The nasal consonants [m] and [n] are distinguished from other consonants by a single bar of low-frequency energy together with a complete lack of high-frequency energy; these characteristics appear to be distinctive across vowel contexts
• Categorical Perception
  ◦ To comprehend speech, we must impose an absolute, or categorical, identification on the incoming speech signal rather than simply make relative judgments about its various physical characteristics
    ▪ That is, our job is to identify whether a sound is a [p] or a [b], not whether its frequency or intensity is relatively high or low
  ◦ Categorical perception refers to listeners' failure to discriminate speech sounds any better than they can identify them
  ◦ In a voiced syllable such as [ba], vocal cord vibration begins almost immediately after the lips are released; in a voiceless syllable such as [pa], it begins after a short delay
    ▪ This lag between the release and the onset of vocal cord vibration is called voice onset time (VOT)

The Motor Theory of Spee
