LIN228H1F 2012 – Week 8 Kochetov-1
Acoustics of English Sounds
This handout discusses only some basic information used in reading spectrograms. Additional example
spectrograms illustrating English sounds can be found in the textbook. See also Chapter 8 of Peter
Ladefoged’s ‘Course in Phonetics’ (http://www.phonetics.ucla.edu/course/chapter8/figure8.html).
A spectrogram is a representation of speech sounds showing time on the horizontal axis and frequency on
the vertical axis. Intensity is shown by the darkness of the representation (darker colouring = greater
intensity).
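To make the time/frequency/intensity layout concrete, here is a minimal sketch of how a spectrogram can be computed: the signal is cut into short overlapping frames and the strength of each frequency is measured within each frame. The function names, the 8000 Hz sampling rate, the frame length, and the 1000 Hz test tone are all assumptions chosen for illustration; real analysis software is far more refined.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin from 0 Hz up to half the sampling rate."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

def spectrogram(samples, frame_len, hop):
    """One magnitude spectrum per frame: outer list = time, inner list = frequency."""
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # A Hann window tapers the frame edges to reduce spectral smearing.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_len - 1)))
                    for i, x in enumerate(frame)]
        columns.append(dft_magnitudes(windowed))
    return columns

# Illustrative input (an assumption): a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
FRAME_LEN = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(tone, frame_len=FRAME_LEN, hop=32)
bin_width_hz = SAMPLE_RATE / FRAME_LEN  # 125 Hz per frequency bin
peak_bins = [max(range(len(col)), key=lambda k: col[k]) for col in spec]
peak_hz = [b * bin_width_hz for b in peak_bins]
```

On a printed spectrogram each entry of `spec` would be one vertical slice, and the largest magnitudes would be drawn darkest. For this steady tone the darkest point sits at 1000 Hz in every slice, which would appear as a single horizontal band.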
Spectrograms of different vowels are distinguished from one another on the basis of formants. Recall that
formants are clusters of harmonics which are enhanced by the resonating properties of the vocal tract. The
configuration of the vocal tract differs for each vowel, leading to different formant frequencies for each
vowel. The lowest formant is F1, the next is F2, and so on. F1 and F2 give us the most information in
distinguishing one vowel from another.
• F1 is determined by the resonating frequency of the back cavity
o It is lower for high vowels (larger back cavity, lower resonating frequency) and higher for low
vowels (smaller back cavity, higher resonating frequency).
• F2 is determined by the resonating frequency of the front cavity
o It is higher for front vowels (smaller front cavity, higher resonating frequency) and lower for
back vowels (larger front cavity, lower resonating frequency). F2 is even lower for rounded
vowels because the front cavity is increased by lip protrusion.
                 /i/       /ɑ/       /u/
Back cavity:     large     small     large
Front cavity:    small     large     large (and increased by lip protrusion)
                 ⇓         ⇓         ⇓
F1:              low       high      low
F2:              high      low       low
(Articulatory speech synthesis: http://www.haskins.yale.edu/; see last week’s handout for a discussion.)
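The link between cavity size and resonating frequency can be illustrated with a small calculation. The sketch below uses the quarter-wavelength formula for a uniform tube closed at one end, which is an assumption made here for illustration and not the cavity model used in this handout. The function name and the speed-of-sound value are likewise illustrative. It shows only the general principle: the longer the resonating space, the lower its resonances.

```python
# Approximate speed of sound in warm, moist air (an assumed round figure).
SPEED_OF_SOUND_CM_PER_S = 35000

def quarter_wave_resonances(length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    The n-th resonance is (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
            for n in range(1, count + 1)]

# A 17.5 cm tube, a common idealisation of a neutral adult vocal tract.
neutral = quarter_wave_resonances(17.5)   # [500.0, 1500.0, 2500.0]

# Doubling the length of a resonating tube halves its lowest resonance.
short_tube = quarter_wave_resonances(8)[0]
long_tube = quarter_wave_resonances(16)[0]
```

Here `short_tube` is higher than `long_tube`, which is the same direction of effect described above: a smaller cavity gives a higher resonating frequency and a larger cavity a lower one.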
See Table 8.2 in the textbook for typical English vowel formant values.
Formants show up on spectrograms as dark, relatively horizontal bands.
deed [i] did [ɪ] dead [ɛ] dad [æ]
dude [u] dawd [dɑd]
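Reading a vowel off a spectrogram amounts to measuring F1 and F2 and comparing them with typical values. The sketch below does this with a nearest-match search. The reference numbers are rough illustrative figures assumed for this example (loosely in the range often reported for adult male speakers); they are not the values from Table 8.2, and the function name is hypothetical.

```python
import math

# Rough illustrative (F1, F2) values in Hz. These are assumptions for the
# example only and are NOT taken from Table 8.2 of the textbook.
REFERENCE_FORMANTS_HZ = {
    "i": (280, 2250),
    "ɪ": (400, 1900),
    "ɛ": (550, 1800),
    "æ": (700, 1700),
    "ɑ": (750, 1100),
    "u": (300, 900),
}

def closest_vowel(f1_hz, f2_hz):
    """Return the reference vowel whose (F1, F2) pair is nearest to the measurement."""
    def distance(vowel):
        ref_f1, ref_f2 = REFERENCE_FORMANTS_HZ[vowel]
        return math.hypot(f1_hz - ref_f1, f2_hz - ref_f2)
    return min(REFERENCE_FORMANTS_HZ, key=distance)

high_front = closest_vowel(290, 2200)  # low F1, high F2
low_back = closest_vowel(720, 1150)    # high F1, low F2
high_back = closest_vowel(310, 950)    # low F1, low F2
```

With these assumed reference values the three measurements come out as `"i"`, `"ɑ"` and `"u"` respectively. Real speakers vary considerably, so a fixed table like this is only a starting point.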
Diphthongs are vowels that involve a change in vowel quality during their articulation. They can be
recognized on spectrograms due to the change in their formants.
• /aj/ begins with a relatively high F1 on account of the initial low vowel articulation and a relatively
low F2 on account of it being a central vowel. Then the formants move to the low F1 and high F2 of
the high front vowel /i/.
• /aw/ begins similarly, as it also starts with a low vowel articulation. The formants then change, with
F1 lowering in accordance with the vowel height of the /u/ portion of the diphthong and F2
lowering as well, since /u/ is farther back in articulation than the central /a/.
• /ɔj/ begins with the two formants relatively low and close together, and then they spread apart into the
typical _____________ F1 and ____________ F2 of the high front vowel articulation.
buy [baj] bow [baw] boy [bɔj]
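The formant movements described for /aj/ and /aw/ can be summarised by comparing the formants at the start and end of the diphthong. The following is a minimal sketch; the function name, the 50 Hz tolerance, and the start and end values are all assumptions chosen to illustrate the direction of movement, not measurements.

```python
def describe_glide(start, end, tolerance_hz=50):
    """Describe how F1 and F2 move between two (F1, F2) measurements in Hz."""
    def direction(before, after):
        change = after - before
        if change > tolerance_hz:
            return "rises"
        if change < -tolerance_hz:
            return "falls"
        return "stays level"
    return {"F1": direction(start[0], end[0]), "F2": direction(start[1], end[1])}

# Illustrative (F1, F2) targets in Hz; assumed values, not measurements.
LOW_CENTRAL_A = (750, 1300)
HIGH_FRONT_I = (300, 2200)
HIGH_BACK_U = (300, 900)

aj = describe_glide(LOW_CENTRAL_A, HIGH_FRONT_I)  # {"F1": "falls", "F2": "rises"}
aw = describe_glide(LOW_CENTRAL_A, HIGH_BACK_U)   # {"F1": "falls", "F2": "falls"}
```

On a spectrogram, "falls" and "rises" correspond to a formant band sloping downward or upward from left to right across the diphthong.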
Fricatives are often the easiest segments to pick out of a spectrogram because of the distinctive random
noise pattern in the high frequencies.
• The labiodental and dental fricatives are generally weak with a much lower intensity than the
sibilants. The sibilants /s, z, ʃ, ʒ/ have greater intensity.
• The alveolars /s/ and /z/ have a random noise pattern visible in the 4000 to 8000 Hz range, while the
postalveolars /ʃ/ and /ʒ/ have somewhat lower frequencies, in the 2000 to 6000 Hz range.
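The frequency ranges given for the sibilants can be used as a rough decision rule. The sketch below compares the observed noise band with the two ranges above and picks the one it overlaps more. The function names and the overlap rule are assumptions for illustration; the two ranges overlap each other, so borderline cases cannot be settled this way.

```python
# Noise ranges in Hz as given in this handout.
SIBILANT_BANDS_HZ = {
    "alveolar": (4000, 8000),
    "postalveolar": (2000, 6000),
}

def overlap_hz(band_a, band_b):
    """Width in Hz of the region shared by two (low, high) frequency bands."""
    return max(0, min(band_a[1], band_b[1]) - max(band_a[0], band_b[0]))

def likely_sibilant_place(noise_band_hz):
    """Return the place whose typical noise range overlaps the observed band most."""
    return max(SIBILANT_BANDS_HZ,
               key=lambda place: overlap_hz(noise_band_hz, SIBILANT_BANDS_HZ[place]))

high_noise = likely_sibilant_place((4500, 8000))   # "alveolar"
lower_noise = likely_sibilant_place((2000, 5500))  # "postalveolar"
```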
/h/ has been described in this course both as a voiceless fricative and as a voiceless vowel. It has weak
formant patterns that we associate with the resonant properties of vowels and it has weak random noise
patterns as fricatives do. It shows up on spectrograms as a noise pattern in the formant frequencies of the
adjacent vowel.
See the spectrograms below for examples of [f] and [h]. See last week’s handout (Appendix) for
spectrograms of sibilant fricatives.
fur [fɚ] hay [hej]
During stops no air passes through the vocal tract as there is complete closure. Stops thus show up as a gap
on a spectrogram with no apparent sound. Clues to the place of articulation of stops are found in the
transitions to the surrounding vowels.
• Labial consonants lower F2 of surrounding vowels.
• Alveolar consonants have relatively level formants in the vowel transitions (with some variation
depending on vowel backness/frontness).
• Velar consonants raise F2 and lower F3, bringing F2 and F3 together.
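The three transition cues above can be written as a simple decision rule. This is a minimal sketch under stated assumptions: the function name is hypothetical, the changes are measured from the middle of the vowel to the edge of the stop closure, and the 100 Hz threshold for calling a formant "level" is an arbitrary illustrative value. As noted above, real transitions also depend on the backness of the vowel.

```python
def stop_place_from_transitions(f2_change_hz, f3_change_hz, level_tolerance_hz=100):
    """Guess the place of a stop from how F2 and F3 move toward the closure.

    Positive change = formant rises toward the stop; negative = formant falls.
    """
    if f2_change_hz < -level_tolerance_hz:
        return "labial"        # F2 lowered
    if f2_change_hz > level_tolerance_hz and f3_change_hz < -level_tolerance_hz:
        return "velar"         # F2 raised and F3 lowered, so they come together
    if abs(f2_change_hz) <= level_tolerance_hz:
        return "alveolar"      # formants relatively level
    return "unclear"

labial = stop_place_from_transitions(-300, 0)
alveolar = stop_place_from_transitions(20, -30)
velar = stop_place_from_transitions(300, -300)
```

Any pattern that fits none of the three cues, such as F2 and F3 both rising, is returned as `"unclear"` instead of being forced into a category.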
Voiceless stops can be distinguished from voiced stops due to the presence of a low frequency voice bar in
the voiced stops. This can, however, be very faint. Aspirated voiceless stops end in a burst of
high-frequency noise.
bib [bɪb] did [dɪd]
Affricates appear as a stop portion, visible as a gap, and a fricative portion, visible as high frequency
noise.
Nasals are voiced and have a complete closure in the oral cavity with airflow only escaping through the
nose. Nasals have weak formant patterns with some energy apparent around 500 Hz and some energy
visible also at 2500-3000 Hz.
As with other stops, the place of articulation of a nasal is most apparent in the transitions from neighbouring
vowels. F2 is lowered before and after a labial nasal. F2 is level adjacent to an alveolar nasal.
F2 is raised adjacent to a velar nasal, with F3 lowered and brought close to F2 before a velar.