PSYB51H3 Chapter Notes - Chapter 11: Spectrogram, Depth Perception, Speech Perception

Course Code: PSYB51H3
Professor: Matthias Niemeier

Ch. 11- Music and Speech Perception
high levels of the neurotransmitter serotonin are associated with negative aspects of emotion and
mood--> disagreeable music increases serotonin levels
listening to pleasurable music= changes in heart rate and muscle electrical activity, increased blood
flow in brain regions that are thought to be involved in reward and motivation= can promote
positive emotions, reduce pain, alleviate stress, improve resistance to disease
pitch= psychological aspect of sound related to perceived frequency
octave= interval between two sound frequencies having a ratio of 2:1
when one of two periodic sounds is double the frequency of the other, those two sounds are
one octave apart= they sound more similar to each other than to a sound with a closer frequency!
Musical pitch is typically described as having two dimensions
tone height= a sound quality corresponding to the level of pitch. Tone height is monotonically
related to frequency.
tone chroma= related to octave= a sound quality shared by tones that have the same octave interval
helix analogy= tone height (and frequency) increases as you move up the helix. Each lap around
the helix= one octave= position around the lap is the tone chroma. Sounds at the same point on
each lap lie on one vertical line= they share the same tone chroma and are separated by octaves
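The helix can be sketched numerically. This is only an illustration, assuming tone height is measured in octaves relative to a reference frequency (the 440 Hz reference and the function name `tone_height_and_chroma` are hypothetical choices, not from the textbook):

```python
import math

def tone_height_and_chroma(freq_hz, ref_hz=440.0):
    """Split a frequency into (height, chroma) relative to ref_hz.

    height: signed number of octaves above the reference (rises with frequency).
    chroma: position within the octave, in [0, 1); equal for tones an octave apart.
    """
    height = math.log2(freq_hz / ref_hz)
    chroma = height % 1.0
    return height, chroma

for f in (220.0, 440.0, 880.0, 660.0):
    height, chroma = tone_height_and_chroma(f)
    print(f"{f:6.1f} Hz  height={height:+.3f} octaves  chroma={chroma:.3f}")
```

220, 440 and 880 Hz come out with different heights but the same chroma (same vertical line on the helix), while 660 Hz sits elsewhere on the lap.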
**Recall: neurons in the auditory nerve signal frequency both by their location in the
cochlea (place coding) and by the timing of their firing (temporal coding). For frequencies greater than
5000 Hz, temporal coding does not contribute to the perception of pitch.
chords= combination of three or more musical notes with different pitches played simultaneously
chords can be consonant (more pleasing, simple ratios between the note frequencies)
Ex. octave (2:1), perfect fifth (3:2), perfect fourth (4:3)
or dissonant
Ex. minor second (16:15) or augmented fourth (45:32)
since chords are defined by the ratios of the note frequencies combined to produce them=
named the same no matter what octave they're played in
Ex. G major= G, B and D regardless of octave (G minor would be G, B-flat and D)
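The point that intervals are defined by frequency ratios, not absolute frequencies, can be checked with a little arithmetic. A minimal sketch using the ratios listed in these notes; the root frequencies (196 and 392 Hz) are illustrative, and `ratio_complexity` is a hypothetical, crude stand-in for "simplicity of the ratio", not a measure from the textbook:

```python
from fractions import Fraction

# Interval ratios listed in the notes.
INTERVALS = {
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "minor second": Fraction(16, 15),
    "augmented fourth": Fraction(45, 32),
}

def upper_note(root_hz, interval):
    """Frequency of the note lying `interval` above root_hz."""
    return root_hz * INTERVALS[interval]

def ratio_complexity(ratio):
    """Hypothetical simplicity score: numerator + denominator (smaller = simpler)."""
    return ratio.numerator + ratio.denominator

# Two roots one octave apart: the fifth keeps the same 3:2 ratio in both octaves.
for root in (Fraction(196), Fraction(392)):
    fifth = upper_note(root, "perfect fifth")
    print(f"root {float(root):.0f} Hz -> fifth {float(fifth):.0f} Hz, ratio {fifth / root}")

# Consonant intervals sort ahead of dissonant ones on this crude score.
ranked = sorted(INTERVALS, key=lambda name: ratio_complexity(INTERVALS[name]))
print(ranked)
```

On this score the octave, fifth and fourth rank as simplest and the minor second and augmented fourth as most complex, matching the consonant/dissonant grouping above.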
Cultural Differences
heptatonic scale (seven notes per octave, Western music) vs. pentatonic scale (five notes per octave; Asian music, blues, gospel)
some Asian languages, e.g. Mandarin= tone languages= use changes in voice pitch
(fundamental frequency) to distinguish different words
language can influence the use of musical scales= changes in pitch direction are larger and more
frequent in spoken tone languages such as Mandarin than in
English/French/German, and pitch changes in Asian music (pentatonic scale) are likewise larger and
occur more often
in scales in which octaves contain fewer notes, the notes may be more loosely tuned than
notes in the heptatonic Western scale= a wider range of pitches can qualify as a given
note= increased variation in a note's acceptable frequencies
Making Music
melody= a sequence of notes/chords perceived as a single coherent structure= defined by its
contour (pattern of rises/declines in pitch) rather than by the exact sequence of sound frequencies
Ex. shift every note in a melody by one octave= same melody
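Contour can be made concrete as the pattern of ups and downs between successive notes. A small sketch; the note frequencies are illustrative and the function name `contour` is a hypothetical helper:

```python
def contour(freqs_hz):
    """Pattern of rises (+1), falls (-1) and repeats (0) between successive notes."""
    return [(b > a) - (b < a) for a, b in zip(freqs_hz, freqs_hz[1:])]

melody = [262.0, 294.0, 330.0, 262.0]   # illustrative note frequencies in Hz
octave_up = [2 * f for f in melody]     # every note shifted up by one octave

print(contour(melody))                        # [1, 1, -1]
print(contour(melody) == contour(octave_up))  # True
```

Every absolute frequency changes when the melody is moved up an octave, but the contour is identical, which is why it is heard as the same melody.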
tempo= perceived speed of the presentation of sounds= defined by average duration of a set of
notes in a melody= same melody can be played at a fast/slow tempo
rhythm= listeners are predisposed to group sounds into rhythmic patterns= accented/stressed
vs. unaccented/unstressed
sounds that are longer, louder and higher= likely to be heard as leading their group
syncopation= any deviation from a regular rhythm, for example, by accenting a note that is
expected to be unaccented or not playing a note (a rest) when a note is expected
syncopated auditory polyrhythms= two offbeat rhythms= one becomes predominant and the
other takes a backseat, seemingly moving forward/backward= rhythm is largely psychological
melody= like rhythm= largely psychological= our experience with a particular sequence of
notes, or with similar sequences, helps us perceive coherence= ability to learn melodies
begins early in life
vocal tract= airway above the larynx used for the production of speech= includes oral and nasal
tract= largely responsible for the versatility of human sound production
unlike other animals, human larynx is positioned quite low in the throat (leading to easier
choking + inability to swallow and breathe at the same time after infancy) but limitations were
outweighed evolutionarily by ability to speak/communicate
Speech Production
three basic components: respiration (lungs), phonation (vocal cords) and articulation (vocal
tract)= speaking fluently requires coordination of these three
Phonation= process through which vocal folds are made to vibrate when air is pushed out of
the lungs = makes a buzz
rate at which vocal folds vibrate depends on their stiffness and mass= folds become stiffer and
vibrate faster as their tension increases--> sounds with higher pitch= small vocal folds
vibrate faster, leading to higher pitched voices in children= adult men have lower pitched
voices because testosterone during puberty increases the mass of the vocal folds= these
factors shape phonation
by varying tension of vocal folds (stiffness) and the pressure of airflow from the lungs,
individual talkers can vary the fundamental frequency of voiced sounds
vibration of vocal folds creates a harmonic spectrum (sounds like a buzz)= first
harmonic corresponds to the actual rate of physical vibration of the folds (the fundamental frequency)
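A harmonic spectrum has energy at integer multiples of the fundamental. A minimal sketch; the fundamentals of 100 Hz and 250 Hz are illustrative values for a lower and a higher voice, not figures from the textbook:

```python
def harmonic_frequencies(f0_hz, count):
    """Frequencies of the first `count` harmonics of a source vibrating at f0_hz."""
    return [n * f0_hz for n in range(1, count + 1)]

print(harmonic_frequencies(100.0, 5))  # [100.0, 200.0, 300.0, 400.0, 500.0]
print(harmonic_frequencies(250.0, 5))  # [250.0, 500.0, 750.0, 1000.0, 1250.0]
```

Faster vocal fold vibration raises the first harmonic and spaces all the higher harmonics further apart.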
Articulation= the act or manner of producing a speech sound using the vocal tract
area above the larynx= oral tract and nasal tract combined= the vocal tract= humans have
the ability to change the shape of the vocal tract by manipulating the jaw, lips, tongue
body/tip, velum (soft palate) etc.= these manipulations lead to articulation
resonance characteristics= changing the size/shape of the space through which sound
passes increases/decreases energy at different frequencies= spectra of speech sounds are
shaped by the way people configure their tracts as resonators
formant= a resonance of the vocal tract. Formants are specified by their center frequency
and are denoted by integers that increase with relative frequency= labelled F1, F2, F3 from lowest to
highest frequency= these concentrations of energy occur at different frequencies,
depending on the length of the vocal tract
for shorter vocal tracts (children/short adults)= formants are at higher frequencies than
for longer vocal tracts
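Why shorter tracts give higher formants can be illustrated with a rough physical model. The sketch below is an assumption, not the textbook's method: it treats the vocal tract as a uniform tube closed at one end, whose resonances are F_n = (2n - 1) * c / (4 * L). The speed of sound and the tract lengths (17.5 cm and 12 cm) are illustrative values.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, humid air

def tube_formants(tract_length_cm, count=3):
    """Resonances of a uniform tube closed at one end: F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * tract_length_cm)
            for n in range(1, count + 1)]

long_tract = tube_formants(17.5)   # illustrative longer vocal tract
short_tract = tube_formants(12.0)  # illustrative shorter vocal tract

print(long_tract)   # [500.0, 1500.0, 2500.0]
print([round(f) for f in short_tract])
```

Real vocal tracts are not uniform tubes, so actual formant values depend on the vowel, but the inverse relation between length and formant frequency holds in this simple model.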
because absolute frequencies change depending on who's talking, listeners use the
relationship between formant peaks to perceive speech sounds= for the most part we can
distinguish almost all speech sounds on the basis of energy in the region of the lowest
three formants= however, additional formants do exist at higher frequencies, with lower amplitudes
**distinctive characteristic of speech= SPECTRA CHANGE over time= third dimension
(time) in addition to the dimensions of frequency and intensity/amplitude
spectrogram= in sound analysis, a 3D display that plots time on the horizontal axis,
frequency on the vertical axis and amplitude/intensity on a color or gray scale
formants show up clearly in spectrograms as bands of acoustic energy that undulate up
and down, depending on the speech sounds being produced
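How a spectrogram is built can be sketched in a few lines: cut the signal into short overlapping frames, and compute the magnitude spectrum of each frame. This is a bare-bones illustration in pure Python (a slow direct DFT with a Hann window), not how analysis software implements it; the sample rate, frame length and test tones are all illustrative assumptions.

```python
import cmath
import math

def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames (time); each frame lists magnitudes per frequency bin."""
    # Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # bins from 0 Hz up to half the sample rate
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

SAMPLE_RATE = 8000  # Hz, illustrative
# Test signal whose spectrum changes over time: 1000 Hz, then 2000 Hz.
signal = [math.sin(2 * math.pi * (1000 if n < 512 else 2000) * n / SAMPLE_RATE)
          for n in range(1024)]

spec = spectrogram(signal)
bin_hz = SAMPLE_RATE / 64  # frequency spacing between bins for frame_len=64
peaks_hz = [max(range(len(frame)), key=frame.__getitem__) * bin_hz for frame in spec]
print(peaks_hz[0], peaks_hz[-1])  # 1000.0 2000.0
```

Each frame is one vertical slice of the spectrogram (time runs across frames, frequency across bins, magnitude would be the gray level). The band of peak energy jumps from 1000 Hz to 2000 Hz partway through, the same kind of movement formant bands show in speech.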
Classifying Speech Sounds
most often described in terms of articulation
vowel sounds made with a relatively open vocal tract= vary in how high/low or how far
forward/back the tongue is placed
consonants are produced by obstructing the vocal tract in some way= each can be classified
according to 3 articulatory dimensions
place of articulation= airflow can be obstructed
at the lips (b, p, m)
at the alveolar ridge just behind the teeth (d, t, n)
at the soft palate (g, k, ng)
manner of articulation= airflow can be
totally obstructed
partially obstructed
only slightly obstructed
first blocked and then allowed to sneak through (ch)
blocked at first from going through the mouth, but allowed to go through the nasal
passage (m, n)
voicing= the vocal cords may be
vibrating (voiced consonants: b, m, l= with a finger on your larynx you can feel the
vibration while producing these consonants)
not vibrating (voiceless consonants: p, s, ch)
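The three articulatory dimensions work like a feature table. A minimal sketch covering only a few consonants; the place and nasal entries follow the notes, while the "total obstruction" manner for b, p, d, t, g, k and the voicing of d, t, g, k, n are filled in from standard phonetics rather than from these notes. The names `Consonant` and `differing_dimensions` are hypothetical.

```python
from typing import NamedTuple

class Consonant(NamedTuple):
    place: str    # where airflow is obstructed
    manner: str   # how airflow is obstructed
    voiced: bool  # whether the vocal cords vibrate

CONSONANTS = {
    "b": Consonant("lips", "total obstruction", True),
    "p": Consonant("lips", "total obstruction", False),
    "m": Consonant("lips", "nasal", True),
    "d": Consonant("alveolar ridge", "total obstruction", True),
    "t": Consonant("alveolar ridge", "total obstruction", False),
    "n": Consonant("alveolar ridge", "nasal", True),
    "g": Consonant("soft palate", "total obstruction", True),
    "k": Consonant("soft palate", "total obstruction", False),
}

def differing_dimensions(a, b):
    """Names of the articulatory dimensions on which two consonants differ."""
    return [field for field in Consonant._fields
            if getattr(CONSONANTS[a], field) != getattr(CONSONANTS[b], field)]

print(differing_dimensions("b", "p"))  # ['voiced']
print(differing_dimensions("b", "m"))  # ['manner']
print(differing_dimensions("p", "n"))  # ['place', 'manner', 'voiced']
```

Pairs such as b/p differ on a single dimension (voicing), while p/n differ on all three.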
most languages use fewer consonants and vowels than English
most languages have developed over generations to include only sounds that are relatively
easy to tell apart= easily distinguishable sounds
also easily distinguishable because all distinctions between vowels and consonants are
signalled by multiple differences between sounds= more than one acoustic property is
used to tell two sounds apart= redundancy= can still perceive speech nearly perfectly if all
energy either above or below 1800 Hz is taken away