LING 15 Lecture Notes - Lecture 9: Audio Signal, Spectrogram, Speech Recognition
Technology and language
Modeling language behavior
➔Speak commands instead of typing/mouse
➔Computer responds with synthetic speech
Automating language analysis
Other benefits
➔Search applications, research, error detection
Relationship among symbols
➔Info → meaning → propositions → words → glyphs and phonemes → segments →
acoustics
➔All human language is based around acoustics (physical substance of phonemes)
◆We are able to infer segments → phonemes → words → propositions
from acoustics
➔Speech synthesis: computer starts with a proposition and breaks it down into acoustics
Models
Model: an artificial construction that performs what the real thing does
➔Computers cannot laugh, but they can change propositions into acoustics
➔In speech synthesis, a computer is modeling speech
◆Computer can come up with words and generate human-like ordering of words
Text-to-speech: process in which computer takes text input and turns it into auditory (acoustic)
signal
➔Process of converting words and/or text to auditory output
➔Two approaches [both require stored data (program “looks up” what sounds to make)]
◆Whole word: uses stored recording of each word and replays when needed
●Tell computer to pronounce “Hello world” → computer looks up acoustic
pronunciation of “hello” and of “world” → plays recordings in order
○ In general: computer is told to pronounce a word → looks up its
recording in the database → plays the recording
● Problems:
○ Need multiple versions of each word
○ Requires a lot of storage space
◆Phonemic: determines phoneme order for each word
●“Hello world” converted into a string of phonemes → look up acoustic
pronunciation of each phoneme → string the recordings together and play
● Advantages:
○ Can predict phoneme string from spelling
○ Smaller number of recordings = needs less storage space
● Problems:
○ Imperfect mapping: physical segments differ by context
◆A phoneme sounds different depending on the neighboring vowels and
consonants, so it is hard to make the words sound natural and flow
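The two lookup approaches above can be sketched side by side. This is a minimal illustration, not how any particular system works: the "recordings" are placeholder strings rather than audio, and the names `WORD_RECORDINGS`, `PRONUNCIATIONS`, `PHONEME_RECORDINGS`, `whole_word_tts`, and `phonemic_tts` are hypothetical. The phoneme symbols are ARPAbet-style labels chosen for the example, and the pronunciation table stands in for predicting the phoneme string from spelling.

```python
# Placeholder "recordings": each string stands in for a stored audio clip.
WORD_RECORDINGS = {
    "hello": "<rec:hello>",
    "world": "<rec:world>",
}

# Toy pronunciation table (assumed for illustration): word -> phoneme string.
PRONUNCIATIONS = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# One stored clip per phoneme; note that "L" is shared by both words.
PHONEME_RECORDINGS = {
    p: f"<rec:{p}>" for p in ["HH", "AH", "L", "OW", "W", "ER", "D"]
}


def whole_word_tts(text):
    """Whole-word approach: look up one stored recording per word."""
    return [WORD_RECORDINGS[word] for word in text.lower().split()]


def phonemic_tts(text):
    """Phonemic approach: word -> phonemes -> one recording per phoneme."""
    clips = []
    for word in text.lower().split():
        for phoneme in PRONUNCIATIONS[word]:
            clips.append(PHONEME_RECORDINGS[phoneme])
    return clips


if __name__ == "__main__":
    print(whole_word_tts("Hello world"))
    print(phonemic_tts("Hello world"))
```

In this sketch the whole-word table needs a new recording for every new word, while the phoneme table stays the same size and only the pronunciation table grows. The phonemic version plays the same "L" clip in both words, which is where the context problem shows up: a real "L" does not sound identical in "hello" and "world". Both functions raise `KeyError` for a word that is not in their tables.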