LING 15 Lecture Notes - Lecture 9: Audio Signal, Spectrogram, Speech Recognition

coralcockroach626

29 Jun 2018

Technology and language

Modeling language behavior

➔Speak commands instead of typing or using a mouse

➔Computer responds with synthetic speech

Automating language analysis

Other benefits

➔Search applications, research, error detection

Relationship among symbols

➔Info → meaning → propositions → words → glyphs and phonemes → segments → acoustics

➔All spoken human language is based around acoustics (the physical substance of phonemes)

◆We are able to infer segments → phonemes → words → propositions from acoustics

➔Speech synthesis: the computer starts with a proposition and breaks it down into acoustics

Models

Model: an artificial construction that does what the real thing does

➔Computers cannot laugh, but they can change propositions into acoustics

➔In speech synthesis, a computer is modeling speech

◆Computer can come up with words and generate human-like ordering of words

Text-to-speech: process in which a computer takes text input and turns it into an auditory (acoustic) signal

➔Process of converting words and/or text to auditory output

➔Two approaches [Both require stored data (program “looks up” what sounds to make)]

◆Whole word: uses stored recording of each word and replays when needed

●Tell computer to pronounce “Hello world” → computer looks up acoustic pronunciation of “hello” and of “world” → plays recordings in order

○ In general: computer is told to pronounce a word → looks up its recording in a database → plays the recording

● Problems:

○ Need multiple versions of each word

○ Requires a lot of storage space

◆Phonemic: determines phoneme order for each word

●“Hello world” is converted into a string of phonemes → look up acoustic pronunciation of each phoneme → string the sounds together and play

● Advantages:

○ Can predict phoneme string from spelling

○ Fewer recordings needed, so less storage space

● Problems:

○ Imperfect mapping: physical segments differ by context

◆A phoneme sounds different depending on the neighboring vowels and consonants, so it is hard to make the words sound natural and flow
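
The two lookup approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any real synthesizer. The "recordings" are placeholder labels rather than audio, all table and function names are hypothetical, and the phoneme strings use ARPAbet-style symbols chosen for this sketch. The phonemic version here also looks pronunciations up in a small table instead of predicting them from spelling.

```python
# Hypothetical stand-ins: each "recording" is a text label, not real audio.
WORD_RECORDINGS = {
    "hello": "<rec:hello>",
    "world": "<rec:world>",
}

# Illustrative pronunciations (ARPAbet-style symbols, assumed for this sketch).
PRONUNCIATIONS = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# One stored recording per distinct phoneme that appears above.
PHONEME_RECORDINGS = {
    phoneme: f"<rec:{phoneme}>"
    for phonemes in PRONUNCIATIONS.values()
    for phoneme in phonemes
}


def whole_word_tts(text):
    """Whole-word approach: look up one stored recording per word, in order."""
    return [WORD_RECORDINGS[word] for word in text.lower().split()]


def phonemic_tts(text):
    """Phonemic approach: word -> phoneme string -> one recording per phoneme."""
    playlist = []
    for word in text.lower().split():
        for phoneme in PRONUNCIATIONS[word]:
            playlist.append(PHONEME_RECORDINGS[phoneme])
    return playlist


print(whole_word_tts("Hello world"))
print(phonemic_tts("Hello world"))
```

In this toy example the phoneme table is larger than the word table, but the phoneme inventory stays fixed as vocabulary grows, while the whole-word table needs a new recording for every new word. The sketch also shows the mapping problem from the notes: "L" gets the same recording in "hello" and "world", even though the physical segment differs by context.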
