CIS 2050 Lecture Notes - Lecture 8: Turing Test, Spoken Language, Traditional Animation

rubycheetah358

2 May 2018

School: University of Guelph

Department: Computing and Information Science

Course: CIS 2050

Professor: David Chiu

Identify and describe some of the approaches used

in digital book and computer animation

Examine and evaluate the design and presentation

using multimedia techniques

Describe an interactive technique based on speech

recognition, its nature, and how it was developed

Learning Outcomes:

It may even appear in daily products in the

form of digital books and movies

○

It may use user interactivities such as speech

recognition

○

Despite the attractiveness and impact of

using multimedia, the key is still the

presentation of the intended message using

these media

○

Multimedia appeals to more of our senses than

traditional computing

•

Computer animation takes static objects

and gives them "life" through movement

and personality, using illusory

devices that take advantage of a sequence of

animated events

○

Multimedia computing not only involves

algorithms that connect and process different types

of data, but also involves the creative use of

technologies in delivering a "story"

•

Computer animation may involve many

steps, using techniques from modeling to

specific rendering

○

Augmented reality combines the real world with

computer-generated virtual imagery of events

•

A speech sequence is often segmented and

its parts are then recognized into their

appropriate words using computer

algorithms

○

There are still many difficulties in

recognizing human speech using the

computer, especially when the speech

consists of spoken words from people of

different cultures

○

The research on speech recognition was motivated

by the coding technology of speech signals

•

Key Points:

Insights gained from speech recognition

advances over the past 40 years are

explored, originating from generations of

Carnegie Mellon University's R&D

○

Several major achievements over the years

have proven to work well in practice for

leading industry speech recognition systems

from Apple to Microsoft

○

It will help bridge the gap between

humans and machines

It will facilitate and enhance natural

conversation among people

Six challenges need to be addressed

before we can realize this audacious

dream

Speech recognition will pass the Turing

Test and bring the vision of Star-Trek-like

mobile devices to reality

○

Key Insights:

•

In 1976, one of the authors (Reddy) wrote a

comprehensive review of the state of the art

of voice recognition at that time

○

With the introduction of Apple's Siri and similar

voice search services from Google and Microsoft,

it is natural to wonder why voice recognition

technology took so long to advance to this level

•

In 1976, Reddy predicted it would be possible to

build a $20,000 connected speech system

within the next 10 years

○

Although it took longer than projected, the

system costs were much less and continue to

drop dramatically

○

Speech recognition has been a staple in science

fiction for years, but in 1976 the real-world

capabilities bore little resemblance to the far-

fetched capabilities in the fictional realm

•

Although it was commercially

successful, the "speech in" and "screen

out" multimodal metaphor is more

natural for information consumption

In 1999 the VoiceXML forum was created

to support telephony IVR

○

Illustrated a vision of speech-enabled

multimodal mobile devices

In 2001, Bill Gates demonstrated such a

prototype codenamed MiPad at CES

○

The speech community is en route to

passing the Turing Test in the next 40

years with the ultimate goal to match

and exceed a human's speech

recognition capability for everyday

scenarios

We are now witnessing the ever-improved

ability of devices to handle relatively

unrestricted multimodal dialogues

○

In 1995, Microsoft SAPI was first shipped in

Windows 95 to enable application developers to

create speech applications on Windows

•

Statistical modeling and machine learning

○

Training data and computing resources

○

Vocabulary size and disfluent speech

○

Speaker independent and adaptive speech

recognition

○

Efficient decoder

○

Spoken language understanding and

dialogue

○

What we did not know how to do in 1976:

•

Acoustic

Parametric

Phonemic

Lexical

Sentence

Semantic

Six levels of knowledge:

○

In 1971, the speech recognition study group

recommended that many more sources of

knowledge be brought to bear on the problem

•

Understanding Research (SUR) project

○

Hearsay

Dragon

Harpy

Sphinx I/II

Developed a sequence of speech recognition

systems

○

Ex. Voice control of a robot, large-

vocabulary connected-speech

recognition, speaker-independent

speech recognition and unrestricted

vocabulary dictation

Created several historic demonstrations of

spoken language systems

○

Hearsay-I was one of the first systems

capable of continuous speech recognition

○

Dragon system was one of the first to model

speech as a hidden stochastic process

○

Harpy system introduced concept of Beam

Search, which was the most widely used

technique for efficient searching and

matching for decades

○

*speech recognition word error

rate has been used as the main

metric to evaluate progress

□

Sphinx-II (1992) benefited largely

from tied parameters to balance

trainability and efficiency, which

achieved the highest recognition

accuracy in DARPA-funded speech

benchmark evaluation

Sphinx-I (1987) was the first system to

demonstrate speaker-independent speech

recognition

○

Word error rates approached a new

milestone, reached by both Microsoft and IBM

researchers following the deep learning

framework pioneered by researchers at the

University of Toronto and Microsoft

○

By 1976, Reddy was leading a group at Carnegie

Mellon University to explore ideas of the Defense Advanced

Research Projects Agency (DARPA)-sponsored

Speech

•
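Word error rate, noted above as the main metric for tracking progress, is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and a reference transcript, divided by the number of reference words. A minimal Python sketch under that common definition follows; the function name and example sentences are illustrative assumptions, not taken from the reading.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.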

Architecture of the Hearsay system was

designed so that many semiautonomous

modules can communicate and cooperate in

a speech recognition task while each

concentrated on its own area of expertise

○

The Dragon, Harpy and Sphinx systems all

were based on a single, relatively simple

modeling principle of joint global

optimization

○

It was anticipated in the early 1970s that bringing

the higher-level sources of knowledge to bear might

require significant breakthroughs in artificial

intelligence

•

Decoding process in a speech recognizer's

operation is to find a sequence of words

whose corresponding acoustic and language

models best match the input vector sequence

○

= search process

○

Graph search algorithms, which have been

explored extensively in the fields of artificial

intelligence, operations research and game

theory, serve as the basic foundation for the

search problem in speech recognition

○

Decoding process of finding the best matched

word sequence to match input speech is more than

just a simple pattern recognition problem, since

one faces a practically astronomical number of

word patterns to search

•

The most salient difference is not algorithms

with a lower error rate, but rather an

emphasis on simplified algorithms with a

better cost-performance trade-off

○

Long-term goal was the development of a

real-time, large-vocabulary, continuous-

speech dictation system

○

Development of technology for Dragon

NaturallySpeaking may be compared with the

general development

•

Phonetic matching and word

verification are unified with word

sequence generation that depends on

the highest overall rating typically

using a context-dependent phonetic

acoustic model

Ex. Explicit segmentation and labeling of

phonetic strings is no longer necessary

○

Establishment of the statistical machine-learning

framework, supported by the availability of

computing infrastructure and massive training

data, constitutes the most significant driving force

in advancing the development of speech

recognition

•

In non-probabilistic models, there is an

estimated "distance" between sound labels

based on how similar the sounds are

estimated to be

○

In probability models, an estimate is used of

the conditional probability of observing a

particular sound label as the best matching

label, conditional on the correct label being

the hypothesized label (=confusion

probability)

○

Early methods aimed to find the closest matching

sound label from a discrete set of labels

•

Model process has a learning algorithm with

a broadly applicable convergence theorem =

Expectation-Maximization (EM) algorithm

○

Hidden Markov Model (HMM) - it is the process that is

hidden, not the model

•

DNN can replace the Gaussian

mixture model directly to overcome

the inefficiency in data representation

Significant development offered learned

feature representation with introduction of

deep neural networks (DNN)

○

Combination of HMM and DNN produced

significant error reduction

○

Before 2010, mixtures of HMM-based Gaussian

densities were typically used

•

Markov process - given the current state, probabilities of

future events will be independent of any additional information

about the past history of the process

•
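To make the Markov assumption and the "hidden" process concrete, the sketch below computes the likelihood of an observed label sequence under a tiny HMM using the standard forward recursion. The two states ("x", "y"), the sound labels ("a", "b") and all probabilities are made-up illustration values and are not from the reading.

```python
# Hypothetical two-state model; every number here is an assumption for illustration.
INITIAL = {"x": 0.6, "y": 0.4}
TRANSITION = {"x": {"x": 0.7, "y": 0.3}, "y": {"x": 0.4, "y": 0.6}}
EMISSION = {"x": {"a": 0.9, "b": 0.1}, "y": {"a": 0.2, "b": 0.8}}


def forward_likelihood(observations, initial, transition, emission):
    """Probability of the observation sequence, summed over all hidden state paths."""
    if not observations:
        raise ValueError("need at least one observation")
    states = list(initial)
    # alpha[s] = probability of the observations so far, ending in state s.
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Markov property: the update uses only the previous state's alpha,
        # not the earlier history of the process.
        alpha = {
            s: emission[s][obs] * sum(alpha[p] * transition[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


print(forward_likelihood(["a", "b", "a"], INITIAL, TRANSITION, EMISSION))
```

The hidden states never appear in the input; only the emitted labels do, which is the sense in which the process, not the model, is hidden.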

Corpora have been created, annotated and

distributed to the world-wide community by

National Institute of Standards and

Technology (NIST), Linguistic Data

Consortium (LDC), European Language

Resources Association (ELRA), and others

○

Character of the recorded speech has

progressed from limited, constrained speech

materials to huge amounts of progressively

more realistic, spontaneous speech

○

Training data and computational resources:

•

Made it possible for speech recognition to

consume the significantly improved

computational infrastructure

○

Moore's Law: doubling the amount of computation

for a given cost every 12-18 months, as well as

comparably shrinking cost of memory

•

Made it possible to create a far more

powerful language model for voice

search applications

Both Google and Bing indexed the entire

Web

○

Cloud-based speech recognition made it even

more convenient to accumulate an even more

massive amount of speech

•

Magnitude as a function of frequency is

called the "spectrum" of the short window of

speech, and a sequence of such spectra over

time in a speech utterance can be visualized

as a spectrogram

○

Deep learning technology aims at

minimizing such information loss and

searching for more powerful, deep

learning-driven speech representations

from raw data

Modifications of spectrograms led to

significant improvements in the performance

of Gaussian mixture-based HMM systems

despite the loss of raw speech information

○

In 1976, acoustic features were typically a measure

of the magnitude at each of a set of frequencies for

each time window

•
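The spectrum and spectrogram described above can be sketched in a few lines: take a short window of samples, compute the magnitude at each frequency bin with a discrete Fourier transform, and repeat over sliding windows. This is a minimal pure-Python illustration using a plain rectangular window and a direct (slow) DFT; the function names, window size and test tone are assumptions, and real systems use an FFT and further processing.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Magnitude at each non-negative frequency bin of one short window."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            sample * cmath.exp(-2j * math.pi * k * t / n)
            for t, sample in enumerate(window)
        )
        spectrum.append(abs(total))
    return spectrum


def spectrogram(signal, window_size, hop):
    """Sequence of magnitude spectra over time, one per window position."""
    return [
        magnitude_spectrum(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, hop)
    ]


# Hypothetical test tone: exactly 4 cycles per 32-sample window.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(64)]
frames = spectrogram(tone, window_size=32, hop=16)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in frames]
print(peak_bins)
```

Each inner list is one column of the spectrogram; for the test tone, the energy concentrates in the bin matching the tone's frequency.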

Maximum size has increased significantly

○

Systems in the 1990s tried to recognize every

word dictated and counted every word not

recognized as an error

○

It was important for the system to learn the

names and places that occurred repeatedly in

a particular user's dictation

○

Significant advances were made in statistical

learning techniques

○

Problem remains a challenge because

modeling new words is still far from

seamless

○

Vocabulary size:

•

There was still a significant gap in

performance between single-speaker,

speaker-dependent models and speaker-

independent models intended for a diverse

population

○

Key was to use more speech data from

a large number of speakers to train the

HMM-based system

Sphinx introduced large-vocabulary,

speaker-independent continuous speech

recognition

○

Adaptive learning is also applied to

accommodate speaker variations and a wide

range of variable conditions for the channel,

noise and domain

○

Effective adaptive technologies enable rapid

application integration and are a key to

successful commercial deployment of

speech recognition

○

Speaker independent and adaptive systems:

•

More important has been searchable unified

graph representations that allow multiple

sources of knowledge to be incorporated

into a common probabilistic framework

○

Practical decoding algorithms made possible

large-scale continuous speech recognition

○

Multiple speech streams

Multiple probability estimators

Multiple recognition systems

Multiple pass systems with increased

constraints

Non-compositional methods include:

○

Decoding techniques:

•
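Beam search, which the notes credit to the Harpy system, keeps the decoder tractable by retaining only the best few partial word sequences at each step instead of every possible one. The sketch below is a generic illustration, not Harpy's actual implementation; the candidate words, scores and the `transition_score` helper are hypothetical, and scores are treated as additive (log-probability-like) values.

```python
def beam_search(step_scores, transition_score, beam_width):
    """Return the surviving (word_sequence, score) pairs, best first.

    step_scores: list of dicts mapping candidate word -> local score per step.
    transition_score(prev_word, word): score for word following prev_word
    (prev_word is None at the first step).
    """
    beams = [((), 0.0)]
    for candidates in step_scores:
        expanded = []
        for words, score in beams:
            prev = words[-1] if words else None
            for word, local in candidates.items():
                new_score = score + local + transition_score(prev, word)
                expanded.append((words + (word,), new_score))
        # Prune: keep only the beam_width highest-scoring partial sequences.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams


# Hypothetical scores: "b" looks worse at step one but pairs very well with "d".
steps = [{"a": -1.0, "b": -2.0}, {"c": -1.0, "d": -3.0}]


def pair_bonus(prev, word):
    return 5.0 if (prev, word) == ("b", "d") else 0.0


print(beam_search(steps, pair_bonus, beam_width=2)[0])
print(beam_search(steps, pair_bonus, beam_width=1)[0])
```

With a beam of 2 the search finds the best overall sequence, while a beam of 1 prunes "b" too early and misses it. That is the cost-performance trade-off a beam width controls.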

User utters queries on flight

information in an unrestricted

free form

□

Ex. Air Travel Information System

(ATIS)

SLU mostly relied on case grammars for

representing sets of semantic concepts

during 1970s

○

A number of techniques are used to fill frame

slots of the application domain from the

training data

○

Like acoustic and language modeling, deep

learning based on recurrent neural networks

can also significantly improve filling slots

for language understanding

○

Spoken language understanding (SLU):

•

There is no data like more data

○

Computing infrastructure

○

Unsupervised learning

○

Portability and generalizability

○

Dealing with uncertainties

○

Having Socrates' wisdom

○

Six Major Challenges:

•

In 1976 computation power available was

only adequate to perform speech recognition

on highly constrained tasks with low

branching factors (perplexity)

○

Thousands of processors and nearly

unlimited collective memory capacity

in the cloud

Power of these systems arises mainly

from their ability to collect, process

and learn from very large data sets

In 1976, the fastest computer available for

routine speech research was a dedicated PDP-10 with

4MB memory

○

Algorithmic improvements have been

made (e.g., using distributed

algorithms for deep learning tasks)

Still difficult to dynamically adapt to

the speaker and environment, which

have the potential to reduce the error

rate by half

Social graph used for Web search

engines can be used to dramatically

reduce the needed search space

Mixed-lingual speech makes the new

word problem more difficult

Multimodal interactive

metaphor will be a dominant

metaphor as illustrated by

MiPad demo and Apple's Siri-

like services

□

We are still missing human-like

clarification dialog for new

words previously unknown to

the system

□

Associated problem of error detection

and correction leads to difficult user

interface choices

Some systems require the use of

more powerful discrimination

learning

□

Dynamic sparse data learning is

missing in most systems

□

Recognition of highly confusable

words is still a problem

Speech recognition will help

bridge the gap between us and

machines

□

A powerful tool to facilitate and

enhance natural conversation

among people regardless of

barriers of location or language

□

Speech recognition in the next 40

years will pass the Turing test

Basic learning and decoding algorithms have

not changed substantially in the past 40 years

○

Conclusion:

•

Read: Historical perspective of speech recognition

To animate means "to give life to"

•

In computer animation, animators use

software to draw, model and animate objects

and characters in vast digital landscapes

○

An animator's job is to take a static image or

object and literally bring it to life by giving it

movement and personality

•

The animator draws objects or

characters either by hand or with a

computer

Then he positions his creations in key

frames, which form an outline of the

most important movements

This process is called tweening

□

Next, the computer uses mathematical

algorithms to fill in the "in-between"

frames

Key framing and tweening are

traditional animation techniques that

can be done by hand, but are

accomplished much faster with a

computer

Computer-assisted animation is typically

2D, like cartoons

○

This cannot be done with a pencil and

paper

Key framing and tweening are still an

important function of computer-

generated animation, but there are

other techniques that don't relate to

traditional animation

Using mathematical algorithms,

animators can program objects to

adhere to (or break) physical laws like

gravity, mass and force

Or create tremendous herds and flocks

of creatures that appear to act

independently, yet collectively

With computer-generated animation,

instead of animating each hair on a

monster's head, the monster's fur is

designed to wave gently in the wind

and lay flat when wet

Computer-generated animation is 3D,

meaning that objects and characters are

modeled in a space with X, Y and Z axes

○

There are two basic kinds of computer animation:

computer-assisted and computer-generated

•
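The tweening step described above, where the computer fills in the "in-between" frames, can be sketched with simple linear interpolation between two key-frame positions. This is only one possible interpolation (animation software typically also offers easing curves); the function name and coordinates are illustrative assumptions.

```python
def tween(start, end, steps):
    """Linearly interpolated in-between positions, excluding the two key frames."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from start to end
        frames.append(tuple(a + (b - a) * t for a, b in zip(start, end)))
    return frames


# Hypothetical key frames: an object moves from (0, 0) to (8, 4).
print(tween((0.0, 0.0), (8.0, 4.0), steps=3))
```

The animator supplies only the two key positions; the three intermediate frames are generated, which is why the technique is much faster with a computer than by hand.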

Animators at Disney revolutionized the

industry with innovations like the use of

sounds in animated short films and the

multi-plane camera stand that created the

parallax effect of background depth

○

Technology has long been a part of the animator's

toolkit

•

Earliest films were scientific simulations

including "Flow of a Viscous Fluid" and

"Propagation of Shock Waves in a Solid

Form"

○

The roots of computer animation began with

computer graphics pioneers in the early 1960s

•

Utah Teapot -rendered 3D teapot that

signaled a turning point in the

photorealistic quality of 3D graphics

University of Utah was the source of the earliest

important breakthroughs in 3D computer

graphics, like the hidden surface algorithm

that allows a computer to conceptualize 3D

objects

○

Ed Catmull (University of Utah) was one of the

first to toy with computer animation as art,

beginning with a 3D rendering of his hand opening

and closing

•

More films in later 1970s and early 1980s

relied on computer graphics (CG) to create

primitive effects

○

"Tron" (1982) was ideal for showcasing

undeniably digital effects

○

"Jurassic Park" (1993) was first feature film

to integrate convincingly real, entirely

computer-generated characters into a

live-action film

○

"Toy Story" (1995) from Pixar was first full-

length cartoon made entirely with computer-

generated 3D animation

○

In 1973, "Westworld" became the first film to contain

computer-generated 2D graphics

•

Today, a standard desktop computer runs

5000x faster than those used by computer

graphics pioneers in 1960s

○

Cost of the basic technology for creating

computer animation has dropped from

$500,000 to less than $2,000

○

Increasing sophistication and realism of 3D

animation can be directly credited to an

exponential growth in computer processing power

•

Read: How computer animation works

Create a 3D world inside a computer

•

Can move a camera inside a 3D world to make

characters come to life

•

Moment in lighting where all the pieces

come together and the world comes to life

○

Can use light to help tell story, set mood,

guide the audience's eye, make characters

stand out, etc.

○

Can add or remove lighting to place it and create

shadows -balancing reality and artistry

•

Mimic physics of water, light,

movement…etc. but not constrained by it

○

Tether ourselves with science and the world we

know as a background -create something relatable

•

Ribbons of light

○

Movement of water

○

Fog beams

○

Can change colour to depict a mood

○

Create a believable world that the audience

can immerse themselves in

○

Ex. Finding Nemo

•

Ex. Light in WALL-E's "binoculars" to make it

more human-like with emotions

○

Can portray emotion with light

•

Interweaving of art and science

•

Watch: Magic ingredient that brings Pixar movies to life

Augmented reality is the welding of the real world

with computer-generated imagery

•

Latest technology

•

Willingly enter a fictional world

○

Must suspend our disbelief

•

Deliberately exploit the way the audience thinks

•

Defy logic

•

Watch: Magical tale (with augmented reality)

Explore solutions to solve the climate crisis

○

Could see pictures or videos on a map

Can include pictures, interactive images, and

videos that can be expanded

○

Ex. Blowing into mic to see windmill

mechanism

Interactive infographics -can explore

○

Runs on iPad and iPhone

○

"Our Choice" -first interactive digital book by Al

Gore

•

Watch: Next-generation digital book

Augmented reality (AR) is an enhanced

perception of a physical, real-world

environment such that some elements are

augmented and overlaid by computer-

generated sensory input, such as sounds,

images, video or computer vision

○

The perception may be in real time and may

involve human interactivity

○

Applications may include Google Glass,

realistic medical/military training, and

realistic entertainment

○

What is augmented reality, and what are some of its

potential uses?

Speech recognition is a collection of

computational techniques that aim at

recognizing or classifying natural human

speech

○

Usually, it processes the speech signal as

data from its coding representation,

segments the code into components,

identifies the components in their context,

and classifies the components into

interpretable human words

○

Each step of the process can present some

difficulties due to possible ambiguity,

uncertainty of the true nature, or confusion

between potential words

○

What is speech recognition? Briefly discuss

some of the challenges in its development.

Tweening -technique usually using

computing to generate frames to fill in the

content between two key frames in

animation

Key frame -frame used as a reference point

in computer algorithm for smooth transition

Kinematics -study of methods in

manipulating motion of an object, usually

using a 3D model without considering the

cause of the motion

Briefly define these terms in animation:

Questions:

Multimedia Computing

Friday, March 9, 2018

2:24 PM

Unlock document

This preview shows pages 1-3 of the document.
Unlock all 15 pages and 3 million more documents.

Already have an account? Log in

Identify and describe some of the approaches used

in digital book and computer animation

Examine and evaluate the design and presentation

using multimedia techniques

Describe an interactive technique based on speech

recognition, its nature, and how it was developed

Learning Outcomes:

It may even appear in daily products in the

form of digital books and movies

○

It may use user interactivities such as speech

recognition

○

Despite the attractiveness and impact of

using multimedia, the key is still the

presentation of the intended message using

these media

○

Multimedia appeals to more of our senses than the

traditional computing

•

Computer animation is to take static objects

and give them "life" through their

movements and personality, and illusory

devices that take advantage of a sequence of

animated events

○

Multimedia computing not only involves

algorithms that connect and process different types

of data, but also involves the creative use of

technologies in delivering a "story"

•

Computer animation may involve many

steps, using techniques from modeling to

specific rendering

○

Augmented reality combines the real world with

computer-generated virtual imagery of events

•

A speech sequence is often segmented and

its parts are then recognized into their

appropriate words using computer

algorithms

○

There are still many difficulties in

recognizing human speech using the

computer, especially when the speech

consists of spoken works from people of

different cultures

○

The research on speech recognition was motivated

by the coding technology of speech signals

•

Key Points:

Insights gained from the speech recognition

advances over the past 40 years are

explored, originated from generations of

Carnegie Mellon University's R&D

○

Several major achievements over the years

have proven to work well in practice for

leading industry speech recognition systems

from Apple to Microsoft

○

It will help bridge the gap between

humans and machines

It will facilitate and enhance natural

conservation among people

6 challenges need to be addressed

before we can realize with audacious

dream

Speech recognition will mass the Turing

Test and bring the vision of Star-Trek-like

mobile devices to reality

○

Key Insights:

•

In 1976, one of the authors (Reddy) wrote a

comprehensive review of the state of the art

of voice recognition at that time

○

With the introduction of Apple's Siri and similar

voice search services from Google and Microsoft,

it is natural to wonder why voice recognition

technology took so long to advance to this level

•

Reddy predicted it would be possible to a

build a $20,000 connected speech system

within the next 10 years in 1976

○

Although it took longer than projected, the

system costs were just less and continues to

drop dramatically

○

Speech recognition has been a staple in science

fiction for years, but in 1976 the real-world

capabilities bore little resemblance to the far-

fetched capabilities in the fictional realm

•

Although it was commercially

successful, the "speech in" and "screen

out" multimodal metaphor is more

natural for information consumption

In 1999 the VoiceXML forum was created

to support telephony IVR

○

Illustrated a vision on speech-enables

multimobile devices

In 2001, Bill Gates demonstrated such a

prototype codenamed MiPad at CES

○

The speech community is en route to

passing the Turing Test in the next 40

years with the ultimate goal to match

an exceed a human's speech

recognition capability for everyday

scenarios

We are now witnessing the ever-improved

ability of devices to handle relatively

unrestricted multimodal dialogues

○

1995, Microsoft SAPI was first shipped in

Windows 95 to enable application developers to

create speech applications on Windows

•

Statistical modeling and machine learning

○

Training data and computing resources

○

Vocabulary size and dis-fluent speech

○

Speaker independent and adaptive speech

recognition

○

Efficient decoder

○

Spoken language and understanding

dialogue

○

What we did not know how to do in 1976:

•

Acoustic

Parametric

Phonemic

Lexial

Sentence

Semantic

Six levels of knowledge:

○

In 1971, the speech recognition study group

recommended that many more sources of

knowledge be brought to bear on the problem

•

Understanding Research (SUR) project

○

Hearsay

Dragon

Harpy

Sphinx I/II

Developed sequence of speech recognition

system

○

Ex. Voice control of a robot, large-

vocabulary connected-speech

recognition, speaker-independent

speech recognition and unrestricted

vocabulary dictation

Created several historic demonstrations of

spoken language systems

○

Hearsay-I was one of the first systems

capable of continuous speech recognition

○

Dragon system was one of the first to model

speech as hidden stochastic process

○

Harpy system introduced concept of Beam

Search, which was the most widely used

technique for efficient searching and

matching for decades

○

*speech recognition word error

rate has been used as the main

metric to evaluate progress

□

Sphinx-II (1992) benefited largely

from tied parameters to balance

trainability and efficiency, which

achieved the highest recognition

accuracy in DARPA-funded speech

benchmark evaluation

Sphinx-I (1987) was first system to

demonstrate speaker-independent speech

recognition

○

The word error rate was approaching a new

milestone by both Microsoft and IBM

researchers following the deep learning

framework pioneered by researchers at the

University of Toronto and Microsoft

○

By 1976, Reddy was leading a group at Carnegie

Mellon University to explore ideas of Advanced

Research Project Agency (DARPA)-sponsored

Speech

•

Architecture of the Hearsay system was

designed so that many semiautonomous

modules can communicate and cooperate in

a speech recognition task while each

concentrated on its own area of expertise

○

The Dragon, Harpy and Sphinx systems all

were based on a single, relatively single

modeling principle of joint global

optimization

○

It was anticipated in the early 1970s that to bear

the higher-level sources of knowledge might

require significant breakthroughs in artificial

intelligence

•

Decoding process in a speech recognizer's

operation is to find a sequence of words

whose corresponding acoustic and language

models best match the input vector sequence

○

= search process

○

Graph search algorithms, which have been

explored extensively in the fields of artificial

intelligence, operations research and game

theory, serve as the basic foundation for the

search problem in speech recognition

○

Decoding process of finding the best matched

work sequence to match input speech is more than

just a simple pattern recognition problem, since

one faces a practically astronomical number of

word patterns to search

•

The most salient difference is not algorithms

with a lower error rate, but rather an

emphasis on simplified algorithms with a

better cost-performance trade-off

○

Long term foal was the development of a

real-time, large-vocabulary, continuous-

speech dictation system

○

Development of technology for Dragon

NaturallySpeaking may be compared with the

general development

•

Phonetic matching and word

verification are unified with word

sequence generation that depends on

the highest overall rating typically

using a context-dependent phonetic

acoustic model

Ex. Explicit segmentation and labeling of

phonetic strings is no longer necessary

○

Establishment of the statistical machine-learning

framework, supported by the availability of

computing infrastructure and massive training

data, constitutes the most significant driving force

in advancing the development of speech

recognition

•

In non-probablistic models, there is an

estimated "distance" between sound labels

based on how similar to sounds are

estimated to be

○

In probability models, an estimate is used of

the conditional probability of observing a

particular sound label as the best matching

label, conditional on the correct label being

the hypothesized label (=confusion

probability)

○

Early methods aimed to find the closest matching

sound label from a discrete set of labels

•

Model process has a learning algorithm with

a broadly applicable convergence theorem =

Expectation-Maximization (EM) algorithm

○

Hidden Markov Model (HMM) -process that is

hidden not the model

•

DNN can replace the Gaussian

mixture model directly to overcome

the inefficiency in data representation

Significant development offered learned

feature representation with introduction of

deep neural networks (DNN)

○

Combination of HMM and DNN produced

significant error reduction

○

Before 2010, mixture of HMM-based Gaussian

densities were typically used

•

Markov process -probabilities of future events

will be independent of any additional information

about the past history of the process

•

Corpora have been created, annotated and

distributed to the world-wide community by

National Institute of Standard and

Technology (NIST), Linguistic Data

Consortium (LDC), European Language

Resources Association (ELRA), and others

○

Character of the recorded speech has

progressed from limited, constrained speech

materials to huge amounts of progressively

more realistic, spontaneous speech

○

Training data and computational resources;

•

Made it possible for speech recognition to

consume the significantly improved

computational infrastructure

○

Moore's Law: doubling the amount of computation

for a given cost every 12-18 months, as well as

comparably shrinking cost of memory

•

• Cloud-based speech recognition made it even more convenient to accumulate an even more massive amount of speech
    ○ Both Google and Bing indexed the entire Web
        ▪ Made it possible to create a far more powerful language model for voice search applications

• In 1976, acoustic features were typically a measure of the magnitude at each of a set of frequencies for each time window
    ○ Magnitude as a function of frequency is called the "spectrum" of the short window of speech, and a sequence of such spectra over time in a speech utterance can be visualized as a spectrogram
    ○ Modifications of spectrograms led to significant improvements in the performance of Gaussian mixture-based HMM systems despite the loss of raw speech information
        ▪ Deep learning technology aims at minimizing such information loss and searching for more powerful, deep learning-driven speech representations from raw data
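The spectrum and spectrogram described above can be sketched with a plain discrete Fourier transform. This is a minimal illustration under stated assumptions: the "speech" is a synthetic pure tone, the window size and hop are arbitrary choices, and no tapering window (such as Hamming) is applied, which a practical front end would normally add.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Magnitude at each non-negative frequency bin for one short window."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total))
    return spectrum


def spectrogram(signal, window_size, hop):
    """Sequence of magnitude spectra, one per time window."""
    return [
        magnitude_spectrum(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, hop)
    ]


# Synthetic stand-in for speech: a tone with 4 cycles per 32-sample window.
signal = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
frames = spectrogram(signal, window_size=32, hop=16)
```

Each entry of `frames` is one spectrum; stacking them over time gives the spectrogram. For this tone, the energy concentrates in frequency bin 4 of every frame.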

• Vocabulary size:
    ○ Maximum size has increased significantly
    ○ Systems in the 1990s tried to recognize every word dictated and counted every word not recognized as an error
    ○ It was important for the system to learn the names and places that occurred repeatedly in a particular user's dictation
    ○ Significant advances were made in statistical learning techniques
    ○ The problem remains a challenge because modeling new words is still far from seamless

• Speaker-independent and adaptive systems:
    ○ There was still a significant gap in performance between single-speaker, speaker-dependent models and speaker-independent models intended for a diverse population
    ○ Sphinx introduced large-vocabulary, speaker-independent continuous speech recognition
        ▪ The key was to use more speech data from a large number of speakers to train the HMM-based system
    ○ Adaptive learning is also applied to accommodate speaker variations and a wide range of variable conditions for the channel, noise and domain
    ○ Effective adaptive technologies enable rapid application integration and are a key to successful commercial deployment of speech recognition

• Decoding techniques:
    ○ More important has been searchable unified graph representations that allow multiple sources of knowledge to be incorporated into a common probabilistic framework
    ○ Practical decoding algorithms made large-scale continuous speech recognition possible
    ○ Non-compositional methods include:
        ▪ Multiple speech streams
        ▪ Multiple probability estimators
        ▪ Multiple recognition systems
        ▪ Multiple pass systems with increased constraints
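The idea of combining several knowledge sources in one probabilistic framework, then searching it with a practical algorithm, can be sketched as follows. Beam search is used here as an assumed stand-in for a "practical decoding algorithm"; the word candidates, the acoustic scores, and the bigram language model are all invented for illustration. Each partial hypothesis is scored by adding log acoustic and log language-model probabilities, and only the best `beam_width` hypotheses survive each step.

```python
import math

# Hypothetical per-segment acoustic scores: P(audio segment | word).
ACOUSTIC = [
    {"recognize": 0.6, "wreck": 0.4},
    {"speech": 0.5, "beach": 0.5},
]

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "recognize"): 0.5, ("<s>", "wreck"): 0.5,
    ("recognize", "speech"): 0.9, ("recognize", "beach"): 0.1,
    ("wreck", "speech"): 0.2, ("wreck", "beach"): 0.8,
}


def beam_search(beam_width):
    """Return (log_score, words) for the best hypothesis kept in the beam."""
    beam = [(0.0, ["<s>"])]
    for step in ACOUSTIC:
        candidates = []
        for score, words in beam:
            for word, p_acoustic in step.items():
                p_lm = BIGRAM[(words[-1], word)]
                new_score = score + math.log(p_acoustic) + math.log(p_lm)
                candidates.append((new_score, words + [word]))
        # Prune: keep only the highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    best_score, best_words = beam[0]
    return best_score, best_words[1:]  # drop the start marker
```

In this toy case the acoustic scores alone cannot separate "speech" from "beach"; the language model breaks the tie. A narrow beam is faster but can prune the true best path in larger search spaces.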

• Spoken language understanding (SLU):
    ○ During the 1970s, SLU mostly relied on case grammars for representing sets of semantic concepts
        ▪ Ex. Air Travel Information System (ATIS)
            □ User utters queries on flight information in an unrestricted free form
    ○ A number of techniques are used to fill frame slots of the application domain from the training data
    ○ Like acoustic and language modeling, deep learning based on recurrent neural networks can also significantly improve slot filling for language understanding
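To show what "filling frame slots" means for an ATIS-style flight query, here is a deliberately simple rule-based sketch. It is not the statistical or recurrent-network approach the notes describe; the slot names and patterns are hypothetical, chosen only to make the input and output of slot filling visible.

```python
import re

# Hypothetical frame for a flight query; slot names are illustrative.
SLOT_PATTERNS = {
    "from_city": re.compile(r"\bfrom (\w+)"),
    "to_city": re.compile(r"\bto (\w+)"),
    "day": re.compile(
        r"\bon (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
    ),
}


def fill_slots(utterance):
    """Map a free-form query onto the frame; unfilled slots stay None."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        frame[slot] = match.group(1) if match else None
    return frame


frame = fill_slots("Show me flights from Boston to Denver on Friday")
```

A learned slot filler replaces the hand-written patterns with a model trained on labeled queries, but the output has the same shape: a frame whose slots are filled from the utterance.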

• Six Major Challenges:
    ○ There is no data like more data
    ○ Computing infrastructure
    ○ Unsupervised learning
    ○ Portability and generalizability
    ○ Dealing with uncertainties
    ○ Having Socrates' wisdom

• Conclusion:
    ○ In 1976, the computation power available was only adequate to perform speech recognition on highly constrained tasks with low branching factors (perplexity)
    ○ In 1976, the fastest computer available for routine speech research was a dedicated PDP-10 with 4MB memory
        ▪ Thousands of processors and nearly unlimited collective memory capacity in the cloud
        ▪ Power of these systems arises mainly from their ability to collect, process and learn from very large data sets
    ○ Basic learning and decoding algorithms have not changed substantially in the past 40 years
        ▪ Algorithmic improvements have been made (ex. using distributed algorithms for deep learning tasks)
        ▪ Still difficult to dynamically adapt to the speaker and environment, which has the potential to reduce the error rate by half
        ▪ Social graph used for Web search engines can be used to dramatically reduce the needed search space
        ▪ Mixed-lingual speech makes the new-word problem more difficult
        ▪ Associated problem of error detection and correction leads to difficult user interface choices
            □ Multimodal interactive metaphor will be a dominant metaphor, as illustrated by the MiPad demo and Apple's Siri-like services
            □ We are still missing human-like clarification dialog for new words previously unknown to the system
        ▪ Recognition of highly confusable words is still a problem
            □ Some systems require the use of more powerful discrimination learning
            □ Dynamic sparse data learning is missing in most systems
        ▪ Speech recognition in the next 40 years will pass the Turing test
            □ Speech recognition will help bridge the gap between us and machines
            □ A powerful tool to facilitate and enhance natural conversation among people regardless of barriers of location or language
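The "branching factor (perplexity)" mentioned in the conclusion can be given a number. A common definition, assumed here, is 2 raised to the average negative log2 probability that a language model assigns to each word. If the model always chooses uniformly among 8 words, the perplexity is 8, matching the intuition of 8 branches at each step. The function name is hypothetical.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model's probability
    for each word in its context."""
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-avg_log2)


uniform_8 = perplexity([1 / 8] * 5)   # uniform choice among 8 words
mixed = perplexity([0.5, 0.25])       # a more predictable model
```

A highly constrained 1976-style task corresponds to a small perplexity; open dictation corresponds to a much larger one.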

Read: Historical perspective of speech recognition

Read: How computer animation works
• To animate means "to give life to"
• An animator's job is to take a static image or object and literally bring it to life by giving it movement and personality
    ○ In computer animation, animators use software to draw, model and animate objects and characters in vast digital landscapes
• There are two basic kinds of computer animation: computer-assisted and computer-generated
    ○ Computer-assisted animation is typically 2D, like cartoons
        ▪ The animator draws objects or characters either by hand or with a computer
        ▪ Then he positions his creations in key frames, which form an outline of the most important movements
        ▪ Next, the computer uses mathematical algorithms to fill in the "in-between" frames
            □ This process is called tweening
        ▪ Key framing and tweening are traditional animation techniques that can be done by hand, but are accomplished much faster with a computer
    ○ Computer-generated animation is 3D, meaning that objects and characters are modeled on a plane with an X, Y and Z axis
        ▪ This cannot be done with a pencil and paper
        ▪ Key framing and tweening are still an important function of computer-generated animation, but there are other techniques that don't relate to traditional animation
        ▪ Using mathematical algorithms, animators can program objects to adhere to (or break) physical laws like gravity, mass and force
        ▪ Or create tremendous herds and flocks of creatures that appear to act independently, yet collectively
        ▪ With computer-generated animation, instead of animating each hair on a monster's head, the monster's fur is designed to wave gently in the wind and lie flat when wet
• Technology has long been a part of the animator's toolkit
    ○ Animators at Disney revolutionized the industry with innovations like the use of sound in animated short films and the multi-plane camera stand that created the parallax effect of background depth
• The roots of computer animation began with computer graphics pioneers in the early 1960s
    ○ Earliest films were scientific simulations including "Flow of a Viscous Fluid" and "Propagation of Shock Waves in a Solid Form"
• Ed Catmull (University of Utah) was one of the first to toy with computer animation as art, beginning with a 3D rendering of his hand opening and closing
    ○ University of Utah was the source of the earliest important breakthroughs in 3D computer graphics, like the hidden surface algorithm that allows a computer to conceptualize 3D objects
        ▪ Utah Teapot: a rendered 3D teapot that signaled a turning point in the photorealistic quality of 3D graphics
• In 1973, "Westworld" became the first film to contain computer-generated 2D graphics
    ○ More films in the later 1970s and early 1980s relied on computer graphics (CG) to create primitive effects
    ○ "Tron" (1982) was ideal for showcasing undeniably digital effects
    ○ "Jurassic Park" (1993) was the first feature film to integrate convincingly real, entirely computer-generated characters into a live-action film
    ○ "Toy Story" (1995) from Pixar was the first full-length cartoon made entirely with computer-generated 3D animation
• Increasing sophistication and realism of 3D animation can be directly credited to an exponential growth in computer processing power
    ○ Today, a standard desktop computer runs 5000x faster than those used by computer graphics pioneers in the 1960s
    ○ Cost of the basic technology for creating computer animation has dropped from $500,000 to less than $2,000
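The key framing and tweening steps in these notes can be sketched with the simplest possible "mathematical algorithm", linear interpolation. This is an assumption made for illustration: production tools typically offer curved or eased interpolation as well. Here a key frame is just an (x, y) position, and the hypothetical `tween` function returns only the in-between frames.

```python
def tween(key_a, key_b, steps):
    """Linearly interpolate `steps` in-between frames between two key frames.

    Each key frame is a tuple of coordinates; the key frames themselves
    are not included in the result.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from key_a to key_b
        frames.append(tuple(a + (b - a) * t for a, b in zip(key_a, key_b)))
    return frames


in_betweens = tween((0, 0), (8, 4), steps=3)
```

The animator supplies only the two key positions; the computer generates the three evenly spaced frames between them.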

Watch: Magic ingredient that brings Pixar movies to life
• Create a 3D world inside a computer
• Can move a camera inside a 3D world to make characters come to life
• Can add or remove lighting, place it, and create shadows: balancing reality and artistry
    ○ There is a moment in lighting where all the pieces come together and the world comes to life
    ○ Can use light to help tell the story, set the mood, guide the audience's eye, make characters stand out, etc.
• Tether ourselves to science and the world we know as a background, to create something relatable
    ○ Mimic the physics of water, light, movement, etc., but not constrained by it
• Ex. Finding Nemo
    ○ Ribbons of light
    ○ Movement of water
    ○ Fog beams
    ○ Can change colour to depict a mood
    ○ Create a believable world that the audience can immerse themselves in
• Can portray emotion with light
    ○ Ex. Light in WALL-E's "binoculars" to make it more human-like with emotions
• Interweaving of art and science

Watch: Magical tale (with augmented reality)
• Augmented reality is the welding of the real world with computer-generated imagery
• Latest technology
• Must suspend our disbelief
    ○ Willingly enter a fictional world
• Deliberately exploit the way the audience thinks
• Defy logic

Watch: Next-generation digital book
• "Our Choice": first interactive digital book by Al Gore
    ○ Explore solutions to solve the climate crisis
    ○ Can include pictures, interactive images, and videos that can be expanded
        ▪ Could see pictures or videos on a map
    ○ Interactive infographics that can be explored
        ▪ Ex. Blowing into the mic to see a windmill mechanism
    ○ Runs on iPad and iPhone

Questions:
1. What is augmented reality, and what are some of its potential uses?
    ○ Augmented reality (AR) is an enhanced perception of a physical, real-world environment such that some elements are augmented and overlaid by computer-generated sensory input, such as sounds, images, video or computer vision
    ○ The perception may be in real time and may involve human interactivity
    ○ Applications may include Google Glass, realistic medical/military training, and realistic entertainment
2. What is speech recognition? Briefly discuss some of the challenges in its development.
    ○ Speech recognition is a collection of computational techniques that aim at recognizing or classifying natural human speech
    ○ Usually, it processes the speech signal as data from its coding representation, segments the code into components, identifies the components in their context, and classifies the components into interpretable human words
    ○ Each step of the process can present some difficulties due to possible ambiguity, uncertainty of the true nature, or confusion between potential words
3. Briefly define these terms in animation:
    ○ Tweening: technique, usually using computing, to generate frames that fill in the content between two key frames in an animation
    ○ Key frame: frame used as a reference point in a computer algorithm for smooth transition
    ○ Kinematics: study of methods for manipulating the motion of an object, usually using a 3D model, without considering the cause of the motion
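The kinematics definition in the last question can be illustrated with forward kinematics, one common technique assumed here as an example: given joint angles for a chain of limb segments, compute where each joint ends up, without modeling the forces that cause the motion. The sketch is 2D for brevity (the notes mention 3D models), and the function name and arm dimensions are hypothetical.

```python
import math


def forward_kinematics(lengths, angles):
    """Joint positions of a planar chain with its base at the origin.

    `lengths` are segment lengths; `angles` are joint angles in radians,
    each measured relative to the previous segment.
    """
    x = y = 0.0
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


# Two-segment arm: first segment straight up, second bent back to horizontal.
arm = forward_kinematics([1.0, 1.0], [math.pi / 2, -math.pi / 2])
```

An animator who sets only the joint angles at key frames can tween those angles and recompute the positions for every in-between frame.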

Multimedia Computing

Friday, March 9, 2018 2:24 PM



In non-probablistic models, there is an

estimated "distance" between sound labels

based on how similar to sounds are

estimated to be

○

In probability models, an estimate is used of

the conditional probability of observing a

particular sound label as the best matching

label, conditional on the correct label being

the hypothesized label (=confusion

probability)

○

Early methods aimed to find the closest matching

sound label from a discrete set of labels

•

Model process has a learning algorithm with

a broadly applicable convergence theorem =

Expectation-Maximization (EM) algorithm

○

Hidden Markov Model (HMM) -process that is

hidden not the model

•

DNN can replace the Gaussian

mixture model directly to overcome

the inefficiency in data representation

Significant development offered learned

feature representation with introduction of

deep neural networks (DNN)

○

Combination of HMM and DNN produced

significant error reduction

○

Before 2010, mixture of HMM-based Gaussian

densities were typically used

•

Markov process -probabilities of future events

will be independent of any additional information

about the past history of the process

•

Corpora have been created, annotated and

distributed to the world-wide community by

National Institute of Standard and

Technology (NIST), Linguistic Data

Consortium (LDC), European Language

Resources Association (ELRA), and others

○

Character of the recorded speech has

progressed from limited, constrained speech

materials to huge amounts of progressively

more realistic, spontaneous speech

○

Training data and computational resources;

•

Made it possible for speech recognition to

consume the significantly improved

computational infrastructure

○

Moore's Law: doubling the amount of computation

for a given cost every 12-18 months, as well as

comparably shrinking cost of memory

•

Made it possible to create a far more

powerful language model for voice

search applications

Both Google and Big indexed the entire

Web

○

Cloud-based speech recognition made it even

more convenient to accumulate an even more

massive amount of speech

•

Magnitude as a function of frequency is

called the "spectrum" of the short window of

speech, and a sequence of such spectra over

time in a speech utterance can be visualized

as a spectrogram

○

Deep learning technology aims at

minimizing such information loss and

searching for more powerful, deep

learning-driven speech representations

from raw data

Modifications of spectrograms led to

significant improvements in the performance

of Gaussian mixture-based HMM systems

despite the loss of raw speech information

○

In 1976 acoustic features were typically a measure

of the magnitude at each set of frequencies for

each time window

•

Maximum size has increased significantly

○

Systems in 1990s tried to recognize every

word dictated and counted every words not

recognized as an error

○

It was important for the system to learn the

names and places that occurred repeatedly in

a particular user's dictation

○

Significant advances were made in statistic

learning techniques

○

Problem remains a challenge because

modeling new words is still far from

seamless

○

Vocabulary size:

•

There was still a significant gap in

performance between single-speaker,

speaker-dependent models and speaker-

independent models intended for a diverse

population

○

Key was to use more speech data from

a large number of speakers to train the

HMM-based system

Sphinx introduced a large vocabulary of

speaker-independent continuous speech

recognition

○

Adaptive learning is also applied to

accommodate speaker variations and a wide

range of variable conditions for the channel,

noise and domain

○

Effective adaptive technologies enable rapid

application integration and are a key to

successful commercial deployment of

speech recognition

○

Speaker independent and adaptive systems:

•

More important has been searchable unified

graph representations that allow multiple

sources of knowledge to be incorporated

into a common probablistic framework

○

Practical decoding algorithms made possible

large-scale continuous speech recognition

○

Multiple speech streams

Multiple probability estimators

Multiple recognition systems

Multiple pass systems with increased

constraints

Non-compositional methods include:

○

Decoding techniques:

•

User utters queries on flight

information in an unrestricted

free form

□

Ex. Air Travel Information System

(ATIS)

SLY mostly relied on case grammars for

representing sets of semantic concepts

during 1970s

○

Number of techniques are used to fill frame

slots of the application domain from the

training data

○

Like acoustic and language modeling, deep

learning based on recurrent neural networks

can also significantly improve filling slots

for language understanding

○

Spoken language understanding (SLU):

•

There is no data like more data

○

Computing infrastructure

○

Unsupervised learning

○

Portability and generalizability

○

Dealing with uncertainties

○

Having Socrates' wisdom

○

Six Major Challenges:

•

In 1976 computation power available was

only adequate to perform speech recognition

on highly constrained tasks with low

branching factors (perplexity)

○

Thousands of processors and nearly

unlimited collective memory capacity

in the cloud

Power of these systems arises mainly

from their ability to collect, process

and learn from very large data sets

In 1976, faster computer available for

routine speech was a dedicated PDP-10 with

4MB memory

○

Algorithmic improvements have been

made (ex. Using distributed

algorithms for deep learning task)

Still difficult to dynamically adapt to

the speaker and environment, which

have the potential to reduce the error

rate by half

Social graph used for Web search

engines can be used to dramatically

reduce the needed search space

Mixed lingual speech makes the new

world problem more difficult

Multimodal interactive

metaphor will be a dominant

metaphor as illustrated by

MiPad demo and Apple's Siri-

like services

□

We are still missing human-like

clarification dialog for new

words previously unknown to

the system

□

Associated problem of error detection

and correction lead to difficult user

interface choices

Some systems require the use of

more powerful discrimination

learning

□

Dynamic sparse data learning is

missing in most systems

□

Recognition of highly confusable

words is still a problem

Speech recognition will help

bridge the gap between us and

machines

□

A powerful tool to facilitate and

enhance natural conservation

among people regardless of

barriers of location or language

□

Speech recognition in the next 40

years will pass the Turing test

Basic learning an decoding algorithms have

not changes substantially in the past 40 years

○

Conclusion:

•

Read: Historical perspective of speech recognition

To animate means "to give life to"

•

In computer animation, animators use

software to draw, model and animate objects

and characters in vast digital landscapes

○

An animator's job is to take a static image or

object and literally bring it to life by giving it

movement and personality

•

The animator draws objects or

characters either by hand or with a

computer

Then he positions his creations in key

frames, which form an outline of the

most important movements

This process is called tweening

□

Next, the computer uses mathematical

algorithms to fill in the "in-between"

frames

Key framing and tweening are

traditional animation techniques that

can be done by hand, but are

accomplished much faster with a

computer

Computer-assisted animation is typically

2D, like cartoons

○

This cannot be done with a pencil and

paper

Key framing and tweening are still an

important function of computer-

generated animation, but there are

other techniques that don't relate to

traditional animation

Using mathematical algorithms,

animators can program objects to

adhere to (or break) physical laws like

gravity, mass and force

Or create tremendous herds and flocks

of creatures that appear to act

independently, yet collectively

With computer-generated animation,

instead of animating each hair on a

monster's head, the monster's fur is

designed to wave gently in the wind

and lay flat when wet

Computer-generated animation is 3D,

meaning that objects and characters are

modeled on a plane with a X, Y and Z axis

○

There are two basic kinds of computer animation:

computer-assisted and computer-generated

•

Animators at Disney revolutionized the

industry with innovations like the use of

sounds in animated short films and the

multi-plane camera stand that created the

parallax effect of background depth

○

Technology has been a long part of the animator's

toolkit

•

Earliest films were scientific simulations

including "Flow of a Viscous Fluid" and

"Propagation of Shock Waves in a Solid

Form"

○

The roots of computer animation began with

computer graphics pioneers in the early 1960s

•

Utah Teapot -rendered 3D teapot that

signaled a turning point in the

photorealistic quality of 3D graphics

University of Utah was source of the earliest

important break through in 3D computer

graphics, like the hidden surface algorithm

that allows a computer to conceptualize 3D

objects

○

Ed Catmull (University of Utah) was one of the

first to toy with computer animation as art,

beginning with a 3D rendering of his hand opening

and closing

•

More films in later 1970s and early 1980s

relied on computer graphics (CG) to create

primitive effects

○

"Tron" (1982) was ideal for showcasing

undeniably digital effects

○

"Jurassic Park" (1993) was first feature film

to integrate convincingly real, entirely

computer generated characters into a lice

action film

○

"Toy Story" (1995) from Pixar was first full-

length cartoon made entirely with computer-

generated 3D animation

○

In 1972, "Westworld" became first film to contain

computer-generated 2D graphics

•

Today, a standard desktop computer runs

5000x faster than those used by computer

graphics pioneers in 1960s

○

Cost of the basic technology for creating

computer animation has dropped from

$500,000 to less than $2,000

○

Increasing sophistication and realism of 3D

animation can be directly credited to an

exponential growth in computer processing power

•

Read: How computer animation works

create 3D world inside a computer

•

Can move a camera inside a 3D world to make

characters come to life

•

Moment in lighting where all the pieces

come together and the world comes to life

○

Can use light to help tell story, set mood,

guide the audience eye, make characters

stand out…etc

○

Can add or remove lighting to place it and create

shadows -balancing reality and artistry

•

Mimic physics of water, light,

movement…etc. but not constrained by it

○

Tether ourselves with science and the world we

know as a background -create something relatable

•

Ribbons of light

○

Movement of water

○

Fog beams

○

Can change colour to depict a mood

○

Create a believable world that the audience

can immerse themselves in

○

Ex. Finding Nemo

•

Ex. Light in Walle's "binoculars" to make it

more human-like with emotions

○

Can portray emotion with light

•

Interweaving of art and science

•

Watch: Magic ingredient that brings Pixar movies to life

Augmented reality is the welding of the real world

with computer-generated imagery

•

Latest technology

•

Willing enter fictional world

○

Must suspend our disbelief

•

Deliberately exploit the way the audience thinks

•

Defy logic

•

Watch: Magical tale (with augmented reality)

Explore solutions to solve the climate crisis

○

Could see pictures or videos on a map

Can include pictures, interactive images, and

videos that can be expanded

○

Ex. Blowing into mic to see windmill

mechanism

Interactive infographics -can explore

○

Runs on ipad and iphone

○

"Our Choice" -first interactive digital book by Al

Gore

•

Watch: Next-generation digital book

Augmented reality (AR) is an enhanced

perception of a physical, real-world

environment such that some elements are

augmented and overlaid by computer-

generated sensory input, such as sounds,

images, video or computer vision

○

The perception may be in real time and may

involve human interactivity

○

Applications may include Google Glass,

realistic medical/military training, and

realistic entertainment

○

What is augmented reality, and what are some of its

potential uses?
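
The "overlaid by computer-generated input" part of this answer can be illustrated with alpha blending, one common way to composite a generated image over a camera frame. This is only a minimal sketch, not how Google Glass or any other product works: the function names `blend_pixel` and `overlay`, and the tiny nested-list "images", are assumptions made for illustration.

```python
def blend_pixel(real, virtual, alpha):
    """Mix one RGB pixel of the camera frame with one generated pixel.

    alpha = 0.0 keeps the real pixel, alpha = 1.0 shows only the virtual one.
    """
    return tuple(round((1 - alpha) * r + alpha * v) for r, v in zip(real, virtual))


def overlay(frame, sprite, top, left, alpha=0.5):
    """Return a copy of `frame` with `sprite` blended in at (top, left).

    Both images are lists of rows of (R, G, B) tuples, a hypothetical
    stand-in for real camera and render buffers. Sprite pixels that fall
    outside the frame are ignored.
    """
    out = [list(row) for row in frame]
    for dy, sprite_row in enumerate(sprite):
        for dx, virtual in enumerate(sprite_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = blend_pixel(out[y][x], virtual, alpha)
    return out


# A 2x3 grey "camera frame" and a 1x2 red "virtual object" (made-up data).
frame = [[(100, 100, 100)] * 3 for _ in range(2)]
sprite = [[(255, 0, 0), (255, 0, 0)]]
augmented = overlay(frame, sprite, top=0, left=1, alpha=0.5)
```

Running the blend on every new camera frame is what would make the perception "real time"; changing `top` and `left` in response to the user is one simple form of interactivity.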

Speech recognition is a collection of

computational techniques that aim at

recognizing or classifying natural human

speech

○

Usually, it processes the speech signal as

data from its coding representation,

segments the code into components,

identifies the components in their context,

and classifies the components into

interpretable human words

○

Each step of the process can present some

difficulties due to possible ambiguity,

uncertainty of the true nature, or confusion

between potential words

○

What is speech recognition? Briefly discuss

some of the challenges in its development?
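
The segment-then-classify steps in this answer can be sketched on a toy signal. This is an illustration only, under loud assumptions: real recognizers use much richer features and statistical or neural models, whereas here a "word" is found by an energy threshold and labelled by its length alone. The names `frame_energies`, `segment`, `classify`, and the `templates` table are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame of the signal."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment(energies, threshold):
    """Group consecutive frames whose energy exceeds `threshold`.

    Returns (start_frame, end_frame) pairs, with the end exclusive.
    """
    segments, start = [], None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # a loud stretch begins
        elif energy <= threshold and start is not None:
            segments.append((start, i))    # the loud stretch just ended
            start = None
    if start is not None:                  # signal ended while still loud
        segments.append((start, len(energies)))
    return segments


def classify(feature, templates):
    """Pick the label whose stored feature value is closest to `feature`."""
    return min(templates, key=lambda label: abs(templates[label] - feature))


# Made-up signal: silence, a short burst, silence, a longer burst, silence.
samples = [0] * 4 + [5, -5] * 2 + [0] * 4 + [5, -5] * 4 + [0] * 2
energies = frame_energies(samples, frame_len=2)
segments = segment(energies, threshold=1.0)

# Hypothetical templates: typical length, in frames, of each known word.
templates = {"yes": 2, "hello": 4}
words = [classify(end - start, templates) for start, end in segments]
print(words)
```

A burst three frames long would be equally close to both templates, which is a small instance of the confusion between potential words that the answer mentions.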

Tweening: a technique, usually using

computing, to generate frames that fill in the

content between two key frames in

animation

Key frame: a frame used as a reference point

in a computer algorithm for smooth transition

Kinematics: the study of methods for

manipulating the motion of an object, usually

using a 3D model, without considering the

cause of the motion

3. Briefly define these terms in animation:
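
The three terms fit together in a small sketch: two key frames store joint angles, tweening generates the frames in between, and forward kinematics turns each pose into a position using geometry alone. Everything here is an assumption made for illustration: the two-segment arm, the linear interpolation (production tools often use eased or spline curves instead), and the names `tween`, `in_betweens`, and `arm_tip`.

```python
import math


def tween(key_a, key_b, t):
    """Linearly interpolate every value of two key frames; t runs from 0 to 1."""
    return {name: (1 - t) * key_a[name] + t * key_b[name] for name in key_a}


def in_betweens(key_a, key_b, count):
    """Generate `count` evenly spaced frames strictly between two key frames."""
    return [tween(key_a, key_b, i / (count + 1)) for i in range(1, count + 1)]


def arm_tip(pose, upper_len=1.0, lower_len=1.0):
    """Forward kinematics for a two-segment arm anchored at the origin.

    Joint angles are in degrees, and the elbow angle is measured relative
    to the upper segment. Only geometry is used: no forces or masses.
    """
    shoulder = math.radians(pose["shoulder"])
    elbow = math.radians(pose["shoulder"] + pose["elbow"])
    x = upper_len * math.cos(shoulder) + lower_len * math.cos(elbow)
    y = upper_len * math.sin(shoulder) + lower_len * math.sin(elbow)
    return x, y


# Two key frames drawn "by the animator": arm stretched out, then raised.
key_start = {"shoulder": 0.0, "elbow": 0.0}
key_end = {"shoulder": 90.0, "elbow": 0.0}

# The computer fills in two frames between them.
frames = [key_start] + in_betweens(key_start, key_end, 2) + [key_end]
for pose in frames:
    x, y = arm_tip(pose)
    print(f"shoulder={pose['shoulder']:5.1f}  tip=({x:.2f}, {y:.2f})")
```

Interpolating joint angles rather than tip positions keeps the arm segments at a fixed length in every in-between frame, which is one reason the two ideas are often used together.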

Questions:

Multimedia Computing

Friday, March 9, 2018 2:24 PM
