CIS 2050 Lecture Notes - Lecture 8: Turing Test, Spoken Language, Traditional Animation
Identify and describe some of the approaches used
in digital book and computer animation
1.
Examine and evaluate the design and presentation
using multimedia techniques
2.
Describe an interactive technique based on speech
recognition, its nature, and how it was developed
3.
Learning Outcomes:
It may even appear in daily products in the
form of digital books and movies
○
It may use user interactivities such as speech
recognition
○
Despite the attractiveness and impact of
using multimedia, the key is still the
presentation of the intended message using
these media
○
Multimedia appeals to more of our senses than
traditional computing
•
Computer animation is to take static objects
and give them "life" through their
movements and personality, and illusory
devices that take advantage of a sequence of
animated events
○
Multimedia computing not only involves
algorithms that connect and process different types
of data, but also involves the creative use of
technologies in delivering a "story"
•
Computer animation may involve many
steps, using techniques from modeling to
specific rendering
○
Augmented reality combines the real world with
computer-generated virtual imagery of events
•
A speech sequence is often segmented and
its parts are then recognized into their
appropriate words using computer
algorithms
○
There are still many difficulties in
recognizing human speech using the
computer, especially when the speech
consists of spoken words from people of
different cultures
○
The research on speech recognition was motivated
by the coding technology of speech signals
•
Key Points:
Insights gained from the speech recognition
advances over the past 40 years are
explored, originating from generations of
Carnegie Mellon University's R&D
○
Several major achievements over the years
have proven to work well in practice for
leading industry speech recognition systems
from Apple to Microsoft
○
It will help bridge the gap between
humans and machines
!
It will facilitate and enhance natural
conversation among people
!
Six challenges need to be addressed
before we can realize this audacious
dream
!
Speech recognition will pass the Turing
Test and bring the vision of Star-Trek-like
mobile devices to reality
○
Key Insights:
•
In 1976, one of the authors (Reddy) wrote a
comprehensive review of the state of the art
of voice recognition at that time
○
With the introduction of Apple's Siri and similar
voice search services from Google and Microsoft,
it is natural to wonder why voice recognition
technology took so long to advance to this level
•
In 1976, Reddy predicted it would be possible to
build a $20,000 connected-speech system
within the next 10 years
○
Although it took longer than projected, the
system costs were much less and continue to
drop dramatically
○
Speech recognition has been a staple in science
fiction for years, but in 1976 the real-world
capabilities bore little resemblance to the far-
fetched capabilities in the fictional realm
•
Although it was commercially
successful, the "speech in" and "screen
out" multimodal metaphor is more
natural for information consumption
!
In 1999 the VoiceXML forum was created
to support telephony IVR
○
Illustrated a vision of speech-enabled
multimodal mobile devices
!
In 2001, Bill Gates demonstrated such a
prototype codenamed MiPad at CES
○
The speech community is en route to
passing the Turing Test in the next 40
years with the ultimate goal to match
and exceed a human's speech
recognition capability for everyday
scenarios
We are now witnessing the ever-improved
ability of devices to handle relatively
unrestricted multimodal dialogues
○
In 1995, Microsoft SAPI was first shipped in
Windows 95 to enable application developers to
create speech applications on Windows
•
Statistical modeling and machine learning
○
Training data and computing resources
○
Vocabulary size and disfluent speech
○
Speaker independent and adaptive speech
recognition
○
Efficient decoder
○
Spoken language understanding and
dialogue
○
What we did not know how to do in 1976:
•
Acoustic
!
Parametric
!
Phonemic
!
Lexical
!
Sentence
!
Semantic
!
Six levels of knowledge:
○
In 1971, the speech recognition study group
recommended that many more sources of
knowledge be brought to bear on the problem
•
Speech Understanding Research (SUR) project
○
Hearsay
!
Dragon
!
Harpy
!
Sphinx I/II
!
Developed a sequence of speech recognition
systems
○
Ex. Voice control of a robot, large-
vocabulary connected-speech
recognition, speaker-independent
speech recognition and unrestricted
vocabulary dictation
!
Created several historic demonstrations of
spoken language systems
○
Hearsay-I was one of the first systems
capable of continuous speech recognition
○
Dragon system was one of the first to model
speech as a hidden stochastic process
○
Harpy system introduced the concept of beam
search, which was the most widely used
technique for efficient searching and
matching for decades
○
*speech recognition word error
rate has been used as the main
metric to evaluate progress
□
Sphinx-II (1992) benefited largely
from tied parameters to balance
trainability and efficiency, which
achieved the highest recognition
accuracy in DARPA-funded speech
benchmark evaluation
!
Sphinx-I (1987) was the first system to
demonstrate speaker-independent speech
recognition
○
Word error rate has been approaching a new
milestone in work by both Microsoft and IBM
researchers following the deep learning
framework pioneered by researchers at the
University of Toronto and Microsoft
○
By 1976, Reddy was leading a group at Carnegie
Mellon University to explore ideas under the Defense
Advanced Research Projects Agency (DARPA)-sponsored
Speech Understanding Research (SUR) project
•
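The word error rate noted above as the main progress metric is usually computed as the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch follows; the function name and the example sentences are made up for illustration and do not come from any benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: two substitutions out of five reference words.
print(word_error_rate("show me flights to boston", "show me flight to austin"))
```

Because insertions are counted, the rate can exceed 1.0 when the output is much longer than the reference.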
Architecture of the Hearsay system was
designed so that many semiautonomous
modules can communicate and cooperate in
a speech recognition task while each
concentrated on its own area of expertise
○
The Dragon, Harpy and Sphinx systems all
were based on a single, relatively simple
modeling principle of joint global
optimization
○
It was anticipated in the early 1970s that bringing
the higher-level sources of knowledge to bear might
require significant breakthroughs in artificial
intelligence
•
Decoding process in a speech recognizer's
operation is to find a sequence of words
whose corresponding acoustic and language
models best match the input vector sequence
○
= search process
○
Graph search algorithms, which have been
explored extensively in the fields of artificial
intelligence, operations research and game
theory, serve as the basic foundation for the
search problem in speech recognition
○
Decoding, the process of finding the word sequence
that best matches the input speech, is more than
just a simple pattern recognition problem, since
one faces a practically astronomical number of
word patterns to search
•
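The search problem described above, together with the beam search idea introduced by Harpy, can be sketched on a toy example. This is an illustrative simplification, not the Harpy implementation: the candidate words, the probabilities, and the names `beam_search`, `steps` and `bigrams` are all assumptions made up for the example. At each time step every surviving hypothesis is extended by every candidate word, scored with an acoustic probability plus a bigram language-model probability, and only the best `beam_width` hypotheses are kept.

```python
import math


def beam_search(step_scores, bigram_prob, beam_width=2):
    """Return the best (word sequence, log score) found with a fixed-width beam.

    step_scores: one dict per time step, mapping candidate word -> acoustic probability.
    bigram_prob: dict mapping (previous word, word) -> language-model probability.
    """
    beam = [((), 0.0)]  # start with one empty hypothesis
    for candidates in step_scores:
        expanded = []
        for words, score in beam:
            previous = words[-1] if words else "<s>"
            for word, acoustic_prob in candidates.items():
                # Unseen word pairs get a small floor probability (an assumption).
                lm_prob = bigram_prob.get((previous, word), 1e-6)
                new_score = score + math.log(acoustic_prob) + math.log(lm_prob)
                expanded.append((words + (word,), new_score))
        # Pruning step: keep only the highest-scoring hypotheses.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Made-up scores for two acoustically confusable word pairs.
steps = [
    {"recognize": 0.6, "wreck": 0.4},
    {"speech": 0.5, "beach": 0.5},
]
bigrams = {
    ("<s>", "recognize"): 0.5,
    ("<s>", "wreck"): 0.5,
    ("recognize", "speech"): 0.9,
    ("recognize", "beach"): 0.1,
    ("wreck", "speech"): 0.2,
    ("wreck", "beach"): 0.8,
}
best_words, best_score = beam_search(steps, bigrams, beam_width=2)
print(best_words, best_score)
```

A narrower beam is cheaper but may prune the hypothesis that would have won later, which is the cost-performance trade-off these notes mention.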
The most salient difference is not algorithms
with a lower error rate, but rather an
emphasis on simplified algorithms with a
better cost-performance trade-off
○
Long-term goal was the development of a
real-time, large-vocabulary, continuous-
speech dictation system
○
Development of technology for Dragon
NaturallySpeaking may be compared with the
general development
•
Phonetic matching and word
verification are unified with word
sequence generation that depends on
the highest overall rating typically
using a context-dependent phonetic
acoustic model
!
Ex. Explicit segmentation and labeling of
phonetic strings is no longer necessary
○
Establishment of the statistical machine-learning
framework, supported by the availability of
computing infrastructure and massive training
data, constitutes the most significant driving force
in advancing the development of speech
recognition
•
In non-probabilistic models, there is an
estimated "distance" between sound labels
based on how similar the two sounds are
estimated to be
○
In probability models, an estimate is used of
the conditional probability of observing a
particular sound label as the best matching
label, conditional on the correct label being
the hypothesized label (=confusion
probability)
○
Early methods aimed to find the closest matching
sound label from a discrete set of labels
•
Model process has a learning algorithm with
a broadly applicable convergence theorem =
Expectation-Maximization (EM) algorithm
○
Hidden Markov Model (HMM): it is the process that is
hidden, not the model
•
DNN can replace the Gaussian
mixture model directly to overcome
the inefficiency in data representation
!
Significant development offered learned
feature representation with introduction of
deep neural networks (DNN)
○
Combination of HMM and DNN produced
significant error reduction
○
Before 2010, HMMs based on mixtures of Gaussian
densities were typically used
•
Markov process: given the current state, the probabilities
of future events are independent of any additional
information about the past history of the process
•
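To make the hidden Markov model idea concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observed label sequence by summing over every possible hidden state path. The two states, two sound labels, and all probabilities are made-up values for illustration; real recognizers use many more states and continuous acoustic features.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of the observation sequence under the HMM (forward algorithm)."""
    # alpha[s] = probability of the observations so far, ending in hidden state s.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Markov property: the update needs only the previous alpha values.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model emitting two sound labels, "a" and "b".
states = ("s1", "s2")
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 0.1, "b": 0.9}}

likelihood = forward_likelihood(("a", "b"), states, start_p, trans_p, emit_p)
print(likelihood)
```

The hidden states are never observed directly; only the emitted labels are, which is the sense in which the process, not the model, is hidden.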
Corpora have been created, annotated and
distributed to the world-wide community by
National Institute of Standards and
Technology (NIST), Linguistic Data
Consortium (LDC), European Language
Resources Association (ELRA), and others
○
Character of the recorded speech has
progressed from limited, constrained speech
materials to huge amounts of progressively
more realistic, spontaneous speech
○
Training data and computational resources:
•
Made it possible for speech recognition to
consume the significantly improved
computational infrastructure
○
Moore's Law: doubling the amount of computation
for a given cost every 12-18 months, as well as
comparably shrinking cost of memory
•
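The doubling rule can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the doubling period stays constant over the 40 years these notes cover, which is an idealization; the function name is made up for the example.

```python
def growth_factor(years, months_per_doubling):
    """Computation per unit cost relative to the start, under steady doubling."""
    doublings = years * 12 / months_per_doubling
    return 2 ** doublings


# Range implied by a 12 to 18 month doubling period over 40 years.
slow = growth_factor(40, 18)
fast = growth_factor(40, 12)
print(f"between {slow:.2e}x and {fast:.2e}x")
```

Even the slower assumption implies a gain of many orders of magnitude, which helps explain why data-hungry statistical methods became practical.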
Made it possible to create a far more
powerful language model for voice
search applications
!
Both Google and Bing indexed the entire
Web
○
Cloud-based speech recognition made it even
more convenient to accumulate an even more
massive amount of speech
•
Magnitude as a function of frequency is
called the "spectrum" of the short window of
speech, and a sequence of such spectra over
time in a speech utterance can be visualized
as a spectrogram
○
Deep learning technology aims at
minimizing such information loss and
searching for more powerful, deep
learning-driven speech representations
from raw data
!
Modifications of spectrograms led to
significant improvements in the performance
of Gaussian mixture-based HMM systems
despite the loss of raw speech information
○
In 1976 acoustic features were typically a measure
of the magnitude at each of a set of frequencies for
each time window
•
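The spectrum and spectrogram described above can be sketched with a plain discrete Fourier transform over short windows. This is a deliberately naive version using only the standard library: real systems use a fast Fourier transform and usually apply a tapering window first, both omitted here. The function names and the test tone are assumptions made up for the example.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Magnitude at each frequency bin (0 up to half the window length)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, window_size, hop):
    """One magnitude spectrum per time window, stepping `hop` samples each time."""
    return [
        magnitude_spectrum(signal[start:start + window_size])
        for start in range(0, len(signal) - window_size + 1, hop)
    ]


# Synthetic tone completing 2 cycles per 16 samples, standing in for speech.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(32)]
frames = spectrogram(tone, window_size=16, hop=8)
for frame in frames:
    print([round(value, 2) for value in frame])
```

Each printed row is one time window; for this steady tone the energy sits in bin 2 of every row, whereas speech would show the peaks moving over time.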
Maximum size has increased significantly
○
Systems in 1990s tried to recognize every
word dictated and counted every word not
recognized as an error
○
It was important for the system to learn the
names and places that occurred repeatedly in
a particular user's dictation
○
Significant advances were made in statistical
learning techniques
○
Problem remains a challenge because
modeling new words is still far from
seamless
○
Vocabulary size:
•
There was still a significant gap in
performance between single-speaker,
speaker-dependent models and speaker-
independent models intended for a diverse
population
○
Key was to use more speech data from
a large number of speakers to train the
HMM-based system
!
Sphinx introduced large-vocabulary,
speaker-independent continuous speech
recognition
○
Adaptive learning is also applied to
accommodate speaker variations and a wide
range of variable conditions for the channel,
noise and domain
○
Effective adaptive technologies enable rapid
application integration and are a key to
successful commercial deployment of
speech recognition
○
Speaker independent and adaptive systems:
•
More important has been searchable unified
graph representations that allow multiple
sources of knowledge to be incorporated
into a common probabilistic framework
○
Practical decoding algorithms made possible
large-scale continuous speech recognition
○
Multiple speech streams
!
Multiple probability estimators
!
Multiple recognition systems
!
Multiple pass systems with increased
constraints
!
Non-compositional methods include:
○
Decoding techniques:
•
User utters queries on flight
information in an unrestricted
free form
□
Ex. Air Travel Information System
(ATIS)
!
SLU mostly relied on case grammars for
representing sets of semantic concepts
during 1970s
○
A number of techniques are used to fill frame
slots of the application domain from the
training data
○
Like acoustic and language modeling, deep
learning based on recurrent neural networks
can also significantly improve filling slots
for language understanding
○
Spoken language understanding (SLU):
•
There is no data like more data
○
Computing infrastructure
○
Unsupervised learning
○
Portability and generalizability
○
Dealing with uncertainties
○
Having Socrates' wisdom
○
Six Major Challenges:
•
In 1976 computation power available was
only adequate to perform speech recognition
on highly constrained tasks with low
branching factors (perplexity)
○
Thousands of processors and nearly
unlimited collective memory capacity
in the cloud
!
Power of these systems arises mainly
from their ability to collect, process
and learn from very large data sets
!
In 1976, the fastest computer available for
routine speech research was a dedicated PDP-10 with
4MB of memory
○
Algorithmic improvements have been
made (ex. Using distributed
algorithms for deep learning task)
!
Still difficult to dynamically adapt to
the speaker and environment, which
have the potential to reduce the error
rate by half
!
Social graph used for Web search
engines can be used to dramatically
reduce the needed search space
!
Mixed-lingual speech makes the new-word
problem more difficult
!
Multimodal interactive
metaphor will be a dominant
metaphor as illustrated by
MiPad demo and Apple's Siri-
like services
□
We are still missing human-like
clarification dialog for new
words previously unknown to
the system
□
Associated problems of error detection
and correction lead to difficult user
interface choices
!
Some systems require the use of
more powerful discrimination
learning
□
Dynamic sparse data learning is
missing in most systems
□
Recognition of highly confusable
words is still a problem
!
Speech recognition will help
bridge the gap between us and
machines
□
A powerful tool to facilitate and
enhance natural conversation
among people regardless of
barriers of location or language
□
Speech recognition in the next 40
years will pass the Turing test
!
Basic learning and decoding algorithms have
not changed substantially in the past 40 years
○
Conclusion:
•
Read: Historical perspective of speech recognition
To animate means "to give life to"
•
In computer animation, animators use
software to draw, model and animate objects
and characters in vast digital landscapes
○
An animator's job is to take a static image or
object and literally bring it to life by giving it
movement and personality
•
The animator draws objects or
characters either by hand or with a
computer
!
Then he positions his creations in key
frames, which form an outline of the
most important movements
!
This process is called tweening
□
Next, the computer uses mathematical
algorithms to fill in the "in-between"
frames
!
Key framing and tweening are
traditional animation techniques that
can be done by hand, but are
accomplished much faster with a
computer
!
Computer-assisted animation is typically
2D, like cartoons
○
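The tweening step described above can be sketched with simple linear interpolation between two key-frame positions. This is only one possible approach, shown under the assumption that an object moves in a straight line at constant speed; animation software typically offers curved paths and easing as well. The function name and coordinates are made up for the example.

```python
def tween(start, end, steps):
    """Return `steps` in-between (x, y) positions, excluding the two key frames."""
    (x0, y0), (x1, y1) = start, end
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from the first key frame to the second
        frames.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return frames


# Three in-between frames for a character moving from (0, 0) to (4, 8).
print(tween((0, 0), (4, 8), steps=3))
```

The animator supplies only the two key positions; the computer fills in the rest, which is why the technique is so much faster than drawing every frame by hand.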
This cannot be done with a pencil and
paper
!
Key framing and tweening are still an
important function of computer-
generated animation, but there are
other techniques that don't relate to
traditional animation
!
Using mathematical algorithms,
animators can program objects to
adhere to (or break) physical laws like
gravity, mass and force
!
Or create tremendous herds and flocks
of creatures that appear to act
independently, yet collectively
!
With computer-generated animation,
instead of animating each hair on a
monster's head, the monster's fur is
designed to wave gently in the wind
and lay flat when wet
!
Computer-generated animation is 3D,
meaning that objects and characters are
modeled in a space with X, Y and Z axes
○
There are two basic kinds of computer animation:
computer-assisted and computer-generated
•
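As a small illustration of programming objects to adhere to physical laws, the sketch below drops an object under gravity and records its height for each frame. It is a simplified assumption-laden example, not how any particular animation package works: it uses a basic per-frame velocity and position update, ignores air resistance, and simply stops the object at the ground. The function name and default values are made up.

```python
def simulate_fall(height, frames, fps=24, gravity=9.81):
    """Height of a dropped object at each frame, stopping when it reaches the ground."""
    dt = 1.0 / fps  # seconds per frame
    y, velocity = float(height), 0.0
    positions = []
    for _ in range(frames):
        velocity -= gravity * dt  # gravity changes the velocity
        y += velocity * dt        # velocity changes the position
        if y < 0.0:
            y, velocity = 0.0, 0.0  # rest on the ground instead of bouncing
        positions.append(y)
    return positions


# One second of animation at 24 frames per second, dropped from 2 metres.
heights = simulate_fall(height=2.0, frames=24)
print([round(h, 2) for h in heights])
```

Changing `gravity`, or replacing the landing rule with a bounce, is the kind of choice that lets animators follow or deliberately break physical laws.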
Animators at Disney revolutionized the
industry with innovations like the use of
sounds in animated short films and the
multi-plane camera stand that created the
parallax effect of background depth
○
Technology has long been part of the animator's
toolkit
•
Earliest films were scientific simulations
including "Flow of a Viscous Fluid" and
"Propagation of Shock Waves in a Solid
Form"
○
The roots of computer animation began with
computer graphics pioneers in the early 1960s
•
Utah Teapot -rendered 3D teapot that
signaled a turning point in the
photorealistic quality of 3D graphics
!
University of Utah was the source of the earliest
important breakthroughs in 3D computer
graphics, like the hidden surface algorithm
that allows a computer to conceptualize 3D
objects
○
Ed Catmull (University of Utah) was one of the
first to toy with computer animation as art,
beginning with a 3D rendering of his hand opening
and closing
•
More films in later 1970s and early 1980s
relied on computer graphics (CG) to create
primitive effects
○
"Tron" (1982) was ideal for showcasing
undeniably digital effects
○
"Jurassic Park" (1993) was the first feature film
to integrate convincingly real, entirely
computer-generated characters into a
live-action film
○
"Toy Story" (1995) from Pixar was the first
full-length cartoon made entirely with
computer-generated 3D animation
○
In 1973, "Westworld" became the first film to contain
computer-generated 2D graphics
•
Today, a standard desktop computer runs
5000x faster than those used by computer
graphics pioneers in 1960s
○
Cost of the basic technology for creating
computer animation has dropped from
$500,000 to less than $2,000
○
Increasing sophistication and realism of 3D
animation can be directly credited to an
exponential growth in computer processing power
•
Read: How computer animation works
Create a 3D world inside a computer
•
Can move a camera inside a 3D world to make
characters come to life
•
Moment in lighting where all the pieces
come together and the world comes to life
○
Can use light to help tell story, set mood,
guide the audience's eye, make characters
stand out…etc
○
Can add or remove lighting to place it and create
shadows -balancing reality and artistry
•
Mimic physics of water, light,
movement…etc. but not constrained by it
○
Tether ourselves with science and the world we
know as a background -create something relatable
•
Ribbons of light
○
Movement of water
○
Fog beams
○
Can change colour to depict a mood
○
Create a believable world that the audience
can immerse themselves in
○
Ex. Finding Nemo
•
Ex. Light in WALL-E's "binoculars" to make it
more human-like with emotions
○
Can portray emotion with light
•
Interweaving of art and science
•
Watch: Magic ingredient that brings Pixar movies to life
Augmented reality is the welding of the real world
with computer-generated imagery
•
Latest technology
•
Willingly enter a fictional world
○
Must suspend our disbelief
•
Deliberately exploit the way the audience thinks
•
Defy logic
•
Watch: Magical tale (with augmented reality)
Explore solutions to solve the climate crisis
○
Could see pictures or videos on a map
!
Can include pictures, interactive images, and
videos that can be expanded
○
Ex. Blowing into mic to see windmill
mechanism
!
Interactive infographics -can explore
○
Runs on iPad and iPhone
○
"Our Choice" -first interactive digital book by Al
Gore
•
Watch: Next-generation digital book
Augmented reality (AR) is an enhanced
perception of a physical, real-world
environment such that some elements are
augmented and overlaid by computer-
generated sensory input, such as sounds,
images, video or computer vision
○
The perception may be in real time and may
involve human interactivity
○
Applications may include Google Glass,
realistic medical/military training, and
realistic entertainment
○
What is augmented reality, and some of its
potential uses?
1.
Speech recognition is a collection of
computational techniques that aim at
recognizing or classifying natural human
speech
○
Usually, it processes the speech signal as
data from its coding representation,
segments the code into components,
identifies the components in their context,
and classifies the components into
interpretable human words
○
Each step of the process can present some
difficulties due to possible ambiguity,
uncertainty of the true nature, or confusion
between potential words
○
What is speech recognition and briefly discuss
some of the challenges in its development?
2.
Tweening -technique usually using
computing to generate frames to fill in the
content between two key frames in
animation
a.
Key frame -frame used as a reference point
in computer algorithm for smooth transition
b.
Kinematics -study of methods in
manipulating motion of an object, usually
using a 3D model without considering the
cause of the motion
c.
Briefly define these terms in animation:
3.
Questions:
Multimedia Computing
Friday, March 9, 2018
2:24 PM
Identify and describe some of the approaches used
in digital book and computer animation
1.
Examine and evaluate the design and presentation
using multimedia techniques
2.
Describe an interactive technique based on speech
recognition, its nature, and how it was developed
3.
Learning Outcomes:
It may even appear in daily products in the
form of digital books and movies
○
It may use user interactivities such as speech
recognition
○
Despite the attractiveness and impact of
using multimedia, the key is still the
presentation of the intended message using
these media
○
Multimedia appeals to more of our senses than the
traditional computing
•
Computer animation is to take static objects
and give them "life" through their
movements and personality, and illusory
devices that take advantage of a sequence of
animated events
○
Multimedia computing not only involves
algorithms that connect and process different types
of data, but also involves the creative use of
technologies in delivering a "story"
•
Computer animation may involve many
steps, using techniques from modeling to
specific rendering
○
Augmented reality combines the real world with
computer-generated virtual imagery of events
•
A speech sequence is often segmented and
its parts are then recognized into their
appropriate words using computer
algorithms
○
There are still many difficulties in
recognizing human speech using the
computer, especially when the speech
consists of spoken works from people of
different cultures
○
The research on speech recognition was motivated
by the coding technology of speech signals
•
Key Points:
Insights gained from the speech recognition
advances over the past 40 years are
explored, originated from generations of
Carnegie Mellon University's R&D
○
Several major achievements over the years
have proven to work well in practice for
leading industry speech recognition systems
from Apple to Microsoft
○
It will help bridge the gap between
humans and machines
!
It will facilitate and enhance natural
conservation among people
!
6 challenges need to be addressed
before we can realize with audacious
dream
!
Speech recognition will mass the Turing
Test and bring the vision of Star-Trek-like
mobile devices to reality
○
Key Insights:
•
In 1976, one of the authors (Reddy) wrote a
comprehensive review of the state of the art
of voice recognition at that time
○
With the introduction of Apple's Siri and similar
voice search services from Google and Microsoft,
it is natural to wonder why voice recognition
technology took so long to advance to this level
•
Reddy predicted it would be possible to a
build a $20,000 connected speech system
within the next 10 years in 1976
○
Although it took longer than projected, the
system costs were just less and continues to
drop dramatically
○
Speech recognition has been a staple in science
fiction for years, but in 1976 the real-world
capabilities bore little resemblance to the far-
fetched capabilities in the fictional realm
•
Although it was commercially
successful, the "speech in" and "screen
out" multimodal metaphor is more
natural for information consumption
!
In 1999 the VoiceXML forum was created
to support telephony IVR
○
Illustrated a vision on speech-enables
multimobile devices
!
In 2001, Bill Gates demonstrated such a
prototype codenamed MiPad at CES
○
The speech community is en route to
passing the Turing Test in the next 40
years with the ultimate goal to match
an exceed a human's speech
recognition capability for everyday
scenarios
We are now witnessing the ever-improved
ability of devices to handle relatively
unrestricted multimodal dialogues
○
1995, Microsoft SAPI was first shipped in
Windows 95 to enable application developers to
create speech applications on Windows
•
Statistical modeling and machine learning
○
Training data and computing resources
○
Vocabulary size and dis-fluent speech
○
Speaker independent and adaptive speech
recognition
○
Efficient decoder
○
Spoken language and understanding
dialogue
○
What we did not know how to do in 1976:
•
Acoustic
!
Parametric
!
Phonemic
!
Lexial
!
Sentence
!
Semantic
!
Six levels of knowledge:
○
In 1971, the speech recognition study group
recommended that many more sources of
knowledge be brought to bear on the problem
•
Understanding Research (SUR) project
○
Hearsay
!
Dragon
!
Harpy
!
Sphinx I/II
!
Developed sequence of speech recognition
system
○
Ex. Voice control of a robot, large-
vocabulary connected-speech
recognition, speaker-independent
speech recognition and unrestricted
vocabulary dictation
!
Created several historic demonstrations of
spoken language systems
○
Hearsay-I was one of the first systems
capable of continuous speech recognition
○
Dragon system was one of the first to model
speech as hidden stochastic process
○
Harpy system introduced concept of Beam
Search, which was the most widely used
technique for efficient searching and
matching for decades
○
*speech recognition word error
rate has been used as the main
metric to evaluate progress
□
Sphinx-II (1992) benefited largely
from tied parameters to balance
trainability and efficiency, which
achieved the highest recognition
accuracy in DARPA-funded speech
benchmark evaluation
!
Sphinx-I (1987) was first system to
demonstrate speaker-independent speech
recognition
○
The word error rate was approaching a new
milestone by both Microsoft and IBM
researchers following the deep learning
framework pioneered by researchers at the
University of Toronto and Microsoft
○
By 1976, Reddy was leading a group at Carnegie
Mellon University to explore ideas of Advanced
Research Project Agency (DARPA)-sponsored
Speech
•
Architecture of the Hearsay system was
designed so that many semiautonomous
modules can communicate and cooperate in
a speech recognition task while each
concentrated on its own area of expertise
○
The Dragon, Harpy and Sphinx systems all
were based on a single, relatively single
modeling principle of joint global
optimization
○
It was anticipated in the early 1970s that to bear
the higher-level sources of knowledge might
require significant breakthroughs in artificial
intelligence
•
Decoding process in a speech recognizer's
operation is to find a sequence of words
whose corresponding acoustic and language
models best match the input vector sequence
○
= search process
○
Graph search algorithms, which have been
explored extensively in the fields of artificial
intelligence, operations research and game
theory, serve as the basic foundation for the
search problem in speech recognition
○
Decoding process of finding the best matched
work sequence to match input speech is more than
just a simple pattern recognition problem, since
one faces a practically astronomical number of
word patterns to search
•
The most salient difference is not algorithms
with a lower error rate, but rather an
emphasis on simplified algorithms with a
better cost-performance trade-off
○
Long term foal was the development of a
real-time, large-vocabulary, continuous-
speech dictation system
○
Development of technology for Dragon
NaturallySpeaking may be compared with the
general development
•
Phonetic matching and word
verification are unified with word
sequence generation that depends on
the highest overall rating typically
using a context-dependent phonetic
acoustic model
!
Ex. Explicit segmentation and labeling of
phonetic strings is no longer necessary
○
Establishment of the statistical machine-learning
framework, supported by the availability of
computing infrastructure and massive training
data, constitutes the most significant driving force
in advancing the development of speech
recognition
•
In non-probablistic models, there is an
estimated "distance" between sound labels
based on how similar to sounds are
estimated to be
○
In probability models, an estimate is used of
the conditional probability of observing a
particular sound label as the best matching
label, conditional on the correct label being
the hypothesized label (=confusion
probability)
○
Early methods aimed to find the closest matching
sound label from a discrete set of labels
•
Model process has a learning algorithm with
a broadly applicable convergence theorem =
Expectation-Maximization (EM) algorithm
○
Hidden Markov Model (HMM) -process that is
hidden not the model
•
DNN can replace the Gaussian
mixture model directly to overcome
the inefficiency in data representation
!
Significant development offered learned
feature representation with introduction of
deep neural networks (DNN)
○
Combination of HMM and DNN produced
significant error reduction
○
Before 2010, mixture of HMM-based Gaussian
densities were typically used
•
Markov process -probabilities of future events
will be independent of any additional information
about the past history of the process
•
Corpora have been created, annotated and
distributed to the world-wide community by
National Institute of Standard and
Technology (NIST), Linguistic Data
Consortium (LDC), European Language
Resources Association (ELRA), and others
○
Character of the recorded speech has
progressed from limited, constrained speech
materials to huge amounts of progressively
more realistic, spontaneous speech
○
Training data and computational resources;
•
Made it possible for speech recognition to
consume the significantly improved
computational infrastructure
○
Moore's Law: doubling the amount of computation
for a given cost every 12-18 months, as well as
comparably shrinking cost of memory
•
Made it possible to create a far more
powerful language model for voice
search applications
!
Both Google and Big indexed the entire
Web
○
Cloud-based speech recognition made it even
more convenient to accumulate an even more
massive amount of speech
•
Magnitude as a function of frequency is
called the "spectrum" of the short window of
speech, and a sequence of such spectra over
time in a speech utterance can be visualized
as a spectrogram
○
Deep learning technology aims at
minimizing such information loss and
searching for more powerful, deep
learning-driven speech representations
from raw data
!
Modifications of spectrograms led to
significant improvements in the performance
of Gaussian mixture-based HMM systems
despite the loss of raw speech information
○
In 1976 acoustic features were typically a measure
of the magnitude at each set of frequencies for
each time window
•
Maximum size has increased significantly
○
Systems in 1990s tried to recognize every
word dictated and counted every words not
recognized as an error
○
It was important for the system to learn the
names and places that occurred repeatedly in
a particular user's dictation
○
Significant advances were made in statistic
learning techniques
○
Problem remains a challenge because
modeling new words is still far from
seamless
○
Vocabulary size:
•
There was still a significant gap in
performance between single-speaker,
speaker-dependent models and speaker-
independent models intended for a diverse
population
○
Key was to use more speech data from
a large number of speakers to train the
HMM-based system
!
Sphinx introduced large-vocabulary,
speaker-independent continuous speech
recognition
○
Adaptive learning is also applied to
accommodate speaker variations and a wide
range of variable conditions for the channel,
noise and domain
○
Effective adaptive technologies enable rapid
application integration and are a key to
successful commercial deployment of
speech recognition
○
Speaker independent and adaptive systems:
•
More important has been searchable unified
graph representations that allow multiple
sources of knowledge to be incorporated
into a common probabilistic framework
○
Practical decoding algorithms made possible
large-scale continuous speech recognition
○
Multiple speech streams
!
Multiple probability estimators
!
Multiple recognition systems
!
Multiple pass systems with increased
constraints
!
Non-compositional methods include:
○
Decoding techniques:
•
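To make "decoding as search in a probabilistic framework" concrete, here is a minimal sketch using Viterbi dynamic programming over a toy hidden Markov model. Viterbi is a standard technique swapped in for illustration; it is not claimed to be the decoder of any system named in these notes. The states, observations and probabilities are all made-up assumptions.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log probability, best state path) for an observation sequence.

    Assumes every probability used is non-zero, so math.log never fails.
    """
    # best[s] = (log prob of the best path ending in state s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best predecessor state for s.
            score, path = max(
                (best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states
            )
            step[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = step
    return max(best.values())


# Hypothetical two-state model: silence vs. speech, observed as low/high energy.
states = ("sil", "speech")
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
emit_p = {
    "sil": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

log_prob, path = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
print(path, math.exp(log_prob))
```

A real recognizer searches a far larger graph, so it cannot afford to keep every state alive at each step the way this sketch does.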
User utters queries on flight
information in an unrestricted
free form
□
Ex. Air Travel Information System
(ATIS)
!
SLU mostly relied on case grammars for
representing sets of semantic concepts
during the 1970s
○
A number of techniques are used to fill frame
slots of the application domain from the
training data
○
Like acoustic and language modeling, deep
learning based on recurrent neural networks
can also significantly improve filling slots
for language understanding
○
Spoken language understanding (SLU):
•
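A toy illustration of what "filling frame slots" means for an ATIS-style flight query. This sketch uses hand-written patterns, not the statistical or neural methods the notes describe, which learn the mapping from training data. The slot names and patterns are hypothetical.

```python
import re

# Hypothetical slot patterns for a flight-query frame (illustration only).
SLOT_PATTERNS = {
    "from_city": re.compile(r"\bfrom (\w+)"),
    "to_city": re.compile(r"\bto (\w+)"),
    "day": re.compile(
        r"\bon (monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
    ),
}


def fill_slots(utterance: str) -> dict:
    """Return a frame holding the slots whose pattern matched the utterance."""
    text = utterance.lower()
    frame = {}
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)
    return frame


frame = fill_slots("Show me flights from Boston to Denver on Friday")
print(frame)
```

The sketch only handles single-word city names and fixed phrasings, which hints at why unrestricted free-form queries pushed the field toward learned models.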
There is no data like more data
○
Computing infrastructure
○
Unsupervised learning
○
Portability and generalizability
○
Dealing with uncertainties
○
Having Socrates' wisdom
○
Six Major Challenges:
•
In 1976 computation power available was
only adequate to perform speech recognition
on highly constrained tasks with low
branching factors (perplexity)
○
Thousands of processors and nearly
unlimited collective memory capacity
in the cloud
!
Power of these systems arises mainly
from their ability to collect, process
and learn from very large data sets
!
In 1976, the fastest computer available for
routine speech research was a dedicated PDP-10 with
4MB memory
○
Algorithmic improvements have been
made (e.g., using distributed
algorithms for deep learning tasks)
!
Still difficult to dynamically adapt to
the speaker and environment; doing so
has the potential to reduce the error
rate by half
!
Social graph used for Web search
engines can be used to dramatically
reduce the needed search space
!
Mixed-lingual speech makes the new
word problem more difficult
!
Multimodal interactive
metaphor will be a dominant
metaphor as illustrated by
MiPad demo and Apple's Siri-
like services
□
We are still missing human-like
clarification dialog for new
words previously unknown to
the system
□
Associated problem of error detection
and correction leads to difficult user
interface choices
!
Some systems require the use of
more powerful discriminative
learning
□
Dynamic sparse data learning is
missing in most systems
□
Recognition of highly confusable
words is still a problem
!
Speech recognition will help
bridge the gap between us and
machines
□
A powerful tool to facilitate and
enhance natural conversation
among people regardless of
barriers of location or language
□
Speech recognition in the next 40
years will pass the Turing test
!
Basic learning and decoding algorithms have
not changed substantially in the past 40 years
○
Conclusion:
•
Read: Historical perspective of speech recognition
To animate means "to give life to"
•
In computer animation, animators use
software to draw, model and animate objects
and characters in vast digital landscapes
○
An animator's job is to take a static image or
object and literally bring it to life by giving it
movement and personality
•
The animator draws objects or
characters either by hand or with a
computer
!
Then he positions his creations in key
frames, which form an outline of the
most important movements
!
This process is called tweening
□
Next, the computer uses mathematical
algorithms to fill in the "in-between"
frames
!
Key framing and tweening are
traditional animation techniques that
can be done by hand, but are
accomplished much faster with a
computer
!
Computer-assisted animation is typically
2D, like cartoons
○
This cannot be done with a pencil and
paper
!
Key framing and tweening are still an
important function of computer-
generated animation, but there are
other techniques that don't relate to
traditional animation
!
Using mathematical algorithms,
animators can program objects to
adhere to (or break) physical laws like
gravity, mass and force
!
Or create tremendous herds and flocks
of creatures that appear to act
independently, yet collectively
!
With computer-generated animation,
instead of animating each hair on a
monster's head, the monster's fur is
designed to wave gently in the wind
and lie flat when wet
!
Computer-generated animation is 3D,
meaning that objects and characters are
modeled in a space with X, Y and Z axes
○
There are two basic kinds of computer animation:
computer-assisted and computer-generated
•
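The key-frame and tweening steps above can be sketched as simple linear interpolation between two key-frame positions. This is a minimal illustration, not how any particular animation package does it: `tween` is a hypothetical helper, positions are plain 2D points, and production tools usually ease in and out along curves instead of moving at constant speed.

```python
def tween(key_a, key_b, steps):
    """Return `steps` in-between positions, excluding the two key frames."""
    (xa, ya), (xb, yb) = key_a, key_b
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from key_a to key_b
        frames.append((xa + (xb - xa) * t, ya + (yb - ya) * t))
    return frames


# Two key frames set by the animator; the computer fills in three frames between.
inbetweens = tween((0.0, 0.0), (8.0, 4.0), steps=3)
print(inbetweens)  # [(2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
```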
Animators at Disney revolutionized the
industry with innovations like the use of
sounds in animated short films and the
multi-plane camera stand that created the
parallax effect of background depth
○
Technology has long been part of the animator's
toolkit
•
Earliest films were scientific simulations
including "Flow of a Viscous Fluid" and
"Propagation of Shock Waves in a Solid
Form"
○
The roots of computer animation began with
computer graphics pioneers in the early 1960s
•
Utah Teapot - rendered 3D teapot that
signaled a turning point in the
photorealistic quality of 3D graphics
!
The University of Utah was the source of the earliest
important breakthroughs in 3D computer
graphics, like the hidden surface algorithm
that allows a computer to conceptualize 3D
objects
○
Ed Catmull (University of Utah) was one of the
first to toy with computer animation as art,
beginning with a 3D rendering of his hand opening
and closing
•
More films in the later 1970s and early 1980s
relied on computer graphics (CG) to create
primitive effects
○
"Tron" (1982) was ideal for showcasing
undeniably digital effects
○
"Jurassic Park" (1993) was the first feature film
to integrate convincingly real, entirely
computer-generated characters into a live-
action film
○
"Toy Story" (1995) from Pixar was the first full-
length cartoon made entirely with computer-
generated 3D animation
○
In 1973, "Westworld" became the first film to contain
computer-generated 2D graphics
•
Today, a standard desktop computer runs
5000x faster than those used by computer
graphics pioneers in the 1960s
○
Cost of the basic technology for creating
computer animation has dropped from
$500,000 to less than $2,000
○
Increasing sophistication and realism of 3D
animation can be directly credited to an
exponential growth in computer processing power
•
Read: How computer animation works
Create a 3D world inside a computer
•
Can move a camera inside a 3D world to make
characters come to life
•
Moment in lighting where all the pieces
come together and the world comes to life
○
Can use light to help tell the story, set the mood,
guide the audience's eye, make characters
stand out, etc.
○
Can add, remove or place lighting to create
shadows - balancing reality and artistry
•
Mimic the physics of water, light,
movement, etc., but are not constrained by it
○
Tether ourselves to science and the world we
know as a background - create something relatable
•
Ribbons of light
○
Movement of water
○
Fog beams
○
Can change colour to depict a mood
○
Create a believable world that the audience
can immerse themselves in
○
Ex. Finding Nemo
•
Ex. Light in WALL-E's "binoculars" to make it
more human-like with emotions
○
Can portray emotion with light
•
Interweaving of art and science
•
Watch: Magic ingredient that brings Pixar movies to life
Augmented reality is the melding of the real world
with computer-generated imagery
•
Latest technology
•
Willingly enter a fictional world
○
Must suspend our disbelief
•
Deliberately exploit the way the audience thinks
•
Defy logic
•
Watch: Magical tale (with augmented reality)
Explore solutions to solve the climate crisis
○
Could see pictures or videos on a map
!
Can include pictures, interactive images, and
videos that can be expanded
○
Ex. Blowing into mic to see windmill
mechanism
!
Interactive infographics - can explore
○
Runs on iPad and iPhone
○
"Our Choice" - first interactive digital book by Al
Gore
•
Watch: Next-generation digital book
Augmented reality (AR) is an enhanced
perception of a physical, real-world
environment such that some elements are
augmented and overlaid by computer-
generated sensory input, such as sounds,
images, video or computer vision
○
The perception may be in real time and may
involve human interactivity
○
Applications may include Google Glass,
realistic medical/military training, and
realistic entertainment
○
What is augmented reality, and what are some of its
potential uses?
1.
Speech recognition is a collection of
computational techniques that aim at
recognizing or classifying natural human
speech
○
Usually, it processes the speech signal as
data from its coding representation,
segments the code into components,
identifies the components in their context,
and classifies the components into
interpretable human words
○
Each step of the process can present some
difficulties due to possible ambiguity,
uncertainty of the true nature, or confusion
between potential words
○
What is speech recognition? Briefly discuss
some of the challenges in its development.
2.
Tweening - technique, usually using
computing, to generate frames that fill in the
content between two key frames in
animation
a.
Key frame - frame used as a reference point
by the computer algorithm for a smooth transition
b.
Kinematics - study of methods for
manipulating the motion of an object, usually
using a 3D model, without considering the
cause of the motion
c.
Briefly define these terms in animation:
3.
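A minimal sketch of the kinematics definition above, assuming forward kinematics on a flat 2D chain of segments for brevity (the notes mention 3D models). Joint positions follow from segment lengths and angles alone, with no forces involved. `forward_kinematics` and the sample arm are hypothetical.

```python
import math


def forward_kinematics(lengths, angles_deg):
    """Joint positions of a planar chain; each angle is relative to the
    previous segment, and the chain starts at the origin."""
    x = y = 0.0
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles_deg):
        heading += math.radians(angle)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


# Hypothetical two-segment arm: upper segment points straight up, lower bends right.
arm = forward_kinematics([2.0, 1.0], [90.0, -90.0])
print(arm[-1])  # end of the arm, approximately (1.0, 2.0)
```

Changing only the angles between key frames and recomputing the positions is one way an animator can pose a limb without simulating what causes the motion.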
Questions:
Multimedia Computing
Friday, March 9, 2018 2:24 PM