MIS 0855 Study Guide - Midterm Guide: Knowledge Extraction, Eli Pariser, Data Visualization

Quiz Questions
Quiz 1 (2.1 Readings)
o Stein developed a model that could determine the gender of a caller using: his
phone records from Google Voice
o Stein was able to eventually predict the gender of a caller: 80% of the time
o Aodig to Di Justos atile, telepho etadata iludes: the alls duatio
o The Ashley Madison hack is different from previous hacks in that: it resulted in more
personal damage to users
o Aodig to “iles atile What the Fo Kos, the eplaatio step ioles:
aseig the h ad ho
o Characteristics of open data are: it can be redistributed to others, it can be available
for free, and it can come from any source
o FieThitEights seah fo Aeias est uito ega ith data fo: Yelp
Quiz 2 (3.1 Readings)
o According to Hayes, a benefit of large samples is that: it minimizes sampling error
o Aodig to Cafod, a ke pole of Bostos “teetBup app is that: lo
income residents have less access to smartphones
o What does Crawford propose was the reaso fo Googles oeestiatio o flu
outbreaks? Media coverage of the flu season
o According to Hayes, what percentage of business leaders do not trust the
information they use to make decisions? 33%
o In the article by Weisberg, Eli Pariser argues that the Filter Bubble is caused by:
personalization of web content
Quiz 3 (4.1 Readings)
o According to Unwin the reason for using graphic displays is: to present or explore
data
o According to Unwin, one issue with map-based graphical visualizations is that:
distance is not directly related to similarity
o Aodig to Ui, a sale is eall ie if it: iludes 
o Aodig to Hoe, hih is NOT oe of Fes 8 oe piiples of data
isualizatios? Eplai
o According to Achido, Microsoft uses all of the following data to combat cybercrime
except: FBI watchlists
Quiz 4 (5.1 Readings)
o I Matlis atile, Whog states that his NYC Tai Ca isualizatio is pat of a lage
movement for: open data and transparency
o Aodig to Daepot, the essee of aaltial communication includes all
except: the computer software
o According to Davenport, what is an example of something that should NOT be
included while storytelling with data: sequence of activities used in the analysis
o According to Krum, the fact that people remember messages with images more
often than ones with just text is called: The Picture Superiority Effect
o According to Krum, the relationship between infographics and data visualizations is
best described as: infographics can include data visualizations within them
o According to Krum, good infographics should: make sure the relative size of chart
elements are proportional to the data values
Data Science and Prediction (Dhar)
Intro
o Data science aka big data
o Implies a focus on data and statistics, or the systematic study of the
organization, properties, and analysis of data
o Raw data is unstructured, often emanating from networks with complex
relationships between entities
o Computers do background work for each other and make decisions automatically
o Traditional database methods are not suited for knowledge discovery because
they're optimized for fast access and summarization of data
o What makes an insight actionable? Its predictive power because of past data
o KDD: knowledge discovery in databases
o Model must be predictive
Implications
o Data is stored because it could be valuable in the future
o Using large amounts of data to make decisions became practical in the 1980s
o Machine learning works in the sense that methods detect subtle structure in data
o Downside is that methods pick up noise in data and cannot distinguish between
signal and noise
o Data ases so lage ou dot ee ko hee to stat lookig
o Feature construction is an important step in knowledge discovery
o Whe data is lage ad ultidiesioal, its alost impossible to know a priori that
a query is a good one
o Copute a uild peditie odels though a itelliget geeate ad test
process
Skills
o Mahie leaig skills usig i todas aketplae
o Koledge of tet poessig ad tet iig is eoming essential in
unstructured data
o Knowledge about markup languages is also essential as content is able to be
interpreted by computers
o Data scientists need skills in:
Bayesian statistics (probability, distributions, hypothesis testing,
multivariate analysis)
Computer science (pertains to how data is internally represented and
manipulated by computers)
Knowledge about correlation and causation
Ability to formulate problems that lead to effective solutions
Knowledge Discovery
o All odels ae og, ut soe ae useful, ad soeties e dot eed the at
all
o Need to know how observational data was generated before we can draw
connections
o Possibility of extracting a causal model from large amounts of data
o Predictive models can be made and can be used to help make decisions
o Errors in predictions come from three sources:
Misspecification of a model
Sample used for estimating parameters
Randomness
o Big data allows data scientists to reduce the first two types of error
o One of the most far-reaching modern applications of big data is in politics
o Social science theory building is likely to get a boost from big data and machine
learning
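The second error source listed above, the sample used for estimating parameters, shrinks as the sample grows, which is one reason big data helps. A minimal Python sketch, assuming a standard normal population and a hypothetical helper named spread_of_sample_means (neither comes from Dhar's article):

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Standard deviation of the sample mean across repeated samples.

    Assumption for illustration only: the population is standard normal,
    so the true mean is 0 and any nonzero estimate is sampling error.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"spread with n=10:   {small:.3f}")
print(f"spread with n=1000: {large:.3f}")
```

Under these assumptions the spread of the estimate falls by roughly a factor of ten when the sample is 100 times larger.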
Three Science Words We Should Stop Using (Allain)
Hypothesis
o Means the basis of an argument (does NOT mean a guess)
o Allain thinks the best current use of the word is the testable predictions from an
idea
Theory
o It is a scientific idea
Scientific Law
o Law is more like a generalization
One word to replace them all
o Use the word: model
o Science is all about making models
o Physical model: a globe is a physical model of Earth
o Mathematical model: an equation that explains an idea
o Conceptual model
I’ Beatig the NSA to the Puh y Spyig o Myself Stei
Can predict the gender of a caller 80% of the time based on time of day and length of call
First criticism: stating the model's percent accuracy is useless because he reported no baseline
Second criticism: didn't do enough analysis on how good his model is
o Timestamp of call did very little to affect the accuracy of the model
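The first criticism can be made concrete: an accuracy figure only means something next to what a trivial guesser would score. The sketch below is illustrative Python, not Stein's code. The 70/30 split of callers is an invented assumption, and majority_baseline is a hypothetical helper.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical call log: NOT Stein's data, just an assumed 70/30 split.
labels = ["male"] * 70 + ["female"] * 30

baseline = majority_baseline(labels)
model_accuracy = 0.80  # the 80% figure reported in the notes above

print(f"baseline={baseline:.2f}, lift={model_accuracy - baseline:.2f}")
```

With this assumed split, always guessing the majority gender already scores 70%, so an 80% model adds only 10 points. With a 50/50 split the same 80% would be far more impressive, which is why the baseline has to be reported.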

Document Summary

Its predictive power because of past data. KDD: knowledge discovery in databases; model must be predictive. Three Science Words We Should Stop Using (Allain): hypothesis means the basis of an argument (does not mean a guess); Allain thinks the best current use of the word is the testable predictions from an idea; theory. IMEI can be used to identify malfunctioning, obsolete, or stolen equipment, or a ban from the network: a phone's international mobile subscriber identity (IMSI) uniquely identifies a user by using the. 22 million government employees: Sony hack took down loads of employees, putting their personal lives on public display; having your name released as an Ashley Madison user could destroy your life. What the Fox Knows (Silver): election forecasts stood out in media; others were forecasting that Romney would win; forecasts were overrated because they didn't represent the totality of the journalism at.
