MIS 0855 Study Guide - Spring 2018, Comprehensive Midterm Notes - Ford Focus, Big Data, Relational Database
MIS 0855
MIDTERM EXAM
STUDY GUIDE
Spring 2018
Introduction: Data, Information, Knowledge (1/16/18)
• Data: raw, unorganized facts
• Information: data that is processed to be useful
• Knowledge: application of data and information
• What is information?
o Insight derived from data
o Data presented in a meaningful context
o Data processed by summing, ordering, etc.
o A difference that makes a difference
• Data can be presented with percentages
• Information is a step further than data; the analysis may consist of a mean, median,
minimum, maximum, etc.
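To see the data-to-information step concretely, here is a minimal sketch in Python. The exam scores are made up for illustration and `summarize` is a hypothetical helper name, not something from the lecture.

```python
import statistics

# Hypothetical raw data: exam scores (made up for illustration)
scores = [72, 85, 90, 66, 85, 78, 94]

def summarize(values):
    """Turn raw numbers (data) into summary figures (information)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

summary = summarize(scores)
print(summary)
```

The list of scores on its own is data; the mean, median, minimum, and maximum are information because they put the numbers in a context a reader can use.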
What is Big Data? Is it a Big Deal?
• Velocity, Variety, Volume
• Data and Big Data don’t really matter unless you can turn them into information and
knowledge
• As a manager, your role is to:
o Examine the assumptions, approaches, and data carefully.
o Ask analysts:
▪ Can you tell me something about the source of data you used in your
analysis?
▪ Are you sure the sample data are representative of the population?
▪ Are there any outliers in your data distribution? How did they affect the
results?
▪ What assumptions are behind your analysis?
▪ Are there any conditions that would make your assumptions invalid?
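The outlier question is worth seeing with numbers. The sketch below uses made-up salary figures (in thousands of dollars) where the last value is an assumed outlier, and compares the mean and median with and without it.

```python
import statistics

# Hypothetical salaries in thousands of dollars; the last value is an outlier
salaries = [48, 52, 50, 47, 53, 51, 400]

def mean_and_median(values):
    """Return (mean, median) so the two summaries can be compared side by side."""
    return statistics.mean(values), statistics.median(values)

with_outlier = mean_and_median(salaries)
without_outlier = mean_and_median(salaries[:-1])  # drop the last (outlying) value

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this example the single outlier roughly doubles the mean while the median barely moves, which is why a manager should ask how outliers were handled before trusting an average.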
Science and Data Science: What is data science?
• Compare it to the definition of science: knowledge about or study of the natural world
based on facts learned through experiments and observation
• What makes knowledge actionable? Why is that a goal? How does big data facilitate
this?
o Actionable – needs to project into the future, needs to be generalizable and
robust
• First: Statistics
o What is statistics? – Statistics studies data in terms of collection, analysis,
interpretation, presentation, and organization
o It helps us to answer these questions:
▪ What patterns are there in my data?
▪ What is the chance that an event will occur?
▪ Which patterns are significant?
▪ What is a high-level summary of my data?
• Now: Big Data & Machine Learning
o What is machine learning (ML)? – ML gives computers the ability to learn
without being explicitly programmed
o A computer program is said to learn from experience E with respect to some task
T and some performance measure P, if its performance on T, as measured by P,
improves with experience E.
▪ T: playing checkers
▪ P: percentage of games won against an arbitrary opponent
▪ E: playing practice games against itself
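Checkers is too big for a short example, so the sketch below swaps in a toy task to show the same T/P/E idea. Everything here is an assumption for illustration: T is deciding whether a number is at or above a hidden cut-off, P is accuracy on 100 test numbers, and E is the set of labeled examples seen so far.

```python
def true_label(x):
    # Hidden rule the learner is trying to discover (assumed for this toy task)
    return x >= 50

def learn_threshold(examples):
    """Estimate the cut-off as the midpoint between the largest 'low'
    and the smallest 'high' example seen so far."""
    lows = [x for x, label in examples if not label]
    highs = [x for x, label in examples if label]
    return (max(lows) + min(highs)) / 2

def accuracy(threshold, test_points):
    """P: share of test points classified the same way as the hidden rule."""
    correct = sum((x >= threshold) == true_label(x) for x in test_points)
    return correct / len(test_points)

# E: labeled examples, revealed to the learner a few at a time
experience = [(x, true_label(x)) for x in [5, 60, 20, 55, 40, 51, 48]]
test_points = range(100)

scores = [accuracy(learn_threshold(experience[:n]), test_points) for n in (2, 4, 6, 7)]
print(scores)  # [0.83, 0.88, 0.96, 1.0]
```

Nobody coded the cut-off of 50 into the learner; its accuracy (P) on the task (T) rises only because it sees more examples (E), which is what the definition means by learning.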
• Statistics vs. ML (Breiman, 2001)
o Input x → Nature → Output y
o Why analyze data? – To predict or extract information
o Statistics: input x → linear regression, logistic regression, Cox model → output y
o ML: input x → unknowns → output y
▪ Figure out unknowns with Decision Trees or Neural Nets
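The contrast between the two approaches can be sketched in plain Python. The data below are made up and deliberately step-shaped, `fit_line` stands in for the statistics side (assume a linear model, estimate its parameters), and `fit_stump` is a one-split decision tree standing in for the ML side. The function names are hypothetical, and the stump assumes all x values are distinct.

```python
def fit_line(xs, ys):
    """Data-modeling approach: assume y = a + b*x and estimate a, b by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_stump(xs, ys):
    """Algorithmic approach: a one-split decision tree chosen to minimize squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        # Candidate split halfway between two neighboring x values
        split = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < split]
        right = [y for x, y in pairs if x >= split]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return split, (lambda x: left_mean if x < split else right_mean)

def sse(predict, xs, ys):
    """Sum of squared prediction errors."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))

# Made-up, step-shaped data: y jumps from 1 to 5 between x = 4 and x = 5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]

line = fit_line(xs, ys)
split, stump = fit_stump(xs, ys)
line_error = sse(line, xs, ys)
stump_error = sse(stump, xs, ys)
print("line error:", round(line_error, 2), "| stump error:", stump_error, "| split at", split)
```

The stump wins here only because the data were built as a step; on data that really follow a straight line, the linear model would fit better and be easier to interpret. The point is the difference in approach: one assumes a model form up front, the other lets the algorithm find the structure.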
• The dangers of (big data) analytics
o It’s easy to find what’s not really there
o The direction of causality can be tricky
o Dirty data is everywhere
• So… Start with a hypothesis
o The testable predictions from an idea with an underlying rationale that makes
sense
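A rough simulation shows why finding what is not really there is so easy, and why starting from a hypothesis helps. In the sketch below everything is random noise by construction; the sizes (10 rows, 200 columns) and the seed are arbitrary assumptions chosen for illustration.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_columns = 10, 200

# Pure noise: the outcome and every candidate column are unrelated by construction
outcome = [rng.random() for _ in range(n_rows)]
columns = [[rng.random() for _ in range(n_rows)] for _ in range(n_columns)]

correlations = [correlation(col, outcome) for col in columns]
strongest = max(correlations, key=abs)
print(f"strongest of {n_columns} chance correlations: {strongest:.2f}")
```

Searching 200 unrelated columns for the best match with a small sample will almost always turn up a strong-looking correlation. Picking one column in advance, because a rationale says it should matter, and testing only that one is what protects against this.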