Statistics consists of two parts:
1. Descriptive statistics
a. Calculations to summarize data (mean, median, percentile)
2. Inferential statistics
a. the part we'll be dealing with
b. used to make decisions and predictions about a population
i. by generalizing the facts we learned from a sample
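The descriptive summaries named above (mean, median, percentile) can be sketched with Python's standard `statistics` module. The sample values are made up for illustration and do not come from the course; `statistics.quantiles` uses its default "exclusive" method here, which is only one of several common percentile conventions.

```python
import statistics

# Hypothetical sample (illustrative only): exam scores of 8 students
# drawn from a larger class, which plays the role of the population.
sample = [62, 70, 75, 75, 81, 88, 90, 99]

# Descriptive statistics: calculations that summarize the data we have.
mean = statistics.mean(sample)      # arithmetic average
median = statistics.median(sample)  # middle value of the sorted data

# With n=4, quantiles() returns the three quartile cut points
# (25th, 50th, and 75th percentiles).
q1, q2, q3 = statistics.quantiles(sample, n=4)

print(f"mean={mean}, median={median}")
print(f"25th percentile={q1}, 50th={q2}, 75th={q3}")
```

In inferential statistics, a summary such as the sample mean would then serve as an estimate of the corresponding value for the whole population.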
Data are useless without their context.
• Without context, we can't understand what the data are about
• The Who, What, Why, Where, When, and How of data
The rows of a data table correspond to individual cases about Whom we record some characteristics.
• A common place to find the Who is the leftmost column
• "Cases" can also be:
• respondents: individuals who answer a survey
• subjects or participants: people in an experiment
• experimental units: animals, plants, websites, or other inanimate objects
• Question from homework: A food retailer wants to determine the best store location in Texas,
and will examine census data from existing stores
• Who was measured? The existing stores, not stores in Texas!
• What: what has been recorded (the variables), e.g., area code, price, brand name
• When data are collected can be important. Data that are decades old may mean something
different than similar values recorded last year.
• Data collected before an Obama speech can't be applied after it: people change their minds
• Where data are collected can be important. Data collected in Mexico may differ in meaning
than data collected in the United States.
• How data are collected can make the difference between insight and nonsense.
• bias: data collected at 3 a.m., or a survey filled out only by people who dislike school
• Kelsey: "the people you found are systematically different from the others"
• data that come from voluntary online surveys are almost always worthless
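To make the bias point concrete, here is a minimal sketch of why a voluntary survey can mislead. Everything in it is a made-up illustration: the ten students, their 1-to-5 scores, and the assumption that only students who dislike school (score of 2 or lower) bother to respond.

```python
import statistics

# Hypothetical population (illustrative only). Each row is one case (the Who);
# "score" is the recorded variable (the What): satisfaction with school,
# from 1 = strongly dislike to 5 = strongly like.
population = [
    {"student": "S01", "score": 1},
    {"student": "S02", "score": 2},
    {"student": "S03", "score": 2},
    {"student": "S04", "score": 3},
    {"student": "S05", "score": 4},
    {"student": "S06", "score": 4},
    {"student": "S07", "score": 4},
    {"student": "S08", "score": 5},
    {"student": "S09", "score": 5},
    {"student": "S10", "score": 5},
]

# Assumption: in a voluntary survey, only students who dislike school respond.
voluntary_respondents = [row for row in population if row["score"] <= 2]

population_mean = statistics.mean(row["score"] for row in population)
voluntary_mean = statistics.mean(row["score"] for row in voluntary_respondents)

print(f"whole population: mean score = {population_mean}")
print(f"voluntary survey: mean score = {voluntary_mean:.2f}")
```

Because the respondents are systematically different from everyone else, their average understates how the whole group feels, and collecting more responses of the same kind would not fix that.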
Relationship between Variables and data
• Variable: the (varying) characteristics recorded about each individual or case: What has been
• Age, Sex, Major,…
• Data: the values of the variables
• 20, Male, English,…
• When a variable names categories and answers questions about how cases fall into those
categories, it is called a cat