Ch 2: Data
Recall: Data can be numbers, record names, or other labels.
- Not all data represented by numbers are numerical data (e.g., 1 =
male, 2 = female)
- Data must come with the “five W’s”: Who, What, When, Where, and Why
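A minimal sketch of the point about numeric codes above. The coded responses are made up for illustration; only the coding 1 = male, 2 = female comes from the notes. Arithmetic on the codes runs without error, but the result means nothing, while counting the decoded labels gives a sensible summary.

```python
from collections import Counter
from statistics import mean

# Coding from the notes: 1 = male, 2 = female. The codes are labels, not amounts.
GENDER_LABELS = {1: "male", 2: "female"}

# Hypothetical coded responses, invented for this sketch.
coded = [1, 2, 2, 1, 2]

# The average of the codes can be computed, but it has no interpretation.
meaningless_average = mean(coded)

# Decoding to labels and counting cases is the summary that makes sense.
counts = Counter(GENDER_LABELS[c] for c in coded)

print(meaningless_average)
print(counts)
```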
Who: Who are you interested in? Who is in the target population?
- Subjects or participants – people on whom we experiment
o The entire set of subjects is the population.
o The set of subjects you observe is your sample.
- Respondents – individuals who answer a survey
- Experimental units – animals, plants, and inanimate subjects
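The population/sample distinction above can be sketched as follows. The population of 1,000 ID numbers and the sample size of 50 are hypothetical, and simple random sampling is only one way a sample might be chosen; the notes do not specify a sampling method.

```python
import random

# Hypothetical population: ID numbers for 1,000 subjects.
population = list(range(1, 1001))

# Fixed seed so the sketch gives the same sample every run.
rng = random.Random(0)

# The sample is the subset of the population that we actually observe.
# Here it is drawn without replacement (an assumption made for illustration).
sample = rng.sample(population, k=50)

print(len(population), len(sample))
```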
What: What characteristic (or variable) are you measuring?
- Variables are characteristics recorded about each individual
o A variable can take different values for different
individuals.
o Some variables have units, which tell how each value was
measured and the scale of the measurement.
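One way to picture "Who" and "What" together is a small data table: one record per individual, one field per variable, with units kept alongside the variable names. The names, values, and units below are all hypothetical.

```python
# Hypothetical records: one dict per individual (the "Who"),
# one key per variable (the "What").
records = [
    {"name": "A", "height": 172.0, "smoker": "no"},
    {"name": "B", "height": 165.5, "smoker": "yes"},
]

# Units are stored per variable; variables without units map to None.
units = {"name": None, "height": "cm", "smoker": None}

# Collect the values of one variable across all individuals.
heights = [r["height"] for r in records]

print(heights, units["height"])
```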
Types of Variables
1. A categorical variable places a subject into one of several
groups or categories (or levels).
Usually we determine the counts of cases that fall into
each category.
Two types:
i. Nominal: the levels have no order
ii. Ordinal: the levels have some order
a) gender (M or F)
b) hair color (blonde, white, black, red, etc.)
c) nationality (Canadian, American, German, French, Chinese,
d) letter grade (A+, A, A-, B+, B, B-, C+, C, C-, D+, D, F)
e) car manufacturer (Dodge, Honda, Ford, Others)
f) opinion (strongly agree, agree, neutral, disagree, strongly
g) education level (high school diploma, undergraduate degree,
2. A quantitative variable measures a numerical quantity or
amount in each subject.
i. Discrete: can only take on distinct values
ii. Continuous: can take on any value in a given
interval
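The four kinds of variables above can be sketched in a few lines. The data values are invented for illustration, and ordering the letter grades from F (lowest) to A+ (highest) is an assumption about how the levels are ranked. The sketch shows counts per level for a nominal variable, an explicit level order for an ordinal variable, and example values for discrete and continuous variables.

```python
from collections import Counter

# Nominal: the levels have no order, so we summarize by counting cases per level.
hair = ["black", "blonde", "black", "red", "black"]  # hypothetical data
hair_counts = Counter(hair)

# Ordinal: the levels have an order, recorded here from lowest to highest
# (assumed ranking of the letter grades listed in the notes).
GRADE_LEVELS = ["F", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
rank = {level: i for i, level in enumerate(GRADE_LEVELS)}

grades = ["B+", "A-", "C", "A+"]  # hypothetical data
ordered = sorted(grades, key=lambda g: rank[g])  # sort by level order, not alphabetically

# Discrete: only distinct values are possible (here, counts of siblings).
siblings = [0, 2, 1, 3]  # hypothetical data

# Continuous: any value in an interval is possible (here, heights in cm).
heights_cm = [172.4, 165.0, 180.25]  # hypothetical data

print(hair_counts)
print(ordered)
```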
Example 2: A medical study.
Data from a medical study contain values of many variables for each
of the people who were subjects of the study. Which of the
following variables are categorical and which are quantitative?
a) Age (years)
b) Race (Asian, black, white, or other)
c) Smoker (yes or no)
d) Systolic blood pressure (mill