Chapter 2

Department: Statistical Sciences
Course Code: STA220H1
Professor: Augustin Vukov

The five W’s: WHO, WHAT, WHEN, WHERE, and (if possible) WHY…and HOW
We must know at least the Who, the What, and the Why to say anything useful based
on the data. The Who are the cases. The What are the variables. A variable gives
information about each of the cases. The Why helps us decide which way to treat the
variables.
We treat variables in two different ways, as categorical or quantitative:
Categorical (qualitative) variables identify a category for each case.
Quantitative variables record measurements or amounts of something; they must have
units.
Sometimes the same variable can be treated as either categorical or quantitative,
depending on what we want to learn from it.
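As a rough illustration of treating one variable both ways, the sketch below uses age as the example. The ages, the cut-offs, and the `age_group` helper are all hypothetical and chosen only for illustration; they do not come from the course material.

```python
from collections import Counter
from statistics import mean

# Hypothetical ages in years (made-up values, for illustration only).
ages = [19, 22, 35, 41, 67]

# Quantitative treatment: the numbers act as amounts with a unit (years),
# so a summary such as the mean is meaningful.
average_age = mean(ages)


def age_group(age):
    """Map an age in years to a category label (the cut-offs are an assumption)."""
    if age < 30:
        return "under 30"
    if age < 60:
        return "30 to 59"
    return "60 and over"


# Categorical treatment: each case falls into exactly one category,
# and we count how many cases land in each.
group_counts = Counter(age_group(a) for a in ages)

print(average_age)
print(dict(group_counts))
```

Which treatment is appropriate depends on the Why: averaging suits a question about typical age, while counting by group suits a question about how cases are spread across categories.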
Terms:
Data: systematically recorded information, whether numbers or labels, together with its
context.
Context: ideally tells WHO was measured, WHAT was measured, HOW the data was
collected, WHERE the data was collected, and WHEN and WHY the study was
performed.
Data table: an arrangement of data in which each row represents a case and each column
represents a variable.
Case: an individual about whom or which we have data.
Variable: holds information about the same characteristic for many cases.
Unit: a quantity or amount adopted as a standard of measurement (grams, dollars, hours).
Categorical variable: a variable that names categories, whether with words or numbers.
Quantitative variable: a variable in which the numbers act as numerical values.
Quantitative variables always have units.
Identifier variable: a variable holding a unique name, ID number, or other
identification for each case.
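The terms above can be tied together in a small sketch of a data table. Everything here is a hypothetical example: the table contents, the variable names (`student_id`, `program`, `study_hours`), and the helper functions are assumptions made for illustration, not part of the course material.

```python
# Hypothetical data table: each dict is one case (a row),
# and each key is a variable (a column). All values are made up.
table = [
    {"student_id": "S001", "program": "Statistics", "study_hours": 12.5},
    {"student_id": "S002", "program": "Economics", "study_hours": 8.0},
    {"student_id": "S003", "program": "Statistics", "study_hours": 15.0},
]

# How we choose to treat each variable, and its unit where one applies.
variables = {
    "student_id": {"kind": "identifier", "unit": None},
    "program": {"kind": "categorical", "unit": None},
    "study_hours": {"kind": "quantitative", "unit": "hours"},
}


def column(rows, name):
    """Return one variable's values across all cases."""
    return [row[name] for row in rows]


def has_unique_values(rows, name):
    """True when no two cases share a value, as an identifier variable requires."""
    values = column(rows, name)
    return len(set(values)) == len(values)


def quantitative_without_units(spec):
    """List variables marked quantitative that have no unit recorded."""
    return [
        name
        for name, info in spec.items()
        if info["kind"] == "quantitative" and not info["unit"]
    ]


print(column(table, "program"))
print(has_unique_values(table, "student_id"))
print(quantitative_without_units(variables))
```

Note that uniqueness is only a necessary check for an identifier: a quantitative column can also happen to contain no repeated values, so the treatment of each variable is recorded explicitly rather than inferred from the data.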

