Chapter 2: Data
Collecting data on customers, transactions, and sales lets companies track
inventory and know what their customers prefer.
Transactional data: data collected for recording a company’s transactions.
Data mining (predictive analysis): using data (such as past behaviour of
customers) to guide decisions and make predictions.
Business analytics (analytics): any use of statistical analysis to drive business
decisions from data, whether predictive or descriptive.
2.1 What Are Data?
Data: systematically recorded information, whether numbers or labels,
together with its context.
Context: the context ideally tells who was measured, what was measured,
how the data were collected, where the data were collected, and when and
why the study was performed.
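One way to keep the context attached to a data set is to store the W's alongside it. The sketch below is hypothetical: the `Context` class, its field names, and the example survey are illustrative assumptions rather than anything from the text. It simply flags which W's were left blank.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """The W's that give a data set its meaning (illustrative field names)."""
    who: str    # the cases: who or what was measured
    what: str   # the variables that were recorded
    how: str    # how the data were collected
    where: str  # where the data were collected
    when: str   # when the study was performed
    why: str    # why the study was performed

    def missing(self) -> list[str]:
        """Return the names of any W's that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

# Hypothetical example: a customer survey whose purpose ("why") was never recorded.
survey = Context(
    who="customers of an online store",
    what="age, region, satisfaction rating",
    how="online questionnaire",
    where="store web site",
    when="spring survey period",
    why="",
)
print(survey.missing())  # ['why']
```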
Data table: an arrangement of data in which each row represents a case and
each column represents a variable.
o A common place to find the who of the table is the leftmost column.
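As a rough sketch, a small data table can be held as a list of rows, one per case, where each row maps variable names to values. The customer records below are made up for illustration; the identifying column (the who) is placed first, matching the leftmost-column convention.

```python
# Each dict is one row (a case); each key is a column (a variable).
# The data are invented for illustration.
table = [
    {"customer_id": "C01", "region": "East", "age": 34, "purchases": 5},
    {"customer_id": "C02", "region": "West", "age": 51, "purchases": 2},
    {"customer_id": "C03", "region": "East", "age": 27, "purchases": 8},
]

variables = list(table[0].keys())  # column names, in order
n_cases = len(table)               # number of rows

print(variables[0])             # customer_id  (the who sits in the leftmost column)
print(n_cases, len(variables))  # 3 4

# Reading one column means collecting that variable across every case.
ages = [row["age"] for row in table]
print(ages)  # [34, 51, 27]
```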
Cases: individual items listed in the rows of a data table.
Respondents: individuals who answer a survey.
Subjects (participants): people on whom we experiment.
Experimental units: companies, websites, and other inanimate subjects.
Records: rows in a database.
Some people refer to data values as observations.
Variables: the characteristics recorded about each individual or case.
o Usually the columns of a data table.
If there are fewer cases than variables, it can be convenient to interchange
rows and columns (cases in columns, variables in rows).
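Interchanging rows and columns is a transpose. A minimal sketch, assuming the table is stored as a list of equal-length rows: Python's built-in `zip` pairs up the i-th entry of every row, which turns each column into a row. The two cases and their values are invented.

```python
# Two cases (rows) measured on four variables (columns); values are invented.
header = ["case", "height_cm", "weight_kg", "age", "income"]
rows = [
    ["A", 170, 65, 30, 52000],
    ["B", 182, 80, 45, 61000],
]

def transpose(matrix):
    """Return a new table whose rows are the columns of `matrix`."""
    return [list(column) for column in zip(*matrix)]

# Put cases in columns and variables in rows.
wide = transpose([header] + rows)
for line in wide:
    print(line)
# ['case', 'A', 'B']
# ['height_cm', 170, 182]
# ['weight_kg', 65, 80]
# ['age', 30, 45]
# ['income', 52000, 61000]
```

Transposing twice gives back the original layout, so no information is lost by the swap.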
Spreadsheet: a general term for a data table.
o Great for relatively small data sets.
Relational database: two or more separate data tables are linked so that
information can be merged across them.
o E.g. looking up a customer's name to see what they bought, or looking
up a product to see who bought it.
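The two lookups above can be sketched with two small tables linked by a shared key. This is an illustration under assumptions: the table and field names (`customers`, `orders`, `customer_id`) are hypothetical, and a real relational database would do this with a query language such as SQL rather than Python loops. Merging the tables on the key produces one combined table.

```python
# Two separate tables linked by a shared key, customer_id (invented data).
customers = {
    "C01": {"name": "Ana", "region": "East"},
    "C02": {"name": "Ben", "region": "West"},
}
orders = [
    {"order_id": 1, "customer_id": "C01", "product": "lamp"},
    {"order_id": 2, "customer_id": "C02", "product": "desk"},
    {"order_id": 3, "customer_id": "C01", "product": "chair"},
]

def bought_by(name):
    """Look up a customer's name to see what they bought."""
    ids = {cid for cid, c in customers.items() if c["name"] == name}
    return [o["product"] for o in orders if o["customer_id"] in ids]

def buyers_of(product):
    """Look up a product to see who bought it."""
    return [customers[o["customer_id"]]["name"] for o in orders if o["product"] == product]

# Merge into one data table: one row per order, with the customer's fields attached.
merged = [{**o, **customers[o["customer_id"]]} for o in orders]

print(bought_by("Ana"))   # ['lamp', 'chair']
print(buyers_of("desk"))  # ['Ben']
print(merged[0])
# {'order_id': 1, 'customer_id': 'C01', 'product': 'lamp', 'name': 'Ana', 'region': 'East'}
```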
In statistics, analyses are generally performed on a single data table.
2.2 Variable Types
Categorical variable: a variable that names categories and answers questions
about how cases fall into those categories.
o A special case of categorical variables is one that has only two possible
values (for example, yes or no).
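To see how cases fall into categories, one can count the cases in each category. A minimal sketch using `collections.Counter` from the Python standard library; the responses are made up, and the yes/no variable illustrates a categorical variable with only two possible values.

```python
from collections import Counter

# Hypothetical answers to "Which region do you live in?" (a categorical variable).
region = ["East", "West", "East", "North", "East", "West"]
# Hypothetical yes/no answers: a categorical variable with only two possible values.
repeat_customer = ["yes", "no", "yes", "yes", "no", "yes"]

region_counts = Counter(region)
print(region_counts.most_common())  # [('East', 3), ('West', 2), ('North', 1)]

# Share of cases falling in the "yes" category.
yes_share = Counter(repeat_customer)["yes"] / len(repeat_customer)
print(round(yes_share, 3))  # 0.667
```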
Quantitative variable: a variable that has measured numerical values (with
units) and tells us about the quantity of what is measured.
Units: the measurement scale that tells how each value has been measured,
such as dollars or kilograms.
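Because a quantitative value means little without its units, a sketch might store the unit next to the numbers and refuse to pool values recorded in different units. The `Measurement` class and the sales figures below are hypothetical illustrations, not a standard library type or real data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """A quantitative variable: numeric values plus the units they were measured in."""
    name: str
    unit: str
    values: tuple[float, ...]

    def mean(self) -> float:
        return sum(self.values) / len(self.values)

    def combine(self, other: "Measurement") -> "Measurement":
        """Pool two sets of values, but only if they share the same units."""
        if self.unit != other.unit:
            raise ValueError(f"cannot combine {self.unit} with {other.unit}")
        return Measurement(self.name, self.unit, self.values + other.values)

# Invented weekly sales for two stores, both recorded in US dollars.
store_a = Measurement("weekly_sales", "USD", (1200.0, 950.0))
store_b = Measurement("weekly_sales", "USD", (1100.0,))
pooled = store_a.combine(store_b)
print(round(pooled.mean(), 2), pooled.unit)  # 1083.33 USD
```

Trying to combine `store_a` with values recorded in another currency raises a `ValueError` instead of silently mixing units.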