McMaster University
COMMERCE 2QA3
Fouzia Baki
CHAPTER TWO: DATA
What Are Data?
Transactional data: data collected from recording a company’s transactions
Business analytics: any use of statistical analysis to drive business decisions
from data
Data helps businesses predict what customers may buy in future
Helps them know how much of each item to stock
Need to provide context for data values
Must answer Who, What, When, Where, Why, How
Data table:
o Usually the rows answer who
o Column headers answer what
Respondents: individuals who answer survey
Subjects/participants: people experimented on
Experimental units: companies, websites, inanimate subjects that are
experimented on
Variables: characteristics recorded about each individual or case
Relational database: two or more separate data tables that are linked
together so info can be merged across them
Analysis performed on single data table in statistics
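A minimal sketch of how two linked tables can be merged into the single table used for analysis. The table contents, field names, and the helper merge_orders_with_customers are hypothetical, chosen only for illustration:

```python
# Hypothetical tables: customers keyed by customer_id, and a list of orders.
customers = {
    101: {"name": "Ana", "city": "Hamilton"},
    102: {"name": "Ben", "city": "Toronto"},
}
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 25.00},
    {"order_id": 2, "customer_id": 102, "amount": 40.00},
    {"order_id": 3, "customer_id": 101, "amount": 15.50},
]

def merge_orders_with_customers(orders, customers):
    """Attach customer info to each order using the shared customer_id key."""
    merged = []
    for order in orders:
        customer = customers[order["customer_id"]]
        # Combine the order fields and the matching customer fields in one row.
        merged.append({**order, **customer})
    return merged

merged = merge_orders_with_customers(orders, customers)
for row in merged:
    print(row)
```

Each row of the merged table still answers Who (one order) and the columns answer What.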
Variable Types
Categorical variable:
o When variable names categories and answers questions about how
cases fall into those categories
Quantitative variable:
o When variable has measured numerical values and tells us about the
quantity of what is measured
o Some have units (kg, cm, etc.) and some do not
Classifying variable helps us decide what to do with it
Usually all word variables are categorical
Not all numerical variables are quantitative
Categorical data can be numbers (ex. area codes)
Counts
Counting is a way to summarize a categorical variable
Does not mean that every time something has been counted the variable is
categorical
Also use counts to measure amounts
Measured quantities are associated with quantitative variables
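A short sketch of the difference, assuming a made-up list of records: the area code is a number but is summarized by counting cases per category, while the purchase amount is a measured quantity that can be averaged. All values are illustrative:

```python
from collections import Counter

# Hypothetical records: area_code looks numeric but is categorical;
# amount is quantitative (measured in dollars).
records = [
    {"area_code": 905, "amount": 12.5},
    {"area_code": 416, "amount": 30.0},
    {"area_code": 905, "amount": 7.5},
]

# Summarize the categorical variable by counting cases in each category.
area_code_counts = Counter(r["area_code"] for r in records)

# Summarize the quantitative variable with an average.
mean_amount = sum(r["amount"] for r in records) / len(records)

print(area_code_counts)
print(mean_amount)
```

Averaging the area codes would run without error but would not mean anything, which is why classifying the variable comes first.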
Identifiers
Identifier variable: exactly as many categories as there are things being
categorized
Make it possible to combine data from different sources, protect
confidentiality, provide unique labels
Ex. customer number, product ID, social insurance number
Do not need to analyze identifiers because they will not give any info
Variables can play different roles depending on question asked of them
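One rough way to spot a possible identifier is to check whether a column has as many distinct values as rows. The function name looks_like_identifier and the data are hypothetical, and the check is only a hint: a quantitative variable whose values all happen to differ would also pass.

```python
def looks_like_identifier(values):
    """Return True when every value is unique (as many categories as cases)."""
    return len(set(values)) == len(values)

# Hypothetical columns from a customer table.
customer_numbers = [1001, 1002, 1003, 1004]
cities = ["Hamilton", "Toronto", "Hamilton", "Ottawa"]

print(looks_like_identifier(customer_numbers))
print(looks_like_identifier(cities))
```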
Other Variable Types
Nominal variables: categorical variables used only to name categories
Ordinal values: values that can be ordered (to pick out first, last, middle
value)
Interval scale: measurements for which differences are meaningful but ratios
are not useful (ex. temperature in °C)
Ratio scale: measurements for which ratios are appropriate
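A small worked example of why ratios are not useful on an interval scale, assuming two illustrative temperatures. The difference is the same in Celsius and Kelvin, but the ratio changes with the choice of zero point, so "twice as hot" has no fixed meaning in Celsius:

```python
# Two illustrative temperatures in degrees Celsius.
cool_c, warm_c = 10.0, 20.0

# The same temperatures in Kelvin (a ratio scale with a true zero).
cool_k, warm_k = cool_c + 273.15, warm_c + 273.15

difference_c = warm_c - cool_c
difference_k = warm_k - cool_k
ratio_c = warm_c / cool_c
ratio_k = warm_k / cool_k

print(difference_c, difference_k)  # differences agree across the two scales
print(ratio_c, ratio_k)            # ratios do not agree
```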
Cross-Sectional and Time Series Data
Time series data: measuring the same variable at different intervals over
time (ex. months, quarters, years)
Cross-sectional data: several variables measured at the same point in time
Primary and Secondary Data
Primary: data we collect ourselves (taking survey)
Secondary: data collected by someone else
Where, How and When
Have to know Who, What and Why to analyze data
If possible want to know When and Where as well
How the data are collected matters
Ex. data from a voluntary survey on the internet are unreliable because only
people interested in the topic will fill out the questionnaire
Know three W’s before analyzing data:
o Why you are examining the data (what you want to know)
o Whom each row of data table refers to
o What the variables (the columns of the table) record
CHAPTER THREE: SURVEYS AND SAMPLING
Three Features of Sampling
Feature 1: Examine a Part of the Whole
Select a sample from the population to examine
Small sample can represent the entire population
Sample survey: designed to ask questions to a small group of people in the
hope of learning something about the entire population
Have to make sure the sample represents the population fairly
Biased: the summary characteristics of a sample differ from the
corresponding characteristics of the population it is trying to represent
Feature 2: Randomize
Randomizing gives us a representative sample even for effects we were
unsure of
Must make sure that the sample looks like the rest of the population
Two things make randomization fair:
1. Nobody can guess the outcome before it happens
2. Underlying set of outcomes will be equally likely
Randomize because hard to include every possible relevant characteristic of
the population into the sample (income level, age, political affiliation, marital
status, number of children, place of residence)
Can see how well randomizing works by taking two random samples from
population
Summaries of the two samples will end up almost the same
Sampling variability: the sample-to-sample differences
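A minimal simulation of this idea, assuming a made-up population of 10,000 ages between 20 and 80. The seed, the population, and the sample size of 500 are all illustrative choices, not values from the notes:

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 ages drawn uniformly from 20 to 80.
population = [rng.randint(20, 80) for _ in range(10_000)]

# Draw two independent random samples from the same population.
sample_a = rng.sample(population, 500)
sample_b = rng.sample(population, 500)

mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

# The gap between the two sample means is the sampling variability in action.
difference = abs(mean_a - mean_b)
print(mean_a, mean_b, difference)
```

The two means should be close to each other and to the population mean, but rarely identical.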
Feature 3: The Sample Size Is What Matters
Size of the sample determines what can be concluded from the data
Size of population does not matter (ex. do not need a specific percentage or
fraction)
The fraction of the population does not matter – the sample size does
Need a large enough sample so it is representative of population
Most common polling method used by professionals is the telephone
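A rough simulation of the claim that sample size matters and population size does not. It assumes a hypothetical yes/no question where exactly half the population says yes, and compares how much the sample proportion varies for samples of 100 drawn from populations of 10,000 and 1,000,000. The function name and all numbers are illustrative:

```python
import random
import statistics

def spread_of_sample_proportions(population_size, sample_size, repetitions, seed):
    """Standard deviation of the sample proportion across repeated random samples.

    Individuals are numbered 0..population_size-1; even-numbered individuals
    are treated as "yes" responses, so the true proportion is 50%.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(repetitions):
        sample = rng.sample(range(population_size), sample_size)
        yes_count = sum(1 for person in sample if person % 2 == 0)
        proportions.append(yes_count / sample_size)
    return statistics.pstdev(proportions)

# Same sample size (100), very different population sizes.
spread_small_pop = spread_of_sample_proportions(10_000, 100, 300, seed=1)
spread_large_pop = spread_of_sample_proportions(1_000_000, 100, 300, seed=1)
print(spread_small_pop, spread_large_pop)
```

Both spreads should come out at roughly the same size even though the second population is 100 times larger.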
A Census – Does It Make Sense?
Census: an attempt to collect data on the entire population of interest
Reasons why census may not provide best info:
o Difficult to complete census (some people are hard to locate)
o Population being studied may change (babies born, people travel,
people die in time it takes to complete census)
o Takes a lot of effort
Populations and Parameters
Can use models to represent reality
Models of data can give us summaries we can learn from
Parameters: key numbers in models
Population parameter: parameter used in a model for a population
Statistic: anything calculated from a sample
Sample statistic: statistics matched with the parameters they estimate
Representative sample: sample from which the statistics computed
accurately reflect the corresponding population parameters
Simple Random Sampling (SRS)
Simple random sample: sample in which each set of n individuals in the
population has an equal chance of selection
Sampling frame: list of individuals from which the sample will be drawn
Easiest way to choose SRS is with random numbers
Assign sequential number to each individual in sampling frame
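The numbering approach can be sketched as follows. The function simple_random_sample and the frame of names are hypothetical; the sketch assigns numbers 1..N to the sampling frame and draws n distinct numbers at random:

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw an SRS of size n by numbering the frame and picking random numbers."""
    rng = random.Random(seed)
    # Assign a sequential number (starting at 1) to each individual in the frame.
    numbered = dict(enumerate(frame, start=1))
    # Pick n distinct numbers; every set of n numbers is equally likely.
    chosen_numbers = rng.sample(range(1, len(frame) + 1), n)
    return [numbered[i] for i in chosen_numbers]

# Hypothetical sampling frame of 20 employees.
frame = [f"employee_{i}" for i in range(1, 21)]
srs = simple_random_sample(frame, 5, seed=7)
print(srs)
```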
Other Random Sample Designs
Stratified Sampling
Stratified random sampling: population is divided into several homogeneous
subpopulations (strata) and random samples are drawn from each stratum
Need to make sure proportions of each group within sample match the
proportions of those groups in the population
Ex. if population is 60% women and 40% men, choose sample that is 60%
women and 40% men
Can also stratify by age, race, income, etc.
Biggest benefit of stratified random sampling is it results in reduced
sampling variability
Can find info about individual strata as well as whole population
MUST take samples from every stratum in population
Cannot discard strata because they seem small (will have a large impact in
the end)
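A minimal sketch of proportional stratified sampling using the 60% women / 40% men example above. The function stratified_sample and the member labels are hypothetical. Rounding is used to size each stratum's sample, so in general the pieces may not add up exactly to the requested total:

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Draw a random sample from every stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum's share of the sample matches its share of the population.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 600 women and 400 men.
strata = {
    "women": [f"w{i}" for i in range(600)],
    "men": [f"m{i}" for i in range(400)],
}
sample = stratified_sample(strata, 50, seed=3)
print(len(sample["women"]), len(sample["men"]))
```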
Cluster Sampling
Cluster sampling: representative subset of a population chosen for reasons
of convenience, cost, practicality
Split population into clusters and select a few clusters at random to census
Will generate an unbiased sample if each cluster fairly represents population
Stratified vs. cluster sampling:
1. Stratify to ensure sample represents different groups in population
and reduce sample-to-sample variability
Cluster to save money or make study practical
2. Strata are homogeneous but differ from one another
Clusters are mostly alike (each is heterogeneous and resembles the
overall population)
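For contrast with stratified sampling, a cluster sample picks a few whole clusters at random and then takes a census of each chosen cluster. The function cluster_sample and the store data are hypothetical:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters at random and include everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Census each chosen cluster: keep all of its members.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: customers grouped by store location.
clusters = {
    "store_A": ["a1", "a2", "a3"],
    "store_B": ["b1", "b2"],
    "store_C": ["c1", "c2", "c3", "c4"],
    "store_D": ["d1", "d2", "d3"],
}
selected = cluster_sample(clusters, 2, seed=5)
print(selected)
```

Unlike the stratified design, most clusters contribute nobody to the sample, which is what makes the method cheaper.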
Systematic Sampling
Systematic sample: sample drawn by selecting individuals systematically
from a sampling frame
Ex. select every 10th employee on an alphabetical list
Start with randomly selected person to ensure it is still random
Can give representative sample if order of list is not associated with
responses measured
Much less expensive than SRS
Must be careful that sampling frequency is not related to something about
the process being sampled
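The "every 10th employee" example can be sketched as below. The function systematic_sample and the employee list are hypothetical; the starting position is chosen at random within the first step, and then every step-th individual is taken:

```python
import random

def systematic_sample(frame, step, seed=None):
    """Pick a random start within the first `step` entries, then every step-th one."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random starting index from 0 to step - 1
    return frame[start::step]

# Hypothetical alphabetical list of 100 employees.
frame = [f"employee_{i:03d}" for i in range(1, 101)]
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```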
Multistage Sampling
Multistage sampling: sampling schemes that combine several sampling
methods
Might be most useful for sampling large amounts of data
Practicalities
1. Who
o Have to be able to define the group you want to survey
o Might have an idea but not know exactly the right people to choose
o Make sure to choose people whose answers will be meaningful
2. Sampling frame
o Must specify the sampling frame
o Sampling frame limits what survey can find out
3. Target sample
o The individuals for whom you intend to measure responses
o Nonresponse is a problem in most surveys
4. Sample
o Actual respondents
o Individuals you get data from and can make conclusions
o Might not be representative of sampling frame/population
The Valid Survey
Must ask four questions before creating a valid survey:
1) What do I want to know?
o Need to know what you want to learn and from whom
o Do not want to ask unnecessary questions
o Longer survey means fewer people will answer
2) Who are the appropriate respondents?