COMM 291 Lecture Notes - Lecture 2: Pie Chart, Bar Chart, Cluster Sampling


Department: Commerce
Course Code: COMM 291
Professor: Jonathan Berkowitz
Lecture: 2

COMMERCE 291 – Lecture Notes 2016 – © Jonathan Berkowitz
Not to be copied, used, or revised without explicit written permission from the copyright owner.
Summary of Lectures 3 and 4
Chapter 3: Surveys and Sampling
Three Principles of Sampling:
1. Examine part of the whole
Sampling means “take a subset (i.e. a sample) of a larger whole population and use the
information about the sample to give information about the population.”
A sample is different from a census, which is a collection of data on all individuals in the
population. Compared with a sample, a census is generally harder to complete, more
costly, more time-consuming, cumbersome to carry out, and useless when destructive
testing is involved.
A biased sample is one that has a systematic favouring of certain outcomes. Bias can
arise from many sources, including poor wording of the question, undercoverage
(leaving out subgroups of the population), non-response bias (individuals chosen for the
sample refuse to participate), and response bias (attitude or behaviour of the interviewer
or respondent, or poor memory for events being asked about, etc.).
2. Randomize
Use a "chance" process to prevent bias.
Every sample is different: the differences are called sampling error or sampling
variability. Note that "error" does not mean "mistake"!
3. The Size of the Sample is what matters
Sample size determines the generalizability of our conclusions, not the population size
(as long as it is a large population).
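Why the population size hardly matters can be seen from the standard error of a sample proportion. The sketch below is not part of the lecture: it assumes simple random sampling without replacement and uses the standard finite population correction factor sqrt((N - n) / (N - 1)). The function name `se_proportion` and the example values (p = 0.5, n = 1000) are illustrative assumptions.

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under SRS without replacement.

    p: population proportion, n: sample size, N: population size.
    """
    fpc = math.sqrt((N - n) / (N - 1))  # finite population correction
    return math.sqrt(p * (1 - p) / n) * fpc

# Same sample size, very different population sizes: the SE barely moves.
for N in (100_000, 1_000_000, 30_000_000):
    print(f"N = {N:>10,}  n = 1,000  SE = {se_proportion(0.5, 1000, N):.4f}")

# Same population, different sample sizes: the SE changes a lot.
for n in (100, 1_000, 10_000):
    print(f"N = 30,000,000  n = {n:>6,}  SE = {se_proportion(0.5, n, 30_000_000):.4f}")
```

With n fixed at 1,000 the standard error stays at roughly 0.016 whether the population has a hundred thousand or thirty million units, while changing n from 100 to 10,000 shrinks it about tenfold.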
Terminology:
Population – entire class of individuals which have a common characteristic of interest
that the investigator would like to generalize about
Sampling Frame – the portion of the population you can access, and can sample from
Sample – a subset (randomly chosen and representative) which will represent the
population
Parameter – a numerical fact or characteristic about the population of interest
Statistic/Estimate – a number computed from the sample (i.e. a quantity analogous to
the parameter of the population), which is used to estimate a parameter
A “clever” mnemonic device:
Population → Parameter
Sample → Statistic/Estimate
Note that Population and Parameter both start with “P”; Sample and Statistic both start
with “S”. And the brilliant part is that even the first syllable of Estimate is pronounced “S”!
Example 1: Suppose you are interested in the mean household income of all Canadian
households. The population is all Canadian households; the parameter is the mean
household income. Only CRA has this information! However, a survey research firm
contacts 500 households at random. These 500 households form the sample; the mean
household income of the 500 households is the statistic/estimate.
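Example 1 can be sketched in a few lines of Python. The incomes below are simulated, not real data: the log-normal shape, its parameters, and the population size of 100,000 are assumptions chosen only to illustrate the difference between a parameter and a statistic.

```python
import random
import statistics

random.seed(291)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated household incomes (not real data).
population = [random.lognormvariate(11, 0.5) for _ in range(100_000)]

# Parameter: a fact about the whole population (normally unknown).
parameter = statistics.mean(population)

# Statistic: the same quantity computed from a random sample of 500 households.
sample = random.sample(population, 500)
statistic = statistics.mean(sample)

# A second random sample gives a different statistic (sampling variability).
second_statistic = statistics.mean(random.sample(population, 500))

print(f"parameter (population mean):  {parameter:,.0f}")
print(f"statistic (first sample):     {statistic:,.0f}")
print(f"statistic (second sample):    {second_statistic:,.0f}")
```

The two sample means differ from each other and from the population mean, but both land close to it; that gap is sampling error, not a mistake.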
Example 2: Forecasting Elections – two possible parameters are: a) average age of
eligible voters; and, b) percentage of eligible voters who actually voted. The population is
“all eligible voters”; a polling company would take a sample of, say, 1000 voters.
How well the sample statistics estimate the population parameters depends on how
representative the samples are. The best sampling methods involve randomization.
Sampling Designs:
Simple Random Sampling (SRS) – every possible sample of a given size has the same
chance of being chosen, so every unit in the population has the same chance of
being selected for the sample
Stratified Random Sampling – divide the population into homogeneous
subgroups called strata and select a SRS in each stratum. This has the benefit of
reducing sampling variability.
Cluster Random Sampling – divide the population into parts or clusters each of
which represents the population. Select a few clusters at random and do a
census within each one. If each cluster is fairly representative, the sample will be
unbiased.
Multi-stage Cluster Sampling – repeatedly divide the population into smaller and
smaller subgroups and, at each stage, use a chance procedure to pick the
sample. This combines SRS and cluster sampling at multiple levels.
Systematic Sampling – approximate a SRS by taking units at regular intervals
through the population, but with a random starting point.
Note the difference between strata and clusters: strata are each homogeneous and
different from one another; clusters are heterogeneous and similar to one another.
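The four basic designs above can be sketched with Python's standard `random` module. This is a minimal illustration, not the course's own code: the population of 1,000 numbered units, the stratum and cluster labels, and all function names are hypothetical, and the labels are assigned arbitrarily rather than reflecting real homogeneous strata.

```python
import random

random.seed(291)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 units, each tagged with a stratum and a cluster.
population = [
    {"id": i, "stratum": i % 4, "cluster": i // 50}  # 4 strata, 20 clusters of 50
    for i in range(1000)
]

def simple_random_sample(units, n):
    """SRS: every possible sample of size n is equally likely."""
    return random.sample(units, n)

def stratified_sample(units, n_per_stratum):
    """Take an SRS of n_per_stratum units inside every stratum."""
    strata = {}
    for u in units:
        strata.setdefault(u["stratum"], []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, n_clusters):
    """Pick whole clusters at random, then take a census within each one."""
    cluster_ids = sorted({u["cluster"] for u in units})
    chosen = set(random.sample(cluster_ids, n_clusters))
    return [u for u in units if u["cluster"] in chosen]

def systematic_sample(units, n):
    """Take every k-th unit, starting from a random point in the first interval."""
    k = len(units) // n
    start = random.randrange(k)
    return units[start::k][:n]

for name, sample in [
    ("SRS", simple_random_sample(population, 100)),
    ("stratified", stratified_sample(population, 25)),
    ("cluster", cluster_sample(population, 2)),
    ("systematic", systematic_sample(population, 100)),
]:
    print(f"{name:<11} sample size = {len(sample)}")
```

Each call returns 100 units, but the route differs: the stratified sample is guaranteed to include every stratum, whereas the cluster sample contains only the units of the two chosen clusters.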
Poor samples are those drawn by convenience sampling (choose respondents that are
easiest – that is, most convenient – to reach, such as family, friends, and neighbours).
Poor samples are also those drawn by voluntary response, where respondents choose
themselves. They are likely to be biased samples because people with strong and
usually negative opinions are most likely to respond (e.g. open-line talk radio shows).
Beware of anecdotal evidence, based on haphazardly selected cases which are not
representative (e.g. call-in shows on talk radio, newspaper advice columns).
Observational studies vs. Experimental studies
In an observational study, the experimenter does not try to influence responses but
simply measures variables of interest for a group of individuals. This is what surveys are.
In an experiment, the experimenter imposes a treatment or intervention on individuals
and studies their responses. Experiments are used to control for confounding factors
(i.e. “lurking” variables) while comparing groups.
Chapter 4: Displaying and Describing Categorical Data
Recall that a variable is a characteristic of an individual, and a variable can take different
values for different individuals. When we speak of the “distribution” of a variable, we
mean, “What are the values of the variable and how often does each value occur?”
Distributions are best displayed in tables, charts, and graphs. All provide compact and
visually appealing ways of summarizing variables or data. Every statistical analysis of
data should include graphical summaries. We emphasize this as follows:
Three Rules of Data Analysis:
Rule 1. Make a picture
Rule 2. Make a picture
Rule 3. Make a picture
These pictures give good general impressions of the data, the distribution of the
variable, trends, possible outliers, and later on, relationships among variables. They also
suggest which numerical summaries will be useful.
Frequency Table:
A table of counts (i.e. frequencies) of each of the values of a categorical variable,
together with relative frequencies (also called proportions) or percents.
Example: 20 students’ responses to a multiple-choice question with 5 possible answers:
D A B A A D A C D D A A B A A A E E C A
Value    Frequency    Relative Frequency    Percentage
A           10              0.50               50%
B            2              0.10               10%
C            2              0.10               10%
D            4              0.20               20%
E            2              0.10               10%
Total       20              1.00              100%
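The same frequency table can be computed from the raw responses. The following is a minimal Python sketch using the standard library's `collections.Counter`; the variable names and column layout are illustrative choices, not part of the lecture.

```python
from collections import Counter

# The 20 responses from the example above.
responses = "D A B A A D A C D D A A B A A A E E C A".split()

counts = Counter(responses)  # frequency of each answer
total = len(responses)

print(f"{'Value':<6}{'Frequency':>10}{'Rel. Freq.':>12}{'Percent':>9}")
for value in sorted(counts):
    freq = counts[value]
    rel = freq / total  # relative frequency (proportion)
    print(f"{value:<6}{freq:>10}{rel:>12.2f}{rel:>9.0%}")
print(f"{'Total':<6}{total:>10}{1:>12.2f}{1:>9.0%}")
```

The relative frequencies always sum to 1 (and the percentages to 100%), which is a quick check that no response was missed.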
One useful type of graph of a frequency table for a categorical variable is a “bar graph”
or “bar chart”. Here is a bar chart (created in Excel) for the above frequency table.
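A bar chart of the same table can also be sketched in plain text. This Python snippet is only an illustration and is not the Excel chart from the lecture; the `bars` dictionary and the use of `#` characters are assumptions made for the sketch.

```python
from collections import Counter

# The 20 responses from the frequency-table example.
responses = "D A B A A D A C D D A A B A A A E E C A".split()
counts = Counter(responses)

# One bar per category; the bar length equals the frequency.
bars = {value: "#" * counts[value] for value in sorted(counts)}

for value, bar in bars.items():
    print(f"{value} | {bar} {counts[value]}")
```

The longest bar (A) stands out immediately, which is the kind of general impression a picture of the distribution is meant to give.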