COMMERCE 291 – Lecture Notes 2016 – © Jonathan Berkowitz

Not to be copied, used, or revised without explicit written permission from the copyright owner.

Summary of Lectures 3 and 4

Chapter 3: Surveys and Sampling

Three Principles of Sampling:

1. Examine part of the whole

Sampling means “take a subset (i.e. a sample) of a larger whole population and use the

information about the sample to give information about the population.”

A sample is different from a census, which is a collection of data on all individuals in the

population. Compared with a sample, a census is generally harder to complete, more

costly, more time-consuming, cumbersome to carry out, and useless when destructive

testing is involved.

A biased sample is one that has a systematic favouring of certain outcomes. Bias can

arise from many sources, including poor wording of the question, undercoverage

(leaving out subgroups of the population), non-response bias (individuals chosen for the

sample refuse to participate), and response bias (attitude or behaviour of the interviewer

or respondent, or poor memory for events being asked about, etc.).

2. Randomize

Use a "chance" process to prevent bias.

Every sample is different: the differences are called sampling error or sampling

variability. Note that "error" does not mean "mistake"!

3. The Size of the Sample is what matters

Sample size determines the generalizability of our conclusions, not the population size

(as long as it is a large population).
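The third principle can be checked with a small simulation. The sketch below is illustrative only: the population values (uniform between 0 and 100), the population sizes, the sample sizes, and the helper name `spread_of_sample_means` are all assumptions, not part of the lecture. It draws repeated simple random samples and measures how much the sample mean varies from sample to sample.

```python
import random
import statistics

def spread_of_sample_means(population_size, sample_size, repeats=300, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    # Hypothetical population: made-up values spread uniformly over 0..100.
    population = [rng.uniform(0, 100) for _ in range(population_size)]
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

for population_size in (10_000, 100_000):
    for sample_size in (25, 400):
        spread = spread_of_sample_means(population_size, sample_size)
        print(f"N={population_size:>7}  n={sample_size:>3}  "
              f"spread of sample means = {spread:.2f}")
```

Under these assumptions the spread should shrink noticeably when the sample size grows from 25 to 400, while changing the population size from 10,000 to 100,000 should make little difference.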

Terminology:

Population – entire class of individuals which have a common characteristic of interest

that the investigator would like to generalize about

Sampling Frame – the portion of the population you can access, and can sample from

Sample – a subset (randomly chosen and representative) which will represent the

population

Parameter – a numerical fact or characteristic about the population of interest

Statistic/Estimate – a number computed from the sample (i.e. a quantity analogous to

the parameter of the population), which is used to estimate a parameter

A “clever” mnemonic device:

Population Parameter

Sample Statistic/Estimate

Note that Population and Parameter both start with “P”; Sample and Statistic both start

with “S”. And the brilliant part is that even the first syllable of Estimate is pronounced “S”!


Example 1: Suppose you are interested in the mean household income of all Canadian

households. The population is all Canadian households; the parameter is the mean

household income. Only CRA has this information! However, a survey research firm

contacts 500 households at random. These 500 households form the sample; the mean

household income of the 500 households is the statistic/estimate.
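The distinction in Example 1 can be mimicked in a few lines. This is a sketch with made-up data: the population of 100,000 incomes is generated from a right-skewed distribution chosen purely for illustration, and it does not describe real Canadian households.

```python
import random
import statistics

rng = random.Random(291)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 household incomes (invented values).
population = [rng.lognormvariate(11, 0.6) for _ in range(100_000)]

# Parameter: a numerical fact about the whole population (normally unknown).
parameter = statistics.fmean(population)

# Statistic: the same quantity computed from a random sample of 500 households.
sample = rng.sample(population, 500)
statistic = statistics.fmean(sample)

print(f"parameter (population mean): {parameter:,.0f}")
print(f"statistic (sample mean):     {statistic:,.0f}")
```

The statistic will generally be close to, but not equal to, the parameter; the gap is the sampling error described earlier.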

Example 2: Forecasting Elections – two possible parameters are: a) average age of

eligible voters; and, b) percentage of eligible voters who actually voted. The population is

“all eligible voters”; a polling company would take a sample of, say, 1000 voters.

How well the sample statistics estimate the population parameters depends on how representative the samples are. The best methods involve randomization.

Sampling Designs:

Simple Random Sampling (SRS) – every unit in the population has the same

chance of being chosen for the sample

Stratified Random Sampling – divide the population into homogeneous

subgroups called strata and select a SRS in each stratum. This has the benefit of

reducing sampling variability.

Cluster Random Sampling – divide the population into parts or clusters each of

which represents the population. Select a few clusters at random and do a

census within each one. If each cluster is fairly representative, the sample will be

unbiased.

Multi-stage Cluster Sampling – repeatedly divide the population into smaller and

smaller subgroups and, at each stage, use a chance procedure to pick the

sample. This combines SRS and cluster sampling at multiple levels.

Systematic Sampling – approximate a SRS by taking units at regular intervals

through the population, but with a random starting point.

Note the difference between strata and clusters: strata are each homogeneous and

different from one another; clusters are heterogeneous and similar to one another.
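The designs above can be sketched in code. Everything here is hypothetical: the population of 1,000 numbered units, the "urban"/"rural" strata, the clusters of 50 consecutive units, and the function names are invented for illustration. Multi-stage sampling is omitted; it would repeat the cluster step at several levels.

```python
import random

rng = random.Random(3)

# Hypothetical population: 1,000 units, each tagged with a stratum and a cluster.
population = [
    {"id": i, "stratum": "urban" if i % 4 else "rural", "cluster": i // 50}
    for i in range(1000)
]

def simple_random_sample(units, n):
    """SRS: draw n units at random without replacement."""
    return rng.sample(units, n)

def stratified_sample(units, n_per_stratum):
    """Take an SRS of n_per_stratum units inside every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(unit["stratum"], []).append(unit)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(units, n_clusters):
    """Pick whole clusters at random, then take a census within each one."""
    cluster_ids = sorted({unit["cluster"] for unit in units})
    picked = set(rng.sample(cluster_ids, n_clusters))
    return [unit for unit in units if unit["cluster"] in picked]

def systematic_sample(units, step):
    """Take every step-th unit, beginning at a random starting point."""
    start = rng.randrange(step)
    return units[start::step]

print(len(simple_random_sample(population, 40)))  # 40 units
print(len(stratified_sample(population, 20)))     # 20 from each of 2 strata
print(len(cluster_sample(population, 2)))         # 2 whole clusters of 50
print(len(systematic_sample(population, 25)))     # every 25th unit
```

Note how the stratified version samples inside every stratum, whereas the cluster version keeps only a few clusters and takes everyone in them.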

Poor samples are those drawn by convenience sampling (choose respondents that are

easiest – that is, most convenient – to reach, such as family, friends, and neighbours).

Poor samples are also those drawn by voluntary response, where respondents choose

themselves. They are likely to be biased samples because people with strong and

usually negative opinions are most likely to respond (e.g. open-line talk radio shows).
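A small simulation can illustrate why voluntary response tends to be biased. The numbers are assumptions made up for this sketch: 30% of the population holds a negative opinion, and people with a negative opinion are taken to respond 20% of the time versus 2% for everyone else.

```python
import random

rng = random.Random(7)

# Hypothetical population: True = holds a negative opinion (about 30%).
population = [rng.random() < 0.30 for _ in range(50_000)]
true_share = sum(population) / len(population)

# Random sample: every individual has the same chance of being selected.
srs = rng.sample(population, 1000)
srs_share = sum(srs) / len(srs)

# Voluntary response: individuals decide for themselves whether to respond.
# Assumed response rates (illustration only): 20% if negative, 2% otherwise.
volunteers = [
    negative for negative in population
    if rng.random() < (0.20 if negative else 0.02)
]
volunteer_share = sum(volunteers) / len(volunteers)

print(f"share negative in population:         {true_share:.2f}")
print(f"share negative in random sample:      {srs_share:.2f}")
print(f"share negative among self-selected:   {volunteer_share:.2f}")
```

Under these assumed response rates the self-selected group overstates the share of negative opinions by a wide margin, while the random sample stays close to the population value.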

Beware of anecdotal evidence, based on haphazardly selected cases which are not representative (e.g. call-in shows on talk radio, newspaper advice columns).

Observational studies vs. Experimental studies

In an observational study, the experimenter does not try to influence responses but

simply measures variables of interest for a group of individuals. This is what surveys are.

In an experiment, the experimenter imposes a treatment or intervention on individuals

and studies their responses. Experiments are used to control for confounding factors

(i.e. “lurking” variables) while comparing groups.


Chapter 4: Displaying and Describing Categorical Data

Recall that a variable is a characteristic of an individual, and a variable can take different

values for different individuals. When we speak of the “distribution” of a variable, we

mean, “What are the values of the variable and how often does each value occur?”

Distributions are best displayed in tables, charts, and graphs. All provide compact and

visually appealing ways of summarizing variables or data. Every statistical analysis of

data should include graphical summaries. We emphasize this as follows:

Three Rules of Data Analysis:

Rule 1. Make a picture

Rule 2. Make a picture

Rule 3. Make a picture

These pictures give good general impressions of the data, the distribution of the

variable, trends, possible outliers, and later on, relationships among variables. They also

suggest which numerical summaries will be useful.

Frequency Table:

A table of counts (i.e. frequencies) of each of the values of a categorical variable,

together with relative frequencies (also called proportions) or percents.

Example: 20 students’ responses to a multiple-choice question with 5 possible answers:

D A B A A D A C D D A A B A A A E E C A

Value    Frequency    Relative Frequency    Percentage
A               10                  0.50           50%
B                2                  0.10           10%
C                2                  0.10           10%
D                4                  0.20           20%
E                2                  0.10           10%
Total           20                  1.00          100%
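The same frequency table can be computed from the 20 raw responses. This is a minimal sketch using Python's standard `collections.Counter`; the variable names and column layout are choices made here, not part of the lecture.

```python
from collections import Counter

# The 20 responses from the example above.
responses = "D A B A A D A C D D A A B A A A E E C A".split()

counts = Counter(responses)   # frequency of each value
total = len(responses)

print(f"{'Value':<6}{'Frequency':>10}{'Rel. freq.':>12}{'Percent':>9}")
for value in sorted(counts):
    frequency = counts[value]
    relative = frequency / total   # relative frequency (proportion)
    print(f"{value:<6}{frequency:>10}{relative:>12.2f}{relative:>9.0%}")
print(f"{'Total':<6}{total:>10}{1:>12.2f}{1:>9.0%}")
```

The relative frequencies sum to 1 and the percentages to 100%, which is a useful check on any hand-built frequency table.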

One useful type of graph of a frequency table for a categorical variable is a “bar graph”

or “bar chart”. Here is a bar chart (created in Excel) for the above frequency table.
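As a rough text-only stand-in for such a chart, the sketch below draws one bar per category, with bar length equal to the frequency. The helper name `bar_chart_lines` and the use of `#` characters are assumptions for illustration; a spreadsheet or plotting library would give a proper graphic.

```python
from collections import Counter

# The 20 responses from the frequency table example.
responses = "D A B A A D A C D D A A B A A A E E C A".split()
counts = Counter(responses)

def bar_chart_lines(frequencies):
    """Return one text bar per category; bar length equals the frequency."""
    return [
        f"{value} | {'#' * frequencies[value]} {frequencies[value]}"
        for value in sorted(frequencies)
    ]

for line in bar_chart_lines(counts):
    print(line)
```

Because the variable is categorical, the order of the bars carries no meaning; alphabetical order is used here only for readability.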
