Populations and Samples (section 1.1)
What is a population?
• It is a set of all units that you are interested in studying
• Units = people OR objects OR events
All undergraduates at UWO (about 26,000 students).
All consumers who bought a cell phone last year.
Patients waiting in line for a family doctor.
You are usually interested in studying a certain characteristic of the population
• Any particular characteristic is called a variable
Height, weight, age, textbook expenses
Cellphone age distribution, monthly usage, brand, satisfaction level
Two Types of Variables
1. Quantitative Variable
• This is a variable that is assigned a meaningful numerical value
E.G. Height, weight, income, waiting time
The types of quantitative variables:
i. Interval: The distances between points are fixed and meaningful.
ii. Ratio: The variable has a meaningful zero as well as equal distances between points.
2. Qualitative (Categorical) Variable
• This is a variable whose values can be assigned to different categories
E.G. Brand of cellphone, monthly usage (high, medium, low), satisfaction level
i. A nominative variable is used for categorizing only and has no meaningful order.
ii. An ordinal variable has categories that can be ranked in a meaningful order, so it carries more information than a nominative variable.
The levels of measurement, in order from the simplest to the most complex: nominative, ordinal, interval, ratio
If we examine every unit of the population (for the variable of interest), we say we are conducting a census of
the population
• As you can imagine, many populations are too large to study
• It would be too time-consuming OR too costly to conduct a census
Thus, it makes more sense to select and analyze a subset (or portion) of the population
This subset is called a sample
Once you have selected a sample from the population you wish to study, you will want to begin your analysis of
the data by first describing the sample data:
• Graph the sample data
• Calculate some numbers that summarize the data
Two Main Types of Analysis
1. Descriptive analysis
a) Population is known
b) Take a sample from the population
c) Use probability to describe how the sample should behave
2. Statistical inference
a) Population is unknown
b) Take a sample from the population.
c) Use the sample’s characteristics to infer (conclude) about the population
Sampling From A Population (section 1.2)
In order to be able to draw valid conclusions about a population, your sample needs to be taken in such a way
that it is representative of the population from which it is drawn
The best way to achieve this is by taking a random sample
A random sample (r.s.) is a sample of data that is selected so that each unit in the population is equally likely to
be selected
You can sample with OR without replacement
Sampling with replacement: after sampling a unit, you record the value of your variable, then return the unit to the population (and
thus it can conceivably be chosen again on a succeeding selection)
Sampling without replacement: we do not place the chosen unit back into the population (this way, it cannot be chosen again)
If the population is large (which it is in most situations), sampling without replacement will still give you a
sample that is approximately random
• Thus, most of the time, random sampling will be done without replacement
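The difference between the two schemes can be seen in a minimal Python sketch. The population of 20 numbered units and the seed are assumptions made purely for illustration; they do not come from the notes.

```python
import random

# Hypothetical population: units labelled 1..20 (illustrative only).
population = list(range(1, 21))
rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

# With replacement: every draw is made from the full population,
# so the same unit can appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a unit that has been chosen cannot be chosen again,
# so all selected units are distinct.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Here `random.choices` may return repeated units, while `random.sample` never does.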
How Do You Select a Random Sample?
1. In theory, you would give every unit in your population a number
• Put the numbers in a “hat” or “bowl”
• Mix the numbers up
• Pull numbers out of the hat or bowl (mixing them up after each selection)
2. Use a random number table OR a computer program that has a random number generator
• Again, the units of the population need to be numbered (this creates a list that is called a frame)
• A random number table is a table of random numbers from 0 to 9, one after the other, row by row
• You can start anywhere in the table and move in any direction
Example
There are 171 students registered in this section (this is our population of interest)
Suppose we are interested in marks from first year math courses. Let’s choose a r.s. of 10 students
• Number the students from 1 to 171
• Using the random number table on page 4 of the text, start at Row 5, Column 1 and move from left to right
Using the EXCEL function
Our 10 randomly selected students are:
104, 164, 112, 114, 104,
12, 50, 105, 155, 158
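A computer random number generator can do the same job as the table. The sketch below uses Python's standard `random` module as a stand-in for the spreadsheet approach; the seed is an arbitrary assumption, so the numbers it prints will not match the ones listed above.

```python
import random

N_STUDENTS = 171   # size of the population (from the example)
SAMPLE_SIZE = 10   # size of the random sample (from the example)

# The frame: every student gets a number from 1 to 171.
frame = list(range(1, N_STUDENTS + 1))

rng = random.Random(2024)  # arbitrary seed, only so the run is repeatable

# Draw 10 distinct student numbers (sampling without replacement).
chosen = sorted(rng.sample(frame, SAMPLE_SIZE))
print(chosen)
```

Because `sample` draws without replacement, no student number can appear twice in `chosen`.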
If the population is very large, it is hard to come up with a list, or frame, of all the units in the population (needed
because we must be able to number the units)
In these situations, we can select what is referred to as a systematic sample
• A systematic sampling plan employs a methodical but non-random approach, such as selecting units at
regularly spaced intervals on a list
1. Go to UCC and select every 25th student who passes a particular spot; then ask them to participate in a
survey about textbook spending
2. Go to a mall and ask every 100th shopper to participate in a survey about cell phone usage
The results are not technically random samples, but the goal is to select a sample that is a good representation
(or cross section) of the population
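Selecting every k-th unit is simple to express in code. The following is a minimal sketch; the function name `systematic_sample` and the list of 200 hypothetical passers-by are assumptions made for illustration only.

```python
def systematic_sample(units, k, start=0):
    """Return every k-th unit, beginning at 0-based index `start`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return units[start::k]

# Hypothetical stream of 200 students passing a spot, in order of arrival.
passersby = [f"student_{i}" for i in range(1, 201)]

# Select every 25th student: the 25th, 50th, 75th, ... to pass by.
every_25th = systematic_sample(passersby, k=25, start=24)
print(every_25th)
```

With 200 passers-by and k = 25, this picks 8 students, from `student_25` through `student_200`.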
When doing data analysis, you need to use a random sample (or an approximate r.s.)
You need to avoid using voluntary response samples
• In these samples, participants self-select (voluntary response bias)
• That is, whoever chooses to participate does so (they have not been selected by the researcher)
• These participants tend to have strong views and/or opinions, either positive or negative
Sampling a Process (section 1.3)
A population is not always a set of existing units
We are often interested in studying the population of all of the units that will be or could potentially be
produced by a process
A process is a sequence of operations that takes inputs (such as labour, materials, methods, machines and so on)
and turns them into outputs (products, services and so on)
• Processes produce output over time
• For example, we may be interested