Weiss, N Lecture Notes
What is statistics?
Here, statistics is a group of methods used to collect, analyse, present, and
interpret data and to make decisions.
Descriptive: methods to organize, display, and summarize a given data set, e.g. averages, histograms, pie
charts, bar graphs, mean, mode, median, standard deviation, variance.
Inferential: methods that use sample results to draw conclusions about a larger
population, e.g. two-sample t-tests, simple linear regression.
(i) A population consists of all elements whose characteristics are being studied.
e.g. GPA of all Grant MacEwan students, Canadian Census
(ii) A sample is a portion of the population selected for study.
e.g. GPA of 10 Stats151 Grant MacEwan students in this section
(iii) A representative sample is a sample that represents the characteristics of the
population as closely as possible.
A random sample is a sample drawn in such a way that each element of the
population has a chance of being selected.
If every possible sample of the same size has the same chance of being selected, it is a simple random sample (SRS).
- e.g. A deck of cards: drawing a card at random from a well-shuffled deck is simple random sampling. If the card is placed back in the deck before the next draw, this is sampling with replacement;
otherwise, it is sampling without replacement.
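The card example can be tried out in a few lines of Python. This is only a sketch: the deck layout, the sample size of 5, and the seed 151 are assumptions made for illustration and are not part of the notes. `random.sample` draws without replacement (every 5-card subset is equally likely), while `random.choices` draws with replacement.

```python
import random

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for suit in suits for rank in ranks]

# Seed chosen arbitrarily so that repeated runs give the same draws.
rng = random.Random(151)

# Sampling without replacement: a drawn card is not returned to the deck,
# so the same card can never appear twice in the sample.
without_replacement = rng.sample(deck, k=5)

# Sampling with replacement: each card is returned before the next draw,
# so the same card may appear more than once.
with_replacement = rng.choices(deck, k=5)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

With only 5 draws from 52 cards a repeat is uncommon, so the difference shows up more clearly when the sample size is increased.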
● an element/member of a sample or population is a specific subject or object
about which information is collected
● a variable is a characteristic under study that assumes different values for
different elements
● the value of a variable for an element is called an observation or measurement
● a data set is a collection of observations on one or more variables
City           Number of dog bites
Center City    47
Elm Grove      32
Bay City       44
Sand Point     3
• Member: Each city included in the table
• Variable: Number of dog bites reported
• Measurement: Number of dog bites in a specific city
• Data set: Collection of dog bite numbers for the six cities listed in the
table.
In an observational study, researchers simply observe characteristics and take
measurements.
In a designed experiment, researchers impose treatments and controls and then
observe characteristics and take measurements. Example:
● Quantitative variable: a variable that can be measured numerically.
Discrete variable: a quantitative variable whose possible values can be listed.
Continuous variable: a quantitative variable whose possible values form some
interval of numbers. e.g.
● Qualitative (categorical) variable: a nonnumerically valued variable.
Histogram, Pie chart, bar graph, stem-and-leaf plots:
A frequency distribution lists all categories and the number of elements that belong to
each of the categories.
Relative frequency of a category = (frequency of that category) / (sum of all frequencies)
Bar graph: a graph made of bars whose heights represent the frequencies of the respective categories.
Pie chart: a circle divided into portions representing the relative frequencies or percentages of the categories.
Stem-and-leaf plot: To prepare a stem-and-leaf display for a data set, each value
is divided into two parts; the first part is called the stem and the second part is
called the leaf. The stems are written on the left side of a vertical line and the
leaves for each stem are written on the right side of the vertical line next to the
corresponding stem.
EXAMPLE: the following data show the method of payment by 16 customers in
a supermarket checkout line (C=cash, CK=check, CC=credit card, D=debit,
C CK CK C CC D O
CK CC D CC C CK CK
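A frequency distribution for these payment codes can be tallied as in the sketch below. It uses only the 14 values that appear in these notes (the example mentions 16 customers, so two values seem to be missing here), and it computes each relative frequency as the category's frequency divided by the total count. The variable names are illustrative.

```python
from collections import Counter

# Payment codes exactly as listed in the notes (14 values are shown).
payments = ["C", "CK", "CK", "C", "CC", "D", "O",
            "CK", "CC", "D", "CC", "C", "CK", "CK"]

# Frequency distribution: category -> number of customers.
frequency = Counter(payments)
total = sum(frequency.values())

# Relative frequency = frequency of the category / sum of all frequencies.
relative_frequency = {cat: count / total for cat, count in frequency.items()}

print("category  freq  rel. freq")
for category, count in frequency.most_common():
    print(f"{category:<9} {count:>4}  {relative_frequency[category]:.3f}")
```

The relative frequencies add up to 1; multiplying each by 100 gives the percentages a pie chart would display.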
Plots: MINITAB (bar graph): graphchart x(variable),
Example 2.71: The number of patents a university receives is an indicator of the
research level of the university. The number of patents awarded to a sample of 36
private and public universities was found to be:
93 27 11 30 9 30 35 20 9 35 24 19 14
29 11 2 55 15 35 2 15 4 16 79 16 22
49 3 69 23 18 41 11 7 34 16
Construct a stem-and-leaf plot for these data with (a) one line per stem and (b) two
lines per stem. (c) Which do you find more useful? Why?
Outliers: values that are very small or very large relative to the majority of the
values in the data set.
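For the patent data of Example 2.71, both displays can be generated with a short script. This is a sketch, not a prescribed method: the helper name `stem_and_leaf` is hypothetical, the tens digit is taken as the stem and the units digit as the leaf, and with two lines per stem the leaves 0-4 go on the first line and 5-9 on the second.

```python
from collections import defaultdict

# Patent counts for the 36 universities, as listed in the notes.
patents = [93, 27, 11, 30, 9, 30, 35, 20, 9, 35, 24, 19, 14,
           29, 11, 2, 55, 15, 35, 2, 15, 4, 16, 79, 16, 22,
           49, 3, 69, 23, 18, 41, 11, 7, 34, 16]

def stem_and_leaf(values, lines_per_stem=1):
    """Return the display as (stem, leaves) rows; supports 1 or 2 lines per stem."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens digit is the stem, units digit the leaf
        # With two lines per stem, leaves 0-4 use line 0 and leaves 5-9 use line 1.
        half = leaf // 5 if lines_per_stem == 2 else 0
        rows[(stem, half)].append(leaf)
    # Include every stem in the range, even empty ones, so gaps stay visible.
    stems = range(min(values) // 10, max(values) // 10 + 1)
    return [(stem, rows.get((stem, half), []))
            for stem in stems for half in range(lines_per_stem)]

for lines in (1, 2):
    print(f"{lines} line(s) per stem")
    for stem, leaves in stem_and_leaf(patents, lines):
        print(f"{stem} | {' '.join(map(str, leaves))}")
    print()
```

Empty stems are kept on purpose: the gap before 93 makes it easier to spot values that sit far from the rest of the data.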
Dot plot: To prepare a dot plot, first we draw a horizontal number line that
covers the range of the given data set. Then, for each measurement in the data set,
we place a dot above its value on the number line.
Example: the following data give the number of times each of 20 randomly
selected male students from MacEwan ate at fast-food restaurants during a 7-day
period:
5 8 10 3 5 5 10
7 2 1 10 4 5 0
10 1 2 8 3 5
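A rough text version of the dot plot for these 20 values can be produced as follows. This is a sketch only: the helper name `dotplot_rows` and the use of the letter "o" as the dot are choices made for illustration.

```python
from collections import Counter

# Fast-food visits for the 20 students, as listed in the notes.
visits = [5, 8, 10, 3, 5, 5, 10,
          7, 2, 1, 10, 4, 5, 0,
          10, 1, 2, 8, 3, 5]

counts = Counter(visits)                    # how often each value occurs
axis = range(min(visits), max(visits) + 1)  # number line covering the data

def dotplot_rows(counts, axis):
    """Build the plot top-down: one text row per stacking level, then the axis."""
    rows = []
    for level in range(max(counts.values()), 0, -1):
        # Place a dot above a value only if it occurs at least `level` times.
        rows.append("".join(" o " if counts[v] >= level else "   " for v in axis))
    rows.append("".join(f"{v:^3}" for v in axis))
    return rows

rows = dotplot_rows(counts, axis)
print("\n".join(rows))
```

Each column of dots is as tall as the frequency of that value, so the plot doubles as a small frequency distribution.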
Dotplot & Histogram:
Distribution of a data set is a table, graph, or formula that provides the values of
the observations and how often they occur.
Shapes of distributions:
Sections 3.1-3.4
Mean of a data set = (sum of all values) / (number of values)
N = population size, n = sample size
population mean = μ = (∑x)/N
sample mean = x̄ = (∑x)/n
Median: middle value of a ranked/ordered data set
Mode: the value that occurs with the highest frequency in a data set
Remarks on mean, mode, median (rules of thumb):
i) if mean = median = mode, the data are roughly symmetric
ii) if mean > median, the data are typically right-skewed
iii) if mean < median, the data are typically left-skewed
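These measures can be checked on the fast-food data given earlier (20 students) using Python's standard `statistics` module. The helper `skew_hint` is a hypothetical name that simply encodes the rule of thumb from the remarks above; it is a rough guide, not a formal test of skewness.

```python
import statistics

# Fast-food visits for the 20 students, as listed earlier in the notes.
visits = [5, 8, 10, 3, 5, 5, 10,
          7, 2, 1, 10, 4, 5, 0,
          10, 1, 2, 8, 3, 5]

mean = statistics.mean(visits)      # (sum of all values) / (number of values)
median = statistics.median(visits)  # middle of the ordered data (average of the two middle values here)
mode = statistics.mode(visits)      # value with the highest frequency

def skew_hint(mean, median):
    """Rule of thumb only: compare the mean with the median."""
    if mean > median:
        return "right-skewed"
    if mean < median:
        return "left-skewed"
    return "roughly symmetric"

print(f"mean = {mean}, median = {median}, mode = {mode}")
print("shape hint:", skew_hint(mean, median))
```

Here the sum of the values is 104, so the mean is 104/20 = 5.2, while the median and the mode are both 5. The mean is slightly above the median, which hints at a mild right skew.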
Example: The numbers of casinos in 11 states as of Dec. 21, 2003 are:
CO  IL  IN  IA  LA  MI  MS  MO  NV   NJ  SD
44  9   10  13  18  3   29  11  256  12  38
i) Find the mean and median.
ii) Do th