1.2-1.4: Some Terminology
The (target) population consists of all the subjects that are being studied. (ex: Canadians, Mac students)
A sample is a group of subjects selected from the population.
Descriptive Statistics (Ch. 2) consists of summarizing and presenting data. (ex: average marks of a
test, or a histogram of the marks)
Inferential Statistics (Ch. 6 on) consists of using a sample to make a conclusion about a population
using probability (Ch. 3-5) (ex: taking an opinion poll of a sample of a population and making an
inference about the whole population based on that)
- Larger samples lead to more accurate estimates in inferential statistics
The sampled population is the population from which the sample is drawn
- Ex: The sampled population depends on how you collect the sample (if you were interested in
Canadians, but only chose from people in Hamilton, the sampled population is people in Hamilton)
- Ex2: If I am interested in the average age of Mac students, then the target population is all
Mac students. If I take a sample of 30 students from class, then the sampled population is
everyone in this class.
The first column (the stem) shows the first digit (4 in 40, 5 in 50)
The second column (the leaves) shows only the ones digit (1 in 41, 5 in 55)
- The leaves are the numbers from the data set
Definition: the median of an ordered data set with n observations is the middle value if n is odd, and the
average of the middle two values if n is even
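e.g. 5, 8, 10, 21, 25 (n = 5 numbers): Median = 10 (the middle value)
e.g. when n is even and the middle two values are 15 and 18: Median = (15+18)/2 = 16.5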
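The definition of the median can be checked with a short sketch. The odd-length list is the one from the notes; the even-length list is hypothetical, chosen only so that its middle two values are 15 and 18.

```python
import statistics

def median(values):
    """Median following the definition: middle value, or average of the middle two."""
    ordered = sorted(values)              # the definition assumes an ordered data set
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # n odd: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average of the middle two

odd_example = [5, 8, 10, 21, 25]          # from the notes, n = 5
even_example = [12, 15, 18, 30]           # hypothetical data, middle values 15 and 18

print(median(odd_example))                # 10
print(median(even_example))               # 16.5
print(statistics.median(even_example))    # standard-library cross-check
```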
Example: 2.8, 2.9, 3.0, 3.2, 5.4, 6.7, 6.9
The leaf unit is used to give you the original value
Inclass Notes Page 2 Leaf unit can be negative.
Note: the median cannot be split between two different rows; it is always one number, and rows do not
overlap since we use the conventional method
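As a rough sketch of how the stem and leaf columns relate to the original values, the example data 2.8, ..., 6.9 can be grouped as follows. The leaf unit of 0.1 is an assumption (it is what makes 2.8 read as stem 2, leaf 8), the function name `stem_and_leaf` is made up for illustration, and the sketch only handles non-negative values.

```python
def stem_and_leaf(values, leaf_unit):
    """Return {stem: [leaves]}, where each leaf digit is worth `leaf_unit`."""
    # Scale so the leaf becomes the ones digit, e.g. 2.8 -> 28 when leaf_unit = 0.1
    scaled = sorted(round(v / leaf_unit) for v in values)
    stems = range(scaled[0] // 10, scaled[-1] // 10 + 1)
    rows = {stem: [] for stem in stems}   # keep empty rows so gaps in the data stay visible
    for s in scaled:
        rows[s // 10].append(s % 10)      # tens and up -> stem, ones digit -> leaf
    return rows

data = [2.8, 2.9, 3.0, 3.2, 5.4, 6.7, 6.9]
plot = stem_and_leaf(data, leaf_unit=0.1)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Multiplying stem and leaf back out (stem 5, leaf 4, leaf unit 0.1) recovers the original value 5.4.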
2.4: Measures of Central Tendency
The average of a sample (x1, x2, …, xn) is called the sample mean (or mean) and is denoted by
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
The average of the population x1, x2, …, xN is called the population mean and is denoted by
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
We usually can't calculate the population mean, since we don't have the actual population, but we can estimate it
Note: N is the population size; n is the sample size.
The mode of a data set is the value that occurs most often
e.g.: 7, 8, 8, 8, 9, 10 mode = 8
e.g.: 7, 8, 8, 8, 9, 9, 9, 10, 10, 11 mode = 8 and 9
In a grouped data set, the modal class is the group with the largest frequency
60-70 3 Modal class
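The two mode examples can be reproduced with the standard library; this is only a quick check, using `statistics.multimode` (Python 3.8+) because it returns every value tied for most frequent.

```python
from statistics import multimode

one_mode = multimode([7, 8, 8, 8, 9, 10])
two_modes = multimode([7, 8, 8, 8, 9, 9, 9, 10, 10, 11])

print(one_mode)    # [8]
print(two_modes)   # [8, 9]
```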
Lecture 3
2.5: Measures of Variation
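The range R of a data set is the largest value ($X_L$) minus the smallest value ($X_S$)
i.e., $R = X_L - X_S$
The sample variance (or variance) of a data set x1, x2, …, xn is defined by
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$
The population variance of a population of values x1, x2, …, xN is defined by:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$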
Example: 55, 63, 72, 41, 87, 75, 64, 60
$\bar{x} = 64.625$ (NEVER ROUND if needed for another calculation!)
$s^2 = [(55-64.625)^2 + (63-64.625)^2 + \dots + (60-64.625)^2]/(8-1)$
$s^2 = [55^2 + 63^2 + \dots + 60^2 - 8(64.625)^2]/(8-1) = 191.125$
*** Do not use a rounded $\bar{x}$ (average) in the $s^2$ equation ***
The sample standard deviation (or standard deviation) is defined to be:
$s = \sqrt{s^2}$
In our example: $s = \sqrt{191.125} = 13.82$; this gives the "average" amount that the data values differ from the mean
Population can be the sample itself, but then you wouldn't need statistics
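The variance example above can be verified numerically. The sketch below computes the sample variance both from the definition and from the sum-of-squares shortcut, keeping the mean unrounded as the notes insist; the variable names are illustrative only.

```python
import math
import statistics

data = [55, 63, 72, 41, 87, 75, 64, 60]
n = len(data)
mean = sum(data) / n                      # 64.625, deliberately not rounded

# Definition: squared deviations from the mean, divided by n - 1
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut: sum of squares minus n * mean^2, divided by n - 1
var_shortcut = (sum(x ** 2 for x in data) - n * mean ** 2) / (n - 1)

std_dev = math.sqrt(var_definition)

print(mean)                               # 64.625
print(var_definition, var_shortcut)       # both 191.125
print(round(std_dev, 2))                  # 13.82
```

Both formulas agree with `statistics.variance(data)`, which also divides by n - 1.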
The coefficient of variation is a measure of relative variation that can be used to compare the variation
of two different data sets possibly measured in different units. It is defined by:
$CV = \frac{s}{\bar{x}} \times 100\%$
Multiplying by 100 just expresses the ratio as a percentage.
Which data set has more variation?
a. Ages in years: 3, 4, 5, 6: $\bar{x} = 4.5$, $s^2 = 1.667$
b. The same ages in months: 36, 48, 60, 72: $\bar{x} = 54$, $s^2 = 240$
- Note: both have the same amount of variation (even with different $s^2$ values) because they are the same ages measured in different units; the coefficient of variation is about 28.7% in both cases
Note: the coefficient of variation is independent of the scale of measurement
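A small sketch of the ages example, assuming the coefficient of variation is the sample standard deviation divided by the sample mean, times 100. The helper name `coefficient_of_variation` is made up for illustration.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

years = [3, 4, 5, 6]
months = [36, 48, 60, 72]                 # the same ages, multiplied by 12

cv_years = coefficient_of_variation(years)
cv_months = coefficient_of_variation(months)

print(statistics.variance(years), statistics.variance(months))  # variances differ by a factor of 144
print(round(cv_years, 2), round(cv_months, 2))                   # 28.69 28.69
```

Changing units multiplies both the standard deviation and the mean by 12, so the ratio is unchanged.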
Percentiles and Quartiles:
The pth percentile is the number x with the property that p percent of the data is less than x.
e.g. The 50th percentile is the median.
The quartiles Q1, Q2, Q3 divide the data set into roughly four equal parts
Lecture 4
Definition: The sample space, S, for an experiment is the set of all possible outcomes.
Ex: In a family of 3 children:
S = {BBB, GBB, BGB, BBG, GGB, GBG, BGG, GGG} (gender combos)
S = {0, 1, 2, 3} (possible number of boys). These possibilities are not equally likely, unlike the gender combos, and thus should not be
used as a sample space here
Definition: An event, E, is a subset of the sample space that satisfies a given condition.
E = "Exactly one boy"
= {GGB, GBG, BGG}
Basic Rule of Probability
If the outcomes in S are equally likely, then the probability of E is
P(E) = (# of outcomes in E)/(# of outcomes in S) OR
= (# of ways E can occur)/(total # of possible outcomes)
P(Exactly one boy) = 3/8
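The three-children example can be enumerated directly. This sketch assumes, as the basic rule does, that all eight gender combinations are equally likely; it also counts how many combinations give each number of boys, which shows why 0, 1, 2, 3 are not equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 * 2 * 2 = 8 gender combinations for three children
sample_space = ["".join(kids) for kids in product("BG", repeat=3)]

# Event E: exactly one boy
event = [outcome for outcome in sample_space if outcome.count("B") == 1]
p_exactly_one_boy = Fraction(len(event), len(sample_space))

# Number of equally likely outcomes behind each possible count of boys
boys_counts = Counter(outcome.count("B") for outcome in sample_space)

print(sorted(event))          # ['BGG', 'GBG', 'GGB']
print(p_exactly_one_boy)      # 3/8
print(dict(boys_counts))      # 1 and 2 boys each arise from 3 outcomes; 0 and 3 boys from 1 each
```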
Rule P1: 0 <= P(E) <= 1
Rule P2: $P(E) = 1 - P(\bar{E})$
where $\bar{E}$ (the "complement" of E) is the event that E does not occur. Note that $\bar{E}$ is the opposite of E.
P(at least 1 boy)
= 1 - P(no boys)
= 1 - 1/8
= 7/8
Definition: Two events are mutually exclusive if they cannot both occur at the same time.
Definition: A ∪ B ("A union B") is the event that A occurs or B occurs (or both)
Rule P3: If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B)
Table 1: Biological offspring
Parental Handedness    Right Handed    Left Handed    Total
Right×Right (RR)       303             37             340
Right×Left (RL)        29              9              38
Left×Right (LR)        16              6              22
Total                  348             52             400
If a person is selected at random from the above 400 people, find the probability that their parents
were RR or LR.
P(RR ∪ LR) = P(RR) + P(LR) (RR and LR are mutually exclusive)
= 340/400 + 22/400
= 362/400 = 0.905
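The handedness calculation can be checked with exact fractions. The dictionary below just copies the row totals from Table 1; its name and layout are illustrative.

```python
from fractions import Fraction

# Row totals from Table 1 (number of offspring per parental handedness pairing)
parent_totals = {"RR": 340, "RL": 38, "LR": 22}
total = sum(parent_totals.values())       # 400 people in all

# RR and LR cannot both describe the same person, so Rule P3 applies: add the probabilities
p_rr_or_lr = Fraction(parent_totals["RR"], total) + Fraction(parent_totals["LR"], total)

print(total)                              # 400
print(p_rr_or_lr, float(p_rr_or_lr))      # 181/200 0.905
```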
Definition: A ∩ B ("A intersect B") is the event that A occurs and B occurs.
Rule P4: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
$P(X \ge 10) = \binom{20}{10}(1/5)^{10}(4/5)^{10} + \binom{20}{11}(1/5)^{11}(4/5)^{9} + \dots + \binom{20}{20}(1/5)^{20}(4/5)^{0}$
= 0.0026. Note: that is the probability of passing
The probability of failing, its complement, is: 1 - 0.0026 = 0.9974
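The sum of binomial terms above can be evaluated directly. This is a minimal sketch assuming 20 independent trials, success probability 1/5 on each, and "passing" meaning at least 10 successes; the function name `binomial_tail` is made up for illustration.

```python
from math import comb

def binomial_tail(n, p, k_min):
    """P(at least k_min successes in n independent trials with success probability p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p_pass = binomial_tail(20, 1 / 5, 10)     # (20 choose 10)(1/5)^10(4/5)^10 + ... + (20 choose 20)(1/5)^20
p_fail = 1 - p_pass                       # complement rule

print(round(p_pass, 4))                   # 0.0026
print(round(p_fail, 4))                   # 0.9974
```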