STATS 2B03 Lecture Notes - Central Limit Theorem, Treatment And Control Groups, Pooled Variance

May 1st
Statistics, 13th edition James T McClave & Terry Sincich Chapter 1-9
Jose Correa
Mon/Tue/Wed/Thu. 11:05~13:25
May 16 Midterm in class 50 minutes.
4 assignments.
Final: 3 hours; the location is to be announced by Summer Studies. All the
material is covered.
Grading: 15% coursework + 25% midterm + 60% final, or 15% coursework + 85% final.
Hand in assignments on time on “myCourses”; late assignments receive 0.
What is Statistics?
-Science of data (analyzing data)
-Often presented as numerical description
-Efficiency rate of an NHL goalkeeper
-Percentage of unemployed people in Canada
-However, it is more than that. Statistical applications:
-Descriptive: numerical and graphical representation of data
-Inferential (more important): Estimates, decisions, predictions, etc. about
a “population” using data from a “sample”.
Population: A set of people, machines, trees, animals, transactions that we
want to study.
Sample: A subset of the population under study. (population is always bigger
than sample)
Elements for a good statistical analysis:
1) Objective(s) of the study: What question(s) do we want to answer?
2) Experimental units: People, transactions, machines, etc. (depends on
what you want to study). The “subjects” we want to study. It is possible
for one study to have multiple subjects.
3) Population under study
4) Characteristic(s) of interest: variables measured on the experimental
units. (Ex. Age of the babies; sex of the babies)
-Quantitative variables: numerical in nature. (Ex. Height, weight,
blood pressure, speed, counts)
-Qualitative variables: categorical in nature. (Ex. Religion, sex,
school grade, cancer vs. non-cancer)
5) Sample: a subset of the population under study.
Data collection: once we know the objectives, population, experimental
units, variables of interest, we can collect data.
Methods:
1) *Experimental studies
2) *Observational studies
3) Published data (Stats Canada)
4) Surveys
Experimental studies: researcher designs the experiment and has control
over the experimental units. (Ex. Study of the effect of a new treatment on
certain disease.)
We would have e.g. Treatment group and Control group.
Observational studies: sometimes it is unethical or not possible to
“assign” a treatment to a subject. The researcher can only observe (directly
or indirectly) the experimental units and record the variables of interest.
Descriptive statistics: utilizes numerical and graphical methods to look for
patterns in a data set, to summarize the information revealed in a data set,
and to present that information in a convenient form.
Inferential statistics: utilizes sample data to make estimates, decisions,
predictions or other generalizations about a larger set of data.
Statistical inference is an estimate, prediction, or some other
generalization about a population based on information contained in a
sample.
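The difference between describing a sample and making an inference about a population can be sketched in Python. The population below is simulated, made-up data (hypothetical heights in cm), and the names `population`, `sample`, and the sample size of 100 are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (not real data).
rng = random.Random(2)  # fixed seed so the sketch is reproducible
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Draw a simple random sample of 100 experimental units, without replacement.
sample = rng.sample(population, 100)

# Descriptive step: summarize the sample itself.
sample_mean = statistics.mean(sample)

# Inferential step: the sample mean serves as an estimate of the population
# mean, which in a real study we usually could not compute directly.
population_mean = statistics.mean(population)

print(f"sample mean (estimate): {sample_mean:.2f}")
print(f"population mean (usually unknown): {population_mean:.2f}")
```

In a real study only the sample is available, so the second printed value is exactly what inference tries to estimate.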
May 2nd
-Methods for summarizing data:
1) Graphical
2) Numerical
-Both methods work for qualitative or quantitative data.
-When a third variable (confounding factor) changes the interpretation of
the relationship between two other variables, the situation is called
Simpson’s paradox.
-Named after E. H. Simpson (1951)
-Key: always be suspicious of data summaries.
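A small numerical sketch may make Simpson's paradox concrete. The counts below are made up for illustration only (they are not from the lecture or from any study): treatment A has the higher success rate within each severity group, yet treatment B looks better once the groups are pooled, because severity acts as the third variable.

```python
# Made-up counts for illustration only:
# (successes, patients) for each treatment within each severity group.
counts = {
    "A": {"mild": (90, 100), "severe": (300, 500)},
    "B": {"mild": (400, 500), "severe": (50, 100)},
}


def rate(successes, total):
    """Proportion of successes."""
    return successes / total


# Success rate within each severity group (the third variable).
for severity in ("mild", "severe"):
    for treatment in ("A", "B"):
        s, n = counts[treatment][severity]
        print(f"{severity:>6} {treatment}: {rate(s, n):.0%}")

# Success rate after pooling the severity groups together.
overall = {}
for treatment, groups in counts.items():
    s = sum(g[0] for g in groups.values())
    n = sum(g[1] for g in groups.values())
    overall[treatment] = rate(s, n)
    print(f"overall {treatment}: {overall[treatment]:.0%}")
```

The reversal happens because A was mostly given to severe cases and B mostly to mild ones, so the pooled summary mixes two very different groups.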
-R Studio
-Methods for quantitative data:
1) Centre
2) Spread
3) Shape
4) Weird things (e.g. outliers)
-Centre: sample mean. Let x1, x2, ……, xn be n numbers. The sample mean is
denoted by $\bar{x}$: $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$
-$\bar{x}$ is also called the sample average; in summation notation,
$\bar{x} = \sum_{i=1}^{n} x_i / n$
-Example: n = 5, x1 = 3, x2 = 5, x3 = 5, x4 = 6, x5 = 8,
$\bar{x} = \sum_{i=1}^{5} x_i / 5$ = (3+5+5+6+8)/5 = 5.4
-n = 4, x1 = 3, x2 = 1, x3 = 4, x4 = 100, $\bar{x}$ = (3+1+4+100)/4 = 27
-n = 8, x1 = 132.3, x2 = 130.7, x3 = 134.1, x4 = 5.3, x5 = 133.9, x6 =
131.3, x7 = 132.0, x8 = 133.3, $\bar{x}$ =
(132.3+130.7+134.1+5.3+133.9+131.3+132.0+133.3)/8 = 116.61
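The three sample means above can be checked with a few lines of Python. This is only a sketch; the function name `sample_mean` and the variable names are illustrative and not part of the notes.

```python
import statistics


def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)


first = [3, 5, 5, 6, 8]
second = [3, 1, 4, 100]
third = [132.3, 130.7, 134.1, 5.3, 133.9, 131.3, 132.0, 133.3]

for data in (first, second, third):
    # The hand-written version and the standard-library version should agree.
    print(data, sample_mean(data), statistics.mean(data))
```

In the second and third data sets a single unusual value (100 and 5.3) pulls the mean well away from the bulk of the observations.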
-Sample median: let x1, x2, x3, ……, xn be a set of numbers. The sample
median is denoted by “m”. It is the middle number, in the sense that if we
ordered the set from smallest to largest, m would be in the middle.
-If n is odd, m is the number in the ($\frac{n+1}{2}$)th position in the
ordered list.
-Example: 1,2,3,4,5. 3 is the median. $\frac{n+1}{2} = \frac{5+1}{2}$ = 3
(this 3 is the position in the ordered list, not the number in the list), so
we look for the number in the 3rd position.
-If n is even, we find the numbers in the positions ($\frac{n}{2}$) and
($\frac{n}{2}$+1) in the ordered list; m is the average of these two numbers.
-Example: 3,4,5,6,7,7. m = $\frac{5+6}{2}$ = 5.5. Position ($\frac{n}{2}$) =
6/2 = 3rd, ($\frac{n}{2}$+1) = 4th, so m is the average of the 3rd number and
the 4th number, which are 5 and 6, so m = 5.5.
-50% of data points are below the median and 50% of data points are above
the median.
-The median is not affected by the extreme values.
-Example: n = 5, x1 = 1, x2 = 5, x3 = 2, x4 = 9, x5 = 8. Ordered data: 1, 2,
5, 8, 9, so the median is 5. If we change x4 to 900, the ordered data would be
1, 2, 5, 8, 900, and the median is still 5; it does not change.
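The odd and even rules for the median can be written out as a short Python sketch. The function name `sample_median` is illustrative; the standard-library `statistics.median` follows the same rule and is shown for comparison.

```python
import statistics


def sample_median(xs):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(xs)
    n = len(ordered)
    if n % 2 == 1:
        # Position (n + 1) / 2 in 1-based counting is index (n - 1) // 2 in Python.
        return ordered[(n - 1) // 2]
    # Positions n / 2 and n / 2 + 1 in 1-based counting.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_case = [1, 2, 3, 4, 5]
even_case = [3, 4, 5, 6, 7, 7]
original = [1, 5, 2, 9, 8]
extreme = [1, 5, 2, 900, 8]  # x4 changed from 9 to 900

for data in (odd_case, even_case, original, extreme):
    print(data, sample_median(data), statistics.median(data))
```

Sorting first is what makes the position rule work; the input order of the observations does not matter.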
-Sample mode: the most frequently occurring observation of a data set.
-If all observations occur exactly once, then every observation is a mode.
- It is possible to have more than one mode.
-Example: 1, 5, 2, 6, 9, 8. Every data point is a mode.
1, 5, 2, 6, 2, 9, 8. Mode = 2.
8, 1, 5, 2, 6, 2, 9, 2, 8, 8. Mode = 2 & 8.
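The three mode examples can be reproduced with `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value that attains the highest count. The variable names are illustrative.

```python
import statistics

all_unique = [1, 5, 2, 6, 9, 8]
one_mode = [1, 5, 2, 6, 2, 9, 8]
two_modes = [8, 1, 5, 2, 6, 2, 9, 2, 8, 8]

for data in (all_unique, one_mode, two_modes):
    # multimode lists every value with the highest count,
    # in the order the values first appear in the data.
    print(data, "->", statistics.multimode(data))
```

When every observation occurs once, the function returns the whole data set, which matches the convention in the notes that every observation is then a mode.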
-Measure of spread:
-Sample range, denoted by R. Let x1, x2, x3, ……, xn be a set of numbers.
Let xL = the largest value of the numbers, xS = the smallest value of the
numbers. R = xL- xS.
-Sample variance: let x1, x2, ……, xn be a set of n numbers. The sample
variance, denoted s², is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$.
-The sample standard deviation, denoted by s, is $s = \sqrt{s^2}$.
-s is often preferred to s² because it is in the same units as the
observations, while s² is in units².
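The three measures of spread can be computed for the earlier data set 3, 5, 5, 6, 8 with the Python standard library. This sketch assumes the sample variance uses the usual n - 1 divisor, which is what `statistics.variance` and `statistics.stdev` compute; the variable names are illustrative.

```python
import statistics

data = [3, 5, 5, 6, 8]

# Range: largest value minus smallest value.
sample_range = max(data) - min(data)

# Sample variance (n - 1 divisor) and its square root, the standard deviation.
s2 = statistics.variance(data)
s = statistics.stdev(data)

print(f"R = {sample_range}")
print(f"s^2 = {s2:.4f} (squared units)")
print(f"s = {s:.4f} (same units as the data)")
```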
-Sample variance (2nd formula): $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}$
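Assuming the second formula is the usual computational shortcut (sum of squares minus the squared sum over n, all divided by n - 1), the sketch below checks that it agrees with the definition based on squared deviations from the mean. Both function names are hypothetical.

```python
def variance_definition(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Assumed shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [3, 5, 5, 6, 8]
print(variance_definition(data), variance_shortcut(data))
```

The shortcut needs only the running totals of x and x², which is convenient for hand calculation, although with very large values it can lose precision in floating-point arithmetic.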