STAT 100 Study Guide - Final Guide: Simple Random Sample, Observational Error, Confidence Interval
Fall 2018
School: Simon Fraser University
Course Code: STAT 100
Individuals – objects described by a set of data (ie.
people, animals, things, names, months, time of year,
people in a study).
Variables – characteristics of the individuals. (ie. major,
gpa, average temp, gender, income, bacteria type).
Categorical variables – place an individual into one of
several groups. (ie. major, grades, job, gender).
Numerical variables – take numerical values for which
arithmetic operations make sense (ie. age, income).
Observational study – observes variables for individuals
without interfering with responses. Response variable –
measures outcome or results of a study.
Sample survey – surveys a group of individuals by
studying only some of its members, chosen to represent
the entire group (an important kind of observational
study).
Population – entire group of individuals about which we
want info (ie. Canadian residents, CTV viewers).
Sample – part of pop where we collect info from to draw
conclusions about the whole pop (ie. participants in a
poll, households). - Sample surveys start with entire pop
and uses specific methods to choose a sample to
Census – sample survey that attempts to include the entire
pop in the sample (ie. the US Constitution requires one
every 10 years). Sometimes impossible/incomplete, so a
sample survey is used instead. Costs a lot, not time efficient.
Biased sampling – systematically favours certain
Convenience sampling – selection of subjects based on
availability (non-representative) (ie. voluntary response).
Simple random sample – a sample of size n consists of n
individuals from the pop chosen in a way that every set
of n individuals has an equal chance of being the sample
actually selected. (n = required size of sample) (ie. online software, hat,
Parameter (p) - # that describes a summary of pop (fixed
#), unknown actual value.
Statistic (^p) - # that describes a sample, used to
estimate unknown parameter.
• - 548 voted yes out of 1015. ^p = 548/1015 =
0.5399 = 54% voted yes
• - Random sampling eliminates bias, but still can
be wrong b/c of the variability when randomly
• - If take lots of random samples from same size
of pop there is less variability (more
• - ^p has no bias as an estimator of p.
• - To reduce bias -> use random sampling
• - To reduce variability -> use larger sample
MOE – statistic or proportion used to see how
close estimates are, generally 50% or between
• - Translate sampling variability into a statement
of how much confidence we have in results of a
• - MOE = 1/sqrt of n (ie. 1/sqrt 1000 =
1/31.6228 = 0.0316 = 3.16%)
• - p = the proportion of the pop that has a given
outcome (a "success").
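The two calculations above (the sample proportion ^p and the quick MOE of 1/sqrt of n) can be checked with a short Python sketch. The function names are made up for illustration; the numbers are the ones used in the notes.

```python
import math

def sample_proportion(successes, n):
    """Sample proportion ^p = number of successes / sample size."""
    return successes / n

def quick_moe(n):
    """Quick-method margin of error from the notes: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

p_hat = sample_proportion(548, 1015)  # 548 voted yes out of 1015
moe = quick_moe(1000)                 # sample of size n = 1000

print(f"p-hat = {p_hat:.4f}")  # 0.5399
print(f"MOE   = {moe:.4f}")    # 0.0316
```

Because n sits under a square root, cutting the MOE in half takes a sample four times as large.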
95% confidence – if we took many samples
using the same method, 95% of the samples
should give a result within + or – 4% of the truth
of the pop. (95% confident the truth lies within
MOE for 95% confidence – 95% confidence with a
4% MOE means the statistic will be within 4
percentage points of the real pop value in 95% of all samples.
Confidence statement – has two parts: the MOE & the
level of confidence (level of confidence = % of all
possible samples that satisfy the MOE).
• - Ie. estimate + or – MOE is an approximate 95%
confidence interval (^p + or – 1/sqrt n)
• - 85% of people + or – 2.4% own a cellphone
= 82.6% < p < 87.4%
• - Ie. ^p = 57% with MOE = 1/sqrt 2142 = 0.022 =
2.2% = 57 + or – 2.2 = 54.8% < p < 59.2%
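As a check on the interval arithmetic, here is a minimal Python sketch of the approximate 95% confidence interval ^p + or – 1/sqrt n, applied to the ^p = 57%, n = 2142 example. The function name is hypothetical.

```python
import math

def quick_confidence_interval(p_hat, n):
    """Approximate 95% confidence interval: p-hat +/- 1/sqrt(n)."""
    moe = 1 / math.sqrt(n)
    return p_hat - moe, p_hat + moe

low, high = quick_confidence_interval(0.57, 2142)
print(f"{low:.3f} < p < {high:.3f}")  # 0.548 < p < 0.592
```

Here 1/sqrt 2142 is about 0.0216, so the MOE is roughly 2.2 percentage points.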
Sampling errors – caused by taking a sample; results
differ from the results of a census.
Non-sampling errors – not related to selecting a sample.
Random sampling errors – deviation between sample
statistic & pop parameter caused by chance in selecting a random sample.
- MOE in confidence statement includes only random
Stratified random sample – divide sampling frame into
groups of individuals (strata), take a simple random sample
from each stratum and combine to make a complete
• - Ie. households (strata) on each block arranged
in order of location and divided into groups
• - Take separate samples from each stratum ->
usually proportional to size of pop ->
guarantees the sample
matches pop in relation to the strata.
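The stratified procedure (a simple random sample from each stratum, sized in proportion to the stratum, then combined) might be sketched in Python as below. The sampling frame, the stratum names, and the 10% sampling fraction are all invented for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take a simple random sample from each stratum and combine them.

    strata:   dict mapping a stratum name to a list of individuals
    fraction: share of each stratum to sample (proportional allocation)
    """
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        # rng.sample draws k distinct members; every set of k is equally likely
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 60 undergraduates and 40 graduates
frame = {
    "undergrad": [f"U{i}" for i in range(60)],
    "grad": [f"G{i}" for i in range(40)],
}
chosen = stratified_sample(frame, fraction=0.1, seed=1)
print(chosen)
```

With proportional allocation the sample always holds 6 undergraduates and 4 graduates, so it matches the pop on the stratifying variable no matter which individuals are drawn.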
Response variable – measures an
outcome/result of a study.
Explanatory variable – we think explains or
causes changes in the response variable (the independent variable, IV).
Lurking variable – has important effect on the
relationship between the variables in a study but is not one
of the explanatory variables studied.
- Use randomized response method for sensitive
Completely randomized experimental design – all
subjects allocated at random among all treatments.
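A completely randomized design can be illustrated with a small Python sketch: shuffle the subjects, then deal them out to the treatments. The subject IDs, treatment labels, and function name are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among all treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random order, so chance alone decides each group
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # deal subjects out in turn so group sizes stay as equal as possible
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# 20 hypothetical subjects split between a treatment and a control group
groups = completely_randomized(range(1, 21), ["treatment", "control"], seed=42)
for name, members in groups.items():
    print(name, sorted(members))
```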
Matched pairs design – compares 2 treatments using pairs
of subjects chosen to be as closely matched as possible.
Block design – a block is a group of subjects known
before the experiment to be similar in some way that is
expected to affect the response to the treatments (controls effects of
Data ethics: institutional review board, informed consent, confidentiality.
• - Use instruments to make measurements; units are used to record the measurements.
• - We measure a property of a person or thing when we assign a value to represent the property.
• - The result of a measurement is a numerical variable that takes different values for people or things that differ in
whatever we are measuring.
• - A variable is a valid measure of a property if it is relevant or appropriate as a representation of that property.
• - A rate (fraction, proportion or percentage) at which something occurs is often a more valid measure than a simple
count of occurrences.
• - A measurement of a property has predictive validity if it can be used to predict success on tasks that are related
to the measured property.
• A measurement has bias if it systematically tends to overstate/understate the true value of the
• A measure has random error if repeated measurements of the same individual give different results.
• If the random error is small, we say the measurement is reliable.
• Use averages to improve reliability: the average of several repeated measures of the same individual is
more reliable than a single measurement.
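The claim that averages are more reliable than single measurements can be illustrated with a small simulation. This is only a sketch: the true value, the size of the random error, and its normal distribution are all assumptions, not part of the notes.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_VALUE = 70.0        # hypothetical true weight in kg
ERROR_SD = 2.0           # assumed size of the random error

def measure():
    """One measurement: true value plus random error, with no bias."""
    return TRUE_VALUE + rng.gauss(0, ERROR_SD)

def averaged_measure(k):
    """Average of k repeated measurements of the same individual."""
    return statistics.mean(measure() for _ in range(k))

singles = [measure() for _ in range(2000)]
averages = [averaged_measure(10) for _ in range(2000)]

single_spread = statistics.stdev(singles)
average_spread = statistics.stdev(averages)
print(f"spread of single measurements: {single_spread:.2f}")
print(f"spread of 10-measurement avgs: {average_spread:.2f}")
```

The averages cluster more tightly around the true value. Averaging reduces random error only; it does nothing about bias.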
• - Look for inconsistencies, missing info, incorrect arithmetic, implausible numbers, too regular appearing
numbers or hidden agendas.
Valid - if measurement of a property is a good representation of that property.
Percentage increase & decrease
Ie. first quiz scored 5/20. Second quiz scored 10/20. The percentage increase from the first to the second quiz is
(10 - 5)/5 x 100 = 100%. On the third quiz, the student got 5 again, a decrease of (10 - 5)/10 x 100 = 50%.
- 100% increase means score has doubled.
Percentage change = (amount of change / starting value) x 100 = (x - y)/y x 100, where y is the starting value and x is the new value.
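The percentage change formula applied to the quiz scores, as a minimal Python sketch (the function name is made up for illustration):

```python
def percent_change(start, new):
    """Percentage change = (new - start) / start * 100."""
    return (new - start) / start * 100

print(percent_change(5, 10))  # 100.0 (first quiz to second: score doubled)
print(percent_change(10, 5))  # -50.0 (second quiz to third: score halved)
```

A 100% increase followed by a 50% decrease returns the score to where it started, because each percentage uses a different starting value.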
Distribution – of a variable tells us what values it takes
and how often.
- Use pie & bar for categorical.
Categorical V – places individual into groups/categories.
Quantitative V – numerical values for arithmetic
Pie charts – whole divided into parts, not good for
Bar graph – better to compare sizes (also side by side
- Both show distribution of categorical variables.
Line graphs – quantitative measured at intervals over
time (time on x axis, variable on y axis, connect by a
- Look for overall pattern, trends
(upward/downward movement over time),
deviations from the pattern, seasonal
Histogram – displays the distribution of a quantitative
variable; look for patterns, deviations and outliers.
Shape – distribution peak (single, symmetric).
Center – midpoint close to single peak.
Variability – ignoring outliers (%).
Relative frequency histogram – better for comparing 2
Stemplot – for small data sets.