STAT 100 Study Guide - Final Guide: Simple Random Sample, Observational Error, Confidence Interval

Fall 2018

Department: Statistics
Course Code: STAT 100
Professor: Marie Loughin
Study Guide: Final

Individuals: objects described by a set of data (e.g.
people, animals, things, names, months, time of year,
people in a study).
Variables: characteristics of the individuals (e.g. major,
GPA, average temp, gender, income, bacteria type).
Categorical variables: place an individual into one of
several groups (e.g. major, letter grade, job, gender).
Numerical variables: take numerical values for which
arithmetic operations make sense (e.g. age, income).
Observational study: observes variables for individuals
without interfering with their responses. Response variable:
measures an outcome or result of a study.
Sample survey: surveys a group of individuals by studying
only some of its members, chosen to represent the entire
group (an important kind of observational study).
Population: entire group of individuals we want info
about (e.g. Canadian residents, CTV viewers).
Sample: part of pop we actually collect info from to draw
conclusions about the whole pop (e.g. participants in a
poll, households).
- Sample surveys start with the entire pop and use specific
methods to choose a sample to represent the pop.
Census: sample survey that attempts to include the entire
pop in the sample (e.g. the US Constitution requires one
every 10 years). Sometimes impossible or incomplete, so a
sample survey is used instead. Costs a lot, not time efficient.
Biased sampling: systematically favours certain
outcomes.
Convenience sampling: selection of subjects based on
availability (non-representative). Voluntary response
samples are biased in a similar way.
Simple random sample of size n: consists of n
individuals from the pop chosen in such a way that every
set of n individuals has an equal chance of being the
sample selected (n = required size of sample) (e.g. using
online software, drawing from a hat, labelling).
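A minimal sketch of drawing a simple random sample with software, assuming the individuals have already been labelled. The sampling frame of 500 labels and the function name are hypothetical, not part of the course notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every set of n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: individuals labelled 1 to 500
frame = list(range(1, 501))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

`random.sample` draws without replacement, so no individual can appear twice in the sample.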
Parameter (p): # that describes the pop (fixed #); its
actual value is usually unknown.
Statistic (^p): # that describes a sample, used to
estimate the unknown parameter.
- 548 voted yes out of 1015. ^p = 548/1015 =
0.5399 = 54% voted yes.
- Random sampling eliminates bias, but the result can
still be wrong b/c of the variability from randomly
choosing.
- If we take lots of random samples of the same size
from the same pop, the variability follows a predictable
pattern; larger samples give less variability.
- ^p has no bias as an estimator of p.
- To reduce bias -> use random sampling
- To reduce variability -> use larger sample
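The points about bias and variability can be illustrated with a small simulation. This is only a sketch: the true proportion 0.54, the sample sizes and the number of repeats are assumed values chosen for illustration.

```python
import random
import statistics

def sample_proportions(p, n, repeats, seed=0):
    """Simulate `repeats` random samples of size n and return each sample's ^p."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        yes = sum(1 for _ in range(n) if rng.random() < p)
        results.append(yes / n)
    return results

# The statistic from the notes: 548 yes votes out of 1015
p_hat = 548 / 1015

# Assumed true proportion p = 0.54, two different sample sizes
small = sample_proportions(0.54, n=100, repeats=1000, seed=1)
large = sample_proportions(0.54, n=1000, repeats=1000, seed=2)

print("mean of ^p, n=100:   ", round(statistics.mean(small), 3))
print("spread of ^p, n=100: ", round(statistics.stdev(small), 4))
print("spread of ^p, n=1000:", round(statistics.stdev(large), 4))
```

The average of the simulated ^p values stays close to p (no bias), while the spread shrinks when n grows (less variability).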
Margin of error (MOE): says how close the sample
statistic is likely to be to the pop parameter. The quick
formula below works best when p is near 50% (roughly
between 20-80%).
- Translates sampling variability into a statement
of how much confidence we have in the results of a
study.
- Quick MOE for 95% confidence = 1/sqrt(n) (e.g. 1/sqrt(1000) =
1/31.6228 = 0.0316 = 3.16%)
- p = proportion of the pop that has the outcome of
interest (a "success").
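A short sketch of the quick formula MOE = 1/sqrt(n). For comparison it also computes the standard large-sample formula 1.96 * sqrt(p(1-p)/n), which is not in these notes and is included only as an assumption to show when the quick formula is close.

```python
import math

def quick_moe(n):
    """Quick 95% margin of error from the notes: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def large_sample_moe(p, n):
    """Standard large-sample 95% margin of error (not from the notes)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 4000):
    print(n, f"quick MOE = {100 * quick_moe(n):.2f}%")

# The quick formula is close near p = 50% and overstates for extreme p
for p in (0.5, 0.2, 0.05):
    print(p, f"large-sample MOE at n=1000 = {100 * large_sample_moe(p, 1000):.2f}%")
```

Quadrupling the sample size only halves the margin of error, because n sits under a square root.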
95% confidence: if we took many samples
using the same method, 95% of the samples
should give a result within ± 4% of the truth
about the pop (here the MOE is 4%). We are 95% confident
the truth lies within the MOE of the estimate.
MOE for 95% confidence: 95% confidence with a
4% MOE means the statistic will be within 4 percentage
points of the real pop value 95% of the time.
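The meaning of "95% of the samples" can be checked by simulation. This sketch assumes a true proportion of 0.5 and a sample size of 625, chosen because 1/sqrt(625) = 0.04, the 4% MOE used in the example; both values are illustrative assumptions.

```python
import math
import random

def coverage(p, n, repeats, seed=0):
    """Fraction of simulated samples whose ^p lands within 1/sqrt(n) of p."""
    rng = random.Random(seed)
    moe = 1 / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        p_hat = sum(1 for _ in range(n) if rng.random() < p) / n
        if abs(p_hat - p) <= moe:
            hits += 1
    return hits / repeats

# n = 625 gives a margin of error of exactly 4%
rate = coverage(p=0.5, n=625, repeats=2000)
print(f"share of samples within the MOE: {rate:.3f}")
```

The printed share should come out near 0.95, which is what the confidence level describes: a property of the sampling method over many samples, not of any single sample.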
Confidence statement: has 2 parts, a MOE & a level of
confidence (level of confidence = % of all
possible samples that satisfy the MOE).
- E.g. estimate ± MOE = approximate 95%
confidence interval (^p ± 1/sqrt(n))
- 85% ± 2.4% of people own a cellphone
= 82.6% < p < 87.4%
- E.g. ^p = 57% with MOE = 1/sqrt(2142) = 0.022 =
2.2%: 57% ± 2.2% = 54.8% < p < 59.2%
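The two interval examples can be reproduced with a few lines of code. This is a sketch of the quick method only (estimate ± 1/sqrt(n)); the function names are hypothetical.

```python
import math

def interval(estimate, moe):
    """Confidence interval as (low, high) = estimate -/+ margin of error."""
    return estimate - moe, estimate + moe

def quick_ci(p_hat, n):
    """Approximate 95% confidence interval using MOE = 1/sqrt(n)."""
    return interval(p_hat, 1 / math.sqrt(n))

# Cellphone example: 85% with a stated MOE of 2.4%
cell_low, cell_high = interval(0.85, 0.024)

# Second example: ^p = 57% from a sample of n = 2142
low, high = quick_ci(0.57, 2142)

print(f"{100 * cell_low:.1f}% < p < {100 * cell_high:.1f}%")
print(f"{100 * low:.1f}% < p < {100 * high:.1f}%")
```

For n = 2142 the margin of error is about 0.022, i.e. 2.2 percentage points, not 22.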
Sampling errors: caused by the act of taking a sample;
they make sample results different from the results of a census.
Non-sampling errors: not related to selecting a sample
(can occur even in a census).
Random sampling error: deviation between the sample
statistic & the pop parameter caused by chance in
selecting a random sample.
- MOE in a confidence statement includes only random
sampling error.
Stratified random sample: divide the sampling frame into
groups of individuals (strata), take a simple random sample
from each stratum and combine them to make the complete
sample.
- E.g. households (strata) on each block are arranged
in order of location and divided into groups
called clusters.
- Take separate samples from each stratum ->
usually proportional to the stratum's share of the pop ->
guarantees the sample
matches pop in relation to the strata.
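A sketch of a stratified random sample with proportional allocation: a simple random sample is taken inside each stratum, sized by that stratum's share of the population. The strata names and sizes are made up for illustration.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """strata maps a stratum name to its list of individuals.

    Each stratum contributes a share of total_n proportional to its size.
    Rounding means the shares may not always add up to exactly total_n.
    """
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / pop_size)
        sample[name] = rng.sample(members, k)   # SRS within the stratum
    return sample

# Hypothetical population: 800 undergraduates and 200 graduate students
strata = {
    "undergrad": [f"U{i}" for i in range(800)],
    "grad": [f"G{i}" for i in range(200)],
}
chosen = stratified_sample(strata, total_n=50, seed=3)
print({name: len(members) for name, members in chosen.items()})
```

With these sizes the sample holds 40 undergraduates and 10 graduate students, the same 80/20 split as the population.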
Response variable: measures an
outcome/result of a study.
Explanatory variable: one we think explains or
causes changes in the response variable (explanatory = IV,
response = DV).
Lurking variable: has an important effect on the
relationship between the variables in a study but is not
one of the previous types (neither explanatory nor response).
- Use randomized response method for sensitive
questions.
Completely randomized experimental design: all
subjects are allocated at random among all the treatments.
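A minimal sketch of a completely randomized design: shuffle the subjects, then deal them out to the treatments in turn. The subject labels and treatment names are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Allocate all subjects at random among the treatments, in equal-sized groups
    (group sizes differ by at most one if the numbers do not divide evenly)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

subjects = [f"S{i}" for i in range(1, 21)]   # 20 hypothetical subjects
groups = completely_randomized(subjects, ["drug", "placebo"], seed=7)
print({t: len(members) for t, members in groups.items()})
```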
Matched pairs design: compares 2 treatments using pairs
of subjects matched as closely as possible.
Block design: a block is a group of subjects known
before the experiment to be similar in some way that is
expected to affect the response to the treatments; random
assignment is done separately within each block (controls
effects of confounding variables).
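A sketch of randomizing within blocks, assuming the blocks have already been formed from a characteristic known before the experiment. The blocks ("men", "women") and treatments are illustrative assumptions. A matched pairs design is the special case where every block holds two subjects.

```python
import random

def block_design(blocks, treatments, seed=None):
    """Randomly assign treatments separately inside each block.

    blocks maps a block name to its list of subjects.
    Returns a dict: subject -> (block name, treatment).
    """
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)                      # randomize within the block only
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {
    "men": [f"M{i}" for i in range(6)],
    "women": [f"W{i}" for i in range(8)],
}
assignment = block_design(blocks, ["A", "B"], seed=5)
for subject in sorted(assignment):
    print(subject, assignment[subject])
```

Because each treatment appears equally often inside every block, a difference between the blocks cannot be mistaken for a treatment effect.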
Data ethics: institutional review board, informed consent, confidentiality.
- Use instruments to make measurements; units are used to record the measurements.
- We measure a property of a person or thing when we assign a value to represent the property.
- The result of a measurement is a numerical variable that takes different values for people or things that differ in
whatever we are measuring.
- A variable is a valid measure of a property if it is relevant or appropriate as a representation of that property.
- A rate (fraction, proportion or percentage) at which something occurs is often a more valid measure than a simple
count of occurrences.
- A measurement of a property has predictive validity if it can be used to predict success on tasks that are related
to the measured property.
A measurement has bias if it systematically tends to overstate/understate the true value of the
measured property.
A measurement has random error if repeated measurements of the same individual give different results.
If the random error is small, we say the measurement is reliable.
Use averages to improve reliability: the average of several repeated measurements of the same individual is
more reliable than a single measurement.
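The difference between bias and random error, and the effect of averaging, can be sketched with a simulation. The true value (70), the bias (1.5) and the size of the random error (standard deviation 2) are assumed numbers for illustration only.

```python
import random
import statistics

def measure(true_value, bias, error_sd, rng):
    """One measurement = true value + systematic bias + random error."""
    return true_value + bias + rng.gauss(0, error_sd)

rng = random.Random(42)
TRUE_VALUE = 70.0   # hypothetical true value of the measured property

# Single unbiased measurements versus averages of 4 repeated measurements
singles = [measure(TRUE_VALUE, 0.0, 2.0, rng) for _ in range(2000)]
averages = [
    statistics.mean([measure(TRUE_VALUE, 0.0, 2.0, rng) for _ in range(4)])
    for _ in range(2000)
]

# Averages of 4 measurements from a biased instrument
biased_averages = [
    statistics.mean([measure(TRUE_VALUE, 1.5, 2.0, rng) for _ in range(4)])
    for _ in range(2000)
]

print("spread of single measurements:", round(statistics.stdev(singles), 2))
print("spread of averages of 4:      ", round(statistics.stdev(averages), 2))
print("mean of biased averages:      ", round(statistics.mean(biased_averages), 2))
```

Averaging shrinks the random error (better reliability) but leaves the bias untouched: the biased averages still centre about 1.5 above the true value.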
- When judging whether numbers make sense, look for inconsistencies, missing info, incorrect arithmetic,
implausible numbers, numbers that appear too regular, or hidden agendas.
Valid: a measurement of a property is valid if it is a good representation of that property.
Percentage increase & decrease
E.g. first quiz scored 5/20. Second quiz scored 10/20. The percentage increase from first to second quiz is
(10 - 5)/5 x 100 = 100%. On third quiz, student got 5 again: a decrease of (10 - 5)/10 x 100 = 50%, not 100%.
- 100% increase means score has doubled.
Percentage change = amount of change/starting value x 100, or (x - y)/y x 100 where y = starting value and x = new value.
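The quiz example worked as code, using the percentage change formula with the starting value in the denominator. The function name is hypothetical.

```python
def percent_change(start, new):
    """Percentage change = (new - start) / start x 100."""
    return (new - start) / start * 100

first_to_second = percent_change(5, 10)    # quiz 1 -> quiz 2
second_to_third = percent_change(10, 5)    # quiz 2 -> quiz 3

print(first_to_second)   # 100.0
print(second_to_third)   # -50.0
```

Going from 5 to 10 is a 100% increase, but going back from 10 to 5 is only a 50% decrease, because the starting value is different.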
Distribution of a variable: tells us what values it takes
and how often it takes them.
- Use pie & bar graphs for categorical variables.
Categorical V: places individual into groups/categories.
Quantitative V: takes numerical values for which arithmetic
operations make sense.
Pie charts: whole divided into parts, not good for
comparisons (categorical).
Bar graph: better for comparing sizes (also side by side
type).
- Both show the distribution of a categorical variable.
Line graphs: quantitative variable measured at intervals over
time (time on x axis, variable on y axis, connect points by a
line).
- Look for overall pattern, trends
(upward/downward movement over time),
deviations from the pattern, seasonal
variations.
Histogram: displays the distribution of a quantitative
variable; look for patterns, deviations and outliers.
Shape: number of peaks (e.g. single peak), symmetric or skewed.
Center: midpoint, close to the single peak.
Variability: spread of the values, ignoring outliers.
Relative frequency histogram: bar heights are % rather than
counts; better for comparing 2 distributions.
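A sketch of why relative frequencies help when comparing two distributions: the groups below have different sizes, so raw counts are not comparable but proportions are. The quiz scores and the bin width of 10 are made-up values.

```python
from collections import Counter

def relative_frequencies(values, bin_width):
    """Map each bin's lower edge to the fraction of values that fall in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    total = len(values)
    return {start: counts[start] / total for start in sorted(counts)}

# Hypothetical scores for two sections of different sizes
section_a = [52, 58, 61, 67, 73, 78, 85, 91]
section_b = [55, 64, 66, 88]

rel_a = relative_frequencies(section_a, 10)
rel_b = relative_frequencies(section_b, 10)
print(rel_a)
print(rel_b)
```

Each dictionary sums to 1, so the bar heights of the two histograms are on the same scale even though one section has twice as many students.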
Stemplot: for small data sets.
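A minimal stemplot sketch, assuming non-negative whole numbers with the tens as the stem and the ones digit as the leaf. The data values are made up.

```python
from collections import defaultdict

def stemplot(values):
    """Return a text stemplot: stem = tens, leaf = ones digit, leaves in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)

data = [12, 15, 21, 21, 34, 38, 47]
plot = stemplot(data)
print(plot)
```

Unlike a histogram, the stemplot keeps every original value readable, which is why it suits small data sets.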