# STAT 100 Study Guide - Final Guide: Simple Random Sample, Observational Error, Confidence Interval

Simon Fraser University, Department of Statistics · STAT 100 · Professor Marie Loughin · Study guide: Final · Fall 2018

Individuals – objects described by a set of data (e.g., people, animals, things, names, months, times of year, people in a study).

Variables – characteristics of the individuals (e.g., major, GPA, average temperature, gender, income, bacteria type).

Categorical variables – place individuals into groups or categories (e.g., major, letter grade, job, gender).

Numerical variables – take numerical values for which arithmetic operations such as adding and averaging make sense (e.g., age, income).

Observational study – observes individuals and measures variables of interest without attempting to influence the responses.

Response variable – measures an outcome or result of a study.

Sample survey – an important kind of observational study: it gathers information about a group of individuals by studying only some of its members (a sample) chosen to represent the entire group.

Population – the entire group of individuals about which we want information (e.g., Canadian residents, CTV viewers).

Sample – the part of the population from which we actually collect information, used to draw conclusions about the whole population (e.g., participants in a poll, households).

- Sample surveys start with the entire population and use specific methods to choose a sample that represents it.

Census – a survey that attempts to include the entire population in the sample (e.g., the U.S. Constitution requires a census every 10 years). A census is costly and slow, and is sometimes impossible or incomplete, so a sample survey is often used instead.

Biased sampling – a sampling method that systematically favours certain outcomes.

Convenience sampling – selecting the individuals who are easiest to reach (usually non-representative, and so biased). Voluntary response sampling, where people choose themselves, is similarly biased.

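Simple random sample (SRS) – an SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance of being the sample actually selected (n = required sample size). To choose one, label every individual, then select labels using online software or by drawing them from a hat.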
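The label-and-draw method can be sketched in Python. This is an illustration only: `simple_random_sample` and the population of 30 labelled students are hypothetical, and Python's `random.sample` stands in for the online software or the hat.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every set of n individuals is equally likely to be chosen."""
    rng = random.Random(seed)
    # random.sample draws n distinct individuals without replacement,
    # so every subset of size n has the same chance of being the sample.
    return rng.sample(population, n)

# Hypothetical population: 30 students labelled 1 to 30.
students = list(range(1, 31))
sample = simple_random_sample(students, 5, seed=2018)
print(sorted(sample))
```

Fixing the seed only makes the draw reproducible for checking; leaving `seed=None` gives a fresh random sample each time.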

Parameter (e.g., the population proportion p) – a number that describes the population. It is a fixed number, but its actual value is usually unknown.

Statistic (e.g., the sample proportion p̂) – a number that describes a sample; it is used to estimate an unknown parameter.

- Example: 548 out of 1015 people sampled voted yes, so p̂ = 548/1015 ≈ 0.5399, i.e., about 54% voted yes.
- Random sampling eliminates bias in how the sample is chosen, but the estimate can still be wrong because of sampling variability: different random samples give different results.
- If we take many random samples of the same size from the same population, the results vary in a predictable pattern, and larger samples show less variability.
- p̂ from a random sample has no bias as an estimator of p.
- To reduce bias -> use random sampling.
- To reduce variability -> use a larger sample.
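The bullets above can be checked with a small simulation. This is a sketch under stated assumptions: the true proportion p = 0.54 is hypothetical, and each sampled individual is treated as an independent "success" with probability p, which approximates sampling from a population much larger than the sample. `sample_proportion` and `simulate` are illustrative names.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Simulate one random sample of size n; each individual is a 'success' with probability p."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

def simulate(p, n, reps=1000, seed=1):
    """Draw many samples of size n and return the mean and standard deviation of p-hat."""
    rng = random.Random(seed)
    p_hats = [sample_proportion(p, n, rng) for _ in range(reps)]
    return statistics.mean(p_hats), statistics.stdev(p_hats)

# The poll from the notes: 548 "yes" out of 1015.
print(round(548 / 1015, 4))  # 0.5399

# Hypothetical true proportion p = 0.54, two sample sizes.
for n in (100, 1000):
    center, spread = simulate(p=0.54, n=n)
    print(n, round(center, 3), round(spread, 3))
```

For both sample sizes the average p̂ lands close to p (no bias), while the spread of p̂ is much smaller for n = 1000 (less variability).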

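MOE (margin of error) – measures how close the sample statistic is likely to be to the population parameter. The quick 1/√n formula is most accurate when the sample proportion is near 50% (roughly between 20% and 80%).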

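- Translates sampling variability into a statement of how much confidence we can have in the results of a study.
- Quick method for 95% confidence: MOE = 1/√n (e.g., 1/√1000 = 1/31.6228 ≈ 0.0316 ≈ 3.16%).
- Here p is the proportion of the population that has the outcome of interest (a "success"), and p̂ is the proportion of successes in the sample.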
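The quick method is a one-line formula; the sketch below simply evaluates it. `quick_moe` is a hypothetical helper name, and the formula is only the 1/√n approximation from these notes, not a more precise margin-of-error calculation.

```python
import math

def quick_moe(n):
    """Quick-method margin of error for 95% confidence: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

print(f"{quick_moe(1000):.2%}")  # 3.16%
# Quadrupling the sample size halves the margin of error.
print(quick_moe(100), quick_moe(400))  # 0.1 0.05
```

Because the MOE shrinks with √n rather than n, cutting the margin of error in half requires four times as many individuals.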

95% confidence – if we took many samples using the same method, 95% of them would give a result within ± the MOE (e.g., ±4%) of the true population value. In other words, we are 95% confident that the truth lies within the MOE of our estimate.

MOE for 95% confidence – 95% confidence with a 4% MOE means the sample statistic will be within 4 percentage points of the true population value in 95% of all samples.
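The "95% of all samples" interpretation can be checked by simulation. This sketch assumes a hypothetical true proportion p = 0.5 and treats each individual as an independent success with probability p; n = 625 is chosen because the quick method then gives exactly 1/√625 = 0.04, the 4% MOE used above. `coverage` is an illustrative name.

```python
import math
import random

def coverage(p, n, reps=2000, seed=7):
    """Fraction of simulated samples whose p-hat lands within 1/sqrt(n) of the true p."""
    rng = random.Random(seed)
    moe = 1 / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        if abs(p_hat - p) <= moe:
            hits += 1
    return hits / reps

rate = coverage(p=0.5, n=625)
print(rate)
```

The printed fraction should come out close to 0.95: about 95% of the simulated samples fall within 4 percentage points of the truth.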

Confidence statement – has two parts: a margin of error and a level of confidence (the level of confidence is the percentage of all possible samples whose result falls within the MOE of the truth).

- Approximate 95% confidence interval: estimate ± MOE, i.e., p̂ ± 1/√n.
- E.g., 85% ± 2.4% of people own a cellphone, so 82.6% < p < 87.4%.
- E.g., p̂ = 57% with n = 2142: MOE = 1/√2142 ≈ 0.022 = 2.2%, so 57% ± 2.2% gives 54.8% < p < 59.2%.
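The interval arithmetic in the last example can be reproduced with a short helper. `quick_ci` is a hypothetical name, and it uses only the quick-method MOE from these notes.

```python
import math

def quick_ci(p_hat, n):
    """Approximate 95% confidence interval p_hat ± 1/sqrt(n) (quick method)."""
    moe = 1 / math.sqrt(n)
    return p_hat - moe, p_hat + moe

low, high = quick_ci(0.57, 2142)
print(f"{low:.1%} < p < {high:.1%}")  # 54.8% < p < 59.2%
```

Computing the MOE as 0.022 (not 0.22) is what keeps the interval narrow; a 22% margin would give a uselessly wide range.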

Sampling errors – errors caused by the act of taking a sample; they make sample results differ from the results of a census.

Non-sampling errors – errors not related to the act of selecting a sample; they can occur even in a census.

Random sampling error – the deviation between a sample statistic and the population parameter caused by chance in selecting a random sample.

- The MOE in a confidence statement includes only random sampling error.

Stratified random sample – divide the sampling frame into groups of similar individuals (strata), take a separate simple random sample from each stratum, and combine these to form the complete sample.

- E.g., in a multistage sample, the households on each chosen block are arranged in order of location and divided into groups called clusters.

- Take separate samples from each stratum, usually in proportion to the stratum's share of the population; this guarantees that the sample matches the population with respect to the strata.
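A proportional stratified sample can be sketched in Python. Assumptions: the strata (undergrads, grads, staff) and their sizes are hypothetical, `stratified_sample` is an illustrative name, and each stratum's sample size is rounded, so in general the total can differ slightly from the requested size.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional stratified random sample.

    strata: dict mapping stratum name -> list of individuals.
    Takes a separate SRS from each stratum, sized in proportion to the
    stratum's share of the population, then combines them.
    """
    rng = random.Random(seed)
    pop_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding may make the total differ slightly from total_n.
        k = round(total_n * len(members) / pop_size)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame split into three strata.
strata = {
    "undergrad": [f"U{i}" for i in range(600)],
    "grad": [f"G{i}" for i in range(300)],
    "staff": [f"S{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, seed=1)
print(len(sample))  # 50
```

With 60%, 30% and 10% of the frame in each stratum, a sample of 50 contains exactly 30, 15 and 5 individuals from them, so it matches the population on the strata.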

Response variable – measures an outcome or result of a study.

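Explanatory variable – a variable we think explains or causes changes in the response variable (the explanatory variable is the independent variable, IV; the response variable is the dependent variable, DV).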

Lurking variable – a variable that has an important effect on the relationship between two variables under study but is not one of the explanatory or response variables.

- Use the randomized response method for sensitive questions.
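The notes do not spell out the randomized response procedure, so the sketch below assumes one common coin-flip version: each respondent privately flips a fair coin, answers the sensitive question truthfully on heads, and simply says "yes" on tails. Then P(yes) = 0.5·p + 0.5, so p can be estimated as 2 × (observed proportion of yes) − 1. The true proportion 0.2 and the function names are hypothetical.

```python
import random

def simulate_responses(true_p, n, rng):
    """Each respondent flips a fair coin: heads -> truthful answer, tails -> 'yes'."""
    yes = 0
    for _ in range(n):
        truly_yes = rng.random() < true_p
        heads = rng.random() < 0.5
        if heads:
            answer_yes = truly_yes  # heads: answer the sensitive question honestly
        else:
            answer_yes = True       # tails: say "yes" no matter what
        if answer_yes:
            yes += 1
    return yes / n

def estimate_p(yes_rate):
    """Invert P(yes) = 0.5 * p + 0.5 to recover the sensitive proportion."""
    return 2 * yes_rate - 1

rng = random.Random(42)
yes_rate = simulate_responses(true_p=0.2, n=10000, rng=rng)
print(round(estimate_p(yes_rate), 3))
```

No single "yes" reveals anything about an individual, yet the group proportion can still be estimated, at the cost of extra variability.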

Completely randomized experimental design – all subjects are allocated at random among all treatments.
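Random allocation for a completely randomized design can be sketched as "shuffle, then deal out". Assumptions: the 20 subjects, the treatment names "drug" and "placebo", and equal group sizes are all hypothetical choices for illustration; `completely_randomized` is not from the course.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly allocate subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal shuffled subjects round-robin so group sizes differ by at most one.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

groups = completely_randomized(range(1, 21), ["drug", "placebo"], seed=3)
for t, members in groups.items():
    print(t, sorted(members))
```

Shuffling the whole list first is what makes every subject equally likely to end up in any treatment group.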

Matched pairs design – compares two treatments using pairs of subjects matched as closely as possible, with the two treatments assigned at random within each pair.

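Block design – a block is a group of subjects known before the experiment to be similar in some way that is expected to affect the response to the treatments. Random assignment is carried out separately within each block, which controls the effects of those confounding variables.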
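Randomizing separately within each block can be sketched as below. The blocks (grouped by sex), the subject labels, and treatments "A"/"B" are hypothetical; `block_randomize` is an illustrative name. A matched pairs design is the special case where every block has exactly two subjects.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Carry out a separate random assignment within each block."""
    rng = random.Random(seed)
    plan = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        # Within this block only, deal treatments round-robin after shuffling.
        for i, subject in enumerate(shuffled):
            plan[subject] = (block_name, treatments[i % len(treatments)])
    return plan

# Hypothetical blocks formed before the experiment.
blocks = {"women": ["W1", "W2", "W3", "W4"], "men": ["M1", "M2", "M3", "M4"]}
plan = block_randomize(blocks, ["A", "B"], seed=5)

# Matched pairs: each block is a pair, so each pair gets one A and one B.
pairs = {"pair1": ["P1", "P2"], "pair2": ["P3", "P4"]}
pair_plan = block_randomize(pairs, ["A", "B"], seed=9)
print(plan)
print(pair_plan)
```

Because each block is split evenly between treatments, differences between blocks cannot be confused with differences between treatments.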

Data ethics: institutional review board, informed consent, confidentiality.

- We use instruments to make measurements and units to record the measurements.
- We measure a property of a person or thing when we assign a number to represent the property.
- The result of a measurement is a numerical variable that takes different values for people or things that differ in whatever we are measuring.
- A variable is a valid measure of a property if it is relevant or appropriate as a representation of that property.
- A rate (fraction, proportion or percentage) at which something occurs is often a more valid measure than a simple count of occurrences.
- A measurement of a property has predictive validity if it can be used to predict success on tasks that are related to the measured property.
- A measurement has bias if it systematically tends to overstate or understate the true value of the measured property.
- A measurement has random error if repeated measurements of the same individual give different results.
- If the random error is small, we say the measurement is reliable.
- Using averages to improve reliability: the average of several repeated measurements of the same individual is more reliable than a single measurement.
- When checking numbers, look for inconsistencies, missing information, incorrect arithmetic, implausible numbers, numbers that look too regular, or hidden agendas.
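The difference between bias and random error, and why averaging helps only with the second, can be shown with a simulated scale. Everything here is a hypothetical model: a true weight of 70, a systematic bias of +0.5, and random error drawn from a normal distribution with standard deviation 2.0.

```python
import random
import statistics

def measure(true_value, bias, noise_sd, rng):
    """One measurement = true value + systematic bias + random error."""
    return true_value + bias + rng.gauss(0, noise_sd)

rng = random.Random(0)
true_weight = 70.0
singles = [measure(true_weight, bias=0.5, noise_sd=2.0, rng=rng) for _ in range(1000)]
averages = [
    statistics.mean(measure(true_weight, bias=0.5, noise_sd=2.0, rng=rng) for _ in range(10))
    for _ in range(1000)
]
# Spread of single readings vs. spread of 10-reading averages.
print(round(statistics.stdev(singles), 2), round(statistics.stdev(averages), 2))
# Averaging does not remove the systematic bias.
print(round(statistics.mean(averages) - true_weight, 2))
```

The averages vary much less than single readings (more reliable), but they still sit about 0.5 above the true value, because averaging cannot fix bias.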

Valid – a measurement of a property is valid if it is a good representation of that property.

Percentage increase & decrease

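E.g., a student scored 5/20 on the first quiz and 10/20 on the second quiz. The percentage increase from the first to the second quiz is (10 - 5)/5 × 100 = 100%. On the third quiz the student scored 5 again: the change from the second to the third quiz is (5 - 10)/10 × 100 = -50%, a 50% decrease (not 100%).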

- A 100% increase means the score has doubled.

Percentage change = (amount of change ÷ starting value) × 100 = (x - y)/y × 100, where y is the starting value and x is the new value.
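The percentage change formula translates directly into code. `percent_change` is a hypothetical helper; a negative result means a decrease.

```python
def percent_change(start, new):
    """Percentage change = (new - start) / start * 100."""
    return (new - start) / start * 100

print(percent_change(5, 10))  # 100.0  (first to second quiz)
print(percent_change(10, 5))  # -50.0  (second to third quiz: a 50% decrease)
```

The same 5-point change gives different percentages because the starting value (the denominator) differs.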

Distribution – the distribution of a variable tells us what values it takes and how often it takes each value.

- Use pie charts and bar graphs for categorical variables.

Categorical V – places individuals into groups/categories.

Quantitative V – takes numerical values for which arithmetic operations (such as adding and averaging) make sense.

Pie charts – show a whole divided into parts; not good for comparing sizes (categorical).

Bar graph – better for comparing sizes (including side-by-side bar graphs).

- Both show the distribution of a categorical variable.

Line graphs – show a quantitative variable measured at intervals over time (time on the x axis, the variable on the y axis, points connected by a line).

- Look for the overall pattern, trends (upward/downward movement over time), deviations from the pattern, and seasonal variation.

Histogram – displays the distribution of a quantitative variable; look for the overall pattern, deviations, and outliers.

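Shape – the number of peaks (e.g., a single peak) and whether the distribution is symmetric or skewed.

Center – the midpoint of the distribution, often close to the single peak.

Variability – the spread of the values, from smallest to largest, ignoring any outliers.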

Relative frequency histogram – shows percentages (or proportions) instead of counts; better for comparing two distributions, especially when the groups differ in size.
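The bar heights of a relative frequency histogram are just bin counts divided by the number of observations. In this sketch the quiz scores for two classes of different sizes are made up, the bin width of 10 is an arbitrary choice, and `relative_frequencies` is an illustrative name built on Python's `collections.Counter`.

```python
from collections import Counter

def relative_frequencies(values, bin_width):
    """Group quantitative values into bins of equal width; return bin start -> proportion."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    n = len(values)
    return {start: counts[start] / n for start in sorted(counts)}

# Hypothetical scores: class A has 10 students, class B has 20.
class_a = [55, 62, 68, 71, 74, 78, 81, 85, 88, 93]
class_b = [48, 52, 59, 61, 63, 65, 67, 69, 70, 72,
           73, 75, 76, 77, 79, 82, 84, 86, 91, 95]
print(relative_frequencies(class_a, 10))
print(relative_frequencies(class_b, 10))
```

Because each set of proportions sums to 1, the two classes can be compared bin by bin even though class B has twice as many students.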

Stemplot – for small data sets.
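A basic stemplot splits each number into a stem (all but the last digit) and a leaf (the last digit). This sketch assumes non-negative whole numbers and uses the hypothetical class A scores; `stemplot` is an illustrative name. Stems with no leaves are still listed so gaps in the data stay visible.

```python
def stemplot(values):
    """Build a stemplot: stems are all but the last digit, leaves are the last digit."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return "\n".join(lines)

print(stemplot([55, 62, 68, 71, 74, 78, 81, 85, 88, 93]))
#  5 | 5
#  6 | 28
#  7 | 148
#  8 | 158
#  9 | 3
```

Turned on its side, the stemplot looks like a histogram but still shows every individual value, which is why it suits small data sets.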
