# PSYB01 - Lec 9 - Inferential Stats and Hypothesis Testing (near-verbatim)

University of Toronto St. George / Psychology / PSY100H1 / Connie Boudens / Fall / Midterm

Inferential Statistics and Hypothesis testing: Nov 22, 2012 – Lec 9
Descriptive statistics summarize a data set, i.e., describe its characteristics
e.g., the mean and standard deviation of the data you collected
But usually you want to go beyond the data you collected
Descriptive stats are important because you need to tell people exactly what your data looked like, but you
also try to draw conclusions about a larger group using your sample
You want a representative sample that is large enough for you to draw inferences
about the population it comes from
Polling organizations are good at drawing representative samples, so they can use a sample to predict how the
population will behave and draw conclusions about it
Can you draw conclusions about the population from the sample you have? This isn’t as straightforward
as it sounds: you can’t just go by the sample average; you have to do more work
Inferential statistics are used to draw inferences about a population from a sample.
Two main methods used in inferential statistics:
1. Estimation of population parameters (the population-level counterparts of sample statistics)
– The sample is used to estimate the population mean and SD
• You’re never going to know exactly what the pop’n mean and SD are, but the sample can help
estimate them
– A confidence interval (CI) is constructed
• Usually a CI around the mean
• A CI lets you say, with a stated level of confidence, that the population mean falls within
a relatively narrow range; ex) 99% confident that the population mean falls between 100 and
105 (strictly, 99% of intervals built this way would contain the true population mean)
– Bigger sample size = smaller (narrower) CI
• This is good: the narrower the CI, the more precise the estimate
2. Hypothesis testing
– Null Hypothesis
– Alternative Hypothesis / Research Hypothesis
– Ex) research question: are UTSC students smarter than the general public?
• Hypothesis: yes, they are smarter than the general public; with hypothesis testing, you
generate 2 competing hypotheses
• 1) Null hypothesis (H0): assumes there is no difference; the average IQ of UTSC students is the
same as the average IQ of the population
• 2) Alternative/research hypothesis (H1): there is a difference in IQ between UTSC students and
the population
– Research papers usually state the alternative/research hypothesis, not the null hypothesis
– In hypothesis testing, you always start by assuming the null hypothesis is true, even if you expect
the alternative hypothesis to be true; the researcher must present enough evidence to reject the null hypothesis
• You get a sample of UTSC students (n = 50), test them, and find a mean IQ of 121. But does this
apply to the whole UTSC population? Inferential stats must be done to answer that
Why would my sample differ from the population?
If your sample were perfectly representative of the UTSC population, there would be no problem, but this
is never the case: there will always be some variation
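Going back to the first method (estimation): because every sample varies somewhat from the population, a sample mean is reported with a confidence interval. The sketch below is a minimal illustration and not the lecture's own procedure. It assumes a normal-approximation (z-based) 99% CI, which is reasonable for large samples, and uses hypothetical numbers (sample mean 121, sample SD 15). It shows the "bigger sample size = smaller CI" point: quadrupling n halves the width.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(sample_mean, sample_sd, n, level=0.99):
    """Normal-approximation confidence interval for a population mean.

    Assumes n is large enough that a z critical value is a fair
    stand-in for the t value usually used with small samples.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 2.576 for a 99% CI
    standard_error = sample_sd / sqrt(n)       # estimated SD of the sampling distribution
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical sample: mean IQ 121, SD 15, at three sample sizes
for n in (25, 100, 400):
    low, high = mean_ci(121, 15, n)
    print(f"n = {n:3d}: 99% CI = ({low:.1f}, {high:.1f}), width = {high - low:.1f}")
```

Using a t critical value instead of z (the usual choice for small samples) would make the intervals slightly wider, but the pattern stays the same.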
There are 2 basic reasons why a sample may differ from the population, i.e., two sources of deviation:
• Systematic error
– Due to bias in your sample
– If you’ve done a good job, you’ve eliminated these
– Some potential problems: if you ask for volunteers and tell them they’ll be doing an IQ test, the people
who participate are usually those with high IQs (smart people take IQ tests because it makes them feel
good about themselves); or the person scoring the tests marks them really high because they know it’s
about IQ; etc. These are sources of systematic error, and you can eliminate them
• Sampling error
– Samples will vary from each other in a random fashion
- Unavoidable
- Any sample chosen will vary from the population in some way, and samples will all differ
from each other
- They won’t all have the same mean or the same SD
- You won’t know how the samples would vary because you’re only taking one
sample; drawing every possible sample would cost a lot of resources, and there are good
methods for extrapolating to the population, so you only use one sample
- What you want to know is whether the sample is representative of the overall population
- Eliminating your sources of systematic error and handling other potential research
problems (accounting for confounding/extraneous variables, getting a good sample) all
deals with systematic error
- But you’re always going to be left with sampling error (unavoidable)
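Random variation between samples shows up even when there is no systematic error, and a short simulation makes this visible. This is a minimal sketch that assumes a hypothetical normal population of IQ scores (mean 100, SD 15) and uses Python's standard `random` module. Every sample comes from the same population through the same unbiased procedure, yet each has a different mean and SD.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: IQ scores that are normal with mean 100, SD 15
POP_MEAN, POP_SD = 100, 15
N = 50  # size of each sample, as in the n = 50 example


def draw_sample(n=N):
    """One random sample of n IQ scores from the hypothetical population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]


# Five samples from the SAME population: no bias, only sampling error
samples = [draw_sample() for _ in range(5)]
for i, s in enumerate(samples, start=1):
    print(f"sample {i}: mean = {mean(s):6.2f}, SD = {stdev(s):5.2f}")
```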
You need to figure out whether the mean in your sample reflects sampling error or a real difference between UTSC
students and the general public, because it could just be sampling error
• Let’s say the sample you initially pick is an unusual sample with a mean IQ of 125 (for UTSC students), but the
samples you pick after that have means of 100, 98, 105, etc. If that happens, it’s problematic: you can take a bigger
sample OR use inferential stats
• So, with inferential stats, you can use this one sample and see whether it is representative of the total UTSC pop’n
• Last week:
o Bell-shaped histogram: normally distributed scores / bell distribution
o A lot of statistical techniques are based on the assumption that data are distributed like this in the pop’n
o Series of histograms with larger and larger sample sizes; a curve is drawn over this type of
histogram
o Ex) height is normally distributed in the pop’n
o Histogram A = 100 women categorized by height; the height of each bar represents the # of people in
that height category
o Histogram: not many women are really short or really tall; a lot are in the middle
o So, you want to know whether your sample is an accurate representation of the pop’n
You need the distribution of sample means to figure this out: it is also a bell-shaped distribution,
but a special kind
It is made up of the means of all the random samples you could have drawn from a population
If n = 50: draw every sample of 50 UTSC students you could and give them all IQ tests; you
would have an enormous number of samples and would eventually have every sample you could
possibly draw
Plot the mean IQ of each of these samples and you wind up with a distribution with few
samples at either end and most of the samples somewhere in the middle; this sampling
distribution contains all the sample means you could possibly draw, and it is a normal distribution
The sampling distribution is normal if one of 2 conditions is true: 1) the population
is normal or 2) the sample size is relatively large (n ≥ 30 is the usual rule of thumb)
If one of these 2 things is true and you plot the means of all the samples, you get a
normal distribution
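A quick simulation can check the second condition (a large n). This is a minimal sketch that assumes a hypothetical, clearly non-normal population (exponential, so right-skewed, with mean 100) and samples of n = 50 drawn with Python's standard `random` module. Raw scores from this population don't follow the normal-curve pattern of about 68% within 1 SD. The sample means come close to it, with about 68% within 1 SD of the sampling distribution and about 95% within 2.

```python
import random
from math import sqrt

random.seed(3)

# Hypothetical right-skewed (non-normal) population: exponential, mean 100.
# For an exponential distribution the SD equals the mean.
POP_MEAN = POP_SD = 100.0
N = 50              # sample size ("relatively large": at least about 30)
N_SAMPLES = 10_000  # number of random samples simulated


def sample_mean(n):
    """Mean of one random sample of size n from the skewed population."""
    return sum(random.expovariate(1 / POP_MEAN) for _ in range(n)) / n


def share_within(values, centre, k, spread):
    """Fraction of values lying within k * spread of the centre."""
    return sum(abs(v - centre) <= k * spread for v in values) / len(values)


means = [sample_mean(N) for _ in range(N_SAMPLES)]
se = POP_SD / sqrt(N)  # SD of the sampling distribution
raw = [random.expovariate(1 / POP_MEAN) for _ in range(N_SAMPLES)]

# A normal curve would give about 0.683 (within 1) and 0.954 (within 2)
print(f"raw scores within 1 SD:   {share_within(raw, POP_MEAN, 1, POP_SD):.3f}")
print(f"sample means within 1 SE: {share_within(means, POP_MEAN, 1, se):.3f}")
print(f"sample means within 2 SE: {share_within(means, POP_MEAN, 2, se):.3f}")
```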
Your analysis will be based on:
• The Distribution of Sample Means: the collection of sample means for all the possible random samples
of a particular size (n) that can be obtained from a population.
• It will be almost perfectly normal if either:
– the population is normal, or
– the n of the sample is large
• This distribution has a mean that is equal to the population mean
– The average of the means of all your samples is the same as the population
mean
• With UTSC students, the sampling distribution of the means is made up of all possible samples
of 75 students: test everyone, find the mean for each sample, and plot the means on a graph
– There is an actual distribution of IQ scores at UTSC, but this distribution isn’t known and
isn’t knowable
– The sampling distribution of the means is theoretical: you can’t go through and get every sample of 75
students; it would take an extremely long time
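Simulation can approximate the claim that the sampling distribution's mean equals the population mean. Many random samples stand in for "all possible samples", which, as noted above, can't actually be drawn. This is a minimal sketch that assumes a hypothetical normal IQ population (mean 100, SD 15) and samples of 75. The average of the sample means lands very close to 100, and their SD lands close to σ/√n (the standard error).

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(4)

# Hypothetical population of IQ scores: normal, mean 100, SD 15
POP_MEAN, POP_SD = 100, 15
N = 75             # students per sample, as in the example above
N_SAMPLES = 5_000  # a stand-in for "every possible sample"

# The mean of each simulated sample of N students
sample_means = [
    fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N)) for _ in range(N_SAMPLES)
]

print(f"mean of the sample means: {fmean(sample_means):.2f} (population mean {POP_MEAN})")
print(f"SD of the sample means:   {pstdev(sample_means):.3f} "
      f"(sigma / sqrt(n) = {POP_SD / sqrt(N):.3f})")
```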
So, you have a mean of 100 with an SD of 10, and you have a theoretical distribution (= the sampling distribution of the
means)
• What do you do now?
• You can calculate the SD of the sampling distribution (the theoretical distribution of all possible samples) from the SD
of the sample (sample SD ÷ √n, called the standard error)
– Because the sample size is 75, which is large enough, you know the sampling distribution is normal, and you also know
its SD; together these let you figure out where in this theoretical distribution your sample actually
comes from
• Remember: this is a theoretical distribution of all samples of 75 you could possibly draw, but now you can figure
out the chances that your sample comes from the low end, the high end, or somewhere in the middle
o It tells you how likely it is that your sample actually comes from this theoretical normal
distribution
o For a normal distribution, we know how much of the area under the curve falls between
the markings at the bottom, and those markings are SDs
o In the standard normal distribution, the 0 marking is the mean and 34.1% of the area lies
between the mean and one SD above the mean
o It’s important to know the SD because it helps you visualize what this distribution looks like
o Let’s say the estimated SD of this sampling distribution is
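The steps above can be sketched with the standard normal curve: estimate the SD of the sampling distribution from the sample SD, then ask where the sample falls. This is a minimal sketch that assumes a one-sample z approximation and hypothetical inputs. The H0 population mean is taken as 100, and the sample is the n = 50, mean IQ 121 example. The sample SD of 15 is an assumption because the notes don't give one. With a sample SD, a t-test is the textbook choice, but at n = 50 the answers are similar. The sketch also checks the 34.1% figure for the area between the mean and +1 SD.

```python
from math import erfc, sqrt
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal curve between the mean (z = 0) and +1 SD (z = 1)
area_0_to_1 = std_normal.cdf(1) - std_normal.cdf(0)
print(f"area between mean and +1 SD: {area_0_to_1:.1%}")  # 34.1%

# Hypothetical inputs (H0: UTSC mean IQ equals the general-public mean)
null_mean = 100    # assumed population mean IQ under H0
sample_mean = 121  # sample mean from the n = 50 example
sample_sd = 15     # assumed; the notes don't give the sample SD
n = 50

# Estimated SD of the sampling distribution of the mean (standard error)
standard_error = sample_sd / sqrt(n)

# Where the sample sits in the sampling distribution, in standard errors from the H0 mean
z = (sample_mean - null_mean) / standard_error

# Two-sided p-value: chance of a sample mean at least this far from 100 if H0 were true.
# erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) but doesn't round to 0 for large z.
p_two_sided = erfc(abs(z) / sqrt(2))
print(f"SE = {standard_error:.2f}, z = {z:.2f}, two-sided p = {p_two_sided:.2g}")
```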
