Lecture 9 – Assumptions – summary page
- Test statistic = size of effect / amount of error
- (this is not an actual equation. It is the general idea)
- the larger the amount of error, the smaller the test statistic gets
- i.e. if we look at the difference of the means (8.4 - 8.3 = 0.1) and the amount of error is 2,
the test statistic will be 0.1/2 = 0.05.
- Test statistics get converted into p values, which tell you the likelihood of finding a relationship
that size by chance alone.
- p values less than 0.05 are considered to be statistically significant, since it means
that an effect that size would be found strictly by chance less than 5% of the time.
- P value – probability of finding an effect that large when the effect does NOT actually exist
• Also known as significance value
                     Males   Females   P value
Delinquency (mean)   7.6     4.4       0.031
                     6.2     5.8       0.08
In the first row, males have a higher delinquency level than females, and the difference according
to the p value is statistically significant. You are only going to find a difference that large when
it doesn’t actually exist 3.1% of the time (p = 0.031). Want to be below 0.05 (5%). The second one is
not significant because its p value (0.08) is greater than 0.05: there is an 8% chance of finding a
difference that large when there is no effect. Consider males and females to have the same levels of
delinquency. A result is either statistically significant or statistically non-significant. Don’t
say insignificant.
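The decision rule applied to the table can be sketched as follows. The function name `describe` is an assumption made for this sketch; the 0.05 cut-off and the two p values come from the notes.

```python
ALPHA = 0.05  # cut-off used in these notes

def describe(p_value, alpha=ALPHA):
    """Label a p value as significant or non-significant at the given cut-off."""
    if p_value < alpha:
        return "statistically significant"
    return "statistically non-significant"

# p values from the two rows of the table
for p in (0.031, 0.08):
    print(f"p = {p} ({p:.1%}): {describe(p)}")
```

Note that 0.031 reads as 3.1%, not 0.031%, when expressed as a percentage.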
- Type I and Type II error
• Type 1: false positive. You find an effect that does not actually exist.
o More likely when you are more liberal with the p values that you accept as being significant.
o I.e. treating a p value of 0.1 as significant means we’re more likely to find an effect.
• Type 2: false negative. You miss an effect that does exist. More likely when you only accept
smaller p values.
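A small simulation can show why a more liberal cut-off produces more false positives. This is a sketch under one stated assumption: when there is no real effect, p values are spread evenly between 0 and 1, so each study's p value is drawn with `random.random()`. The function name, the number of studies and the seed are all illustrative.

```python
import random

def false_positive_rate(alpha, n_studies=100_000, seed=1):
    """Share of studies with NO real effect that still come out significant."""
    rng = random.Random(seed)
    # Assumption: with no real effect, p values are uniform between 0 and 1
    hits = sum(1 for _ in range(n_studies) if rng.random() < alpha)
    return hits / n_studies

strict = false_positive_rate(0.05)
liberal = false_positive_rate(0.10)

print(f"cut-off 0.05: about {strict:.1%} false positives")
print(f"cut-off 0.10: about {liberal:.1%} false positives")
```

The false positive rate tracks the cut-off you choose. The sketch does not model Type 2 errors, which need a real effect to be present in the simulated data.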
Two types of tests:
- ‘parametric tests’ – based on the normal distribution.
• Many tests assume that your (generally) dependent variable is normally distributed. If
it’s not, then you run a non-parametric test.
- ‘non-parametric tests’ – do not assume normality.
- Assumptions of parametric tests
b. Homogeneity of variance (homoscedasticity)
c. Continuous data
d. Independence of cases
i. The scores for people in your sample should not be systematically related
to the scores of other people in your sample.
Has to do with sampling.
• Clustering is not okay. If data is not collected through random sampling, the assumption of
independence is likely violated.
e. Lack of multicollinearity – multivariate only
i. can’t include two or more variables that basically measure the same thing
ii. DV = house size. IVs = income & amount invested. Hypothesis: people
with higher income and larger amounts invested have a larger house size.
But the people who have high income are likely to have high amount
invested. They aren’t really measuring anything different so you get rid of
a. Normality. This can refer to different things
i. Sometimes it means that the variable you are interested in has to be
normally distributed – e.g., t-tests
ii. Sometimes it means that the error terms are normally distributed – e.g.,