University of Waterloo – SOC 280 (Sociology), Owen Gallupe – Fall Midterm

LS 280 Midterm Readings Review
Chapter 5 – Exploring Assumptions
- Assumptions are critical because if they are broken, we stop being able to draw accurate
conclusions about reality
- Different statistical models assume different things
- Parametric test – based on the normal distribution; requires data from one of a large catalogue of
known distributions, and its assumptions NEED to be met for the results to be accurate
- Most parametric tests have four assumptions that need to be met:
o Normally distributed data
Data should be normally distributed (whether it's the sampling distribution,
the errors in the model, or something else)
If it is not met, the logic behind the hypothesis is flawed
o Homogeneity of variance
Variances should be the same throughout the data
In correlational designs, this means that the variance of one variable should be
stable at all levels of the other variable
In designs where you test several different groups, each sample comes from
populations with the same variance
o Interval data
Data should be measured at the interval level
Ordinal data can be treated as interval if there are enough categories
o Independence
Differs depending on the test
Behaviour of one participant/sample does not influence the behaviour of the
other
In repeated-measures designs (where participants are measured in more than
one experimental condition), scores within a participant are expected to be
non-independent across the experimental conditions, but behaviour
between different participants should be independent
- In many statistical tests, we assume that the sampling distribution is normally distributed
- Central limit theorem – if sample data are approximately normal then the sampling distribution
will be also -> therefore people tend to look at their sample data to see if they are normally
distributed
o In big samples, the sampling distribution tends to be normal anyway, regardless of the shape
of the data collected
o As sample size gets bigger, more we can be confident that sampling distribution will be
normally distributed
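The central limit theorem claim above can be checked with a small simulation. This is a minimal sketch assuming an exponential (heavily skewed) population with mean 1; all numbers are invented for illustration:

```python
# Draw many samples from a skewed population and look at the distribution
# of their means: it centres on the population mean even though the
# population itself is far from normal.
import random
import statistics

random.seed(1)

def sample_mean(n):
    # one sample of size n from Exponential(rate=1), population mean = 1.0
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

# 2,000 sample means from samples of size 50
means = [sample_mean(50) for _ in range(2000)]

print(round(statistics.mean(means), 2))   # close to the population mean, 1.0
print(round(statistics.stdev(means), 2))  # close to sigma/sqrt(n) = 1/sqrt(50) ≈ 0.14
```

The larger the sample size, the tighter and more normal the distribution of `means` becomes, which is the sense in which bigger samples justify the normality assumption.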
- P-P plot – (probability-probability plot) checks to see if a distribution is normal
o Plots the cumulative probability of the variable against the cumulative probability of a
particular distribution
o The data is ranked and sorted
o For each rank, the corresponding z-score is calculated -> this is the expected value that
the score should have in a normal distribution
o Score itself is converted into a z-score -> actual z-score and expected z-score are plotted
against each other
o If data are normally distributed then the actual z-score will be the same as the expected
z-score and a diagonal line will appear
(In SPSS: Analyze -> Descriptive Statistics -> P-P Plots)
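The P-P plot steps above can be sketched in plain Python with the standard library's `NormalDist`; the scores and the (i − 0.5)/n plotting position are illustrative choices, not anything specific to SPSS:

```python
# For each rank, pair the expected z-score (from the normal distribution)
# with the actual z-score of the observed value; normal data puts these
# pairs on the diagonal.
from statistics import NormalDist, mean, stdev

scores = sorted([8, 12, 9, 15, 11, 10, 13, 14])   # invented data, ranked
n = len(scores)
m, sd = mean(scores), stdev(scores)

pairs = [
    (NormalDist().inv_cdf((i - 0.5) / n),  # expected z for this rank
     (x - m) / sd)                         # actual z of the observed score
    for i, x in enumerate(scores, start=1)
]
print(pairs[0])  # (expected z, actual z) for the smallest score
```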
- Histograms are subjective and open to abuse
- To check that distribution of scores is normal, we need to look at the values of kurtosis and
skewness
o These values should be 0 in a normal distribution
o (+) values of skewness indicate a pile-up of scores on left of distribution (and vice versa)
o (+) values of kurtosis indicate a pointy and heavy-tailed distribution (and vice versa)
o The further the value is from 0, the more likely it is that the data are not normally distributed
- Converting scores to a z-score is useful because:
o You can compare skew and kurtosis in different samples that used differing measures
o Able to see how likely it is that values of skew and kurtosis are to occur
- To convert a score to a z-score, subtract the mean of the distribution and divide by the standard
deviation of the distribution -> skew and kurtosis can be converted in the same way (subtract 0 and divide by their standard error)
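The conversion above is a one-liner; the skewness value and its standard error below are made-up numbers standing in for SPSS output:

```python
# z = (score - mean) / SD for raw scores; the same shape of calculation
# turns a skewness estimate into a z-score (its expected value under
# normality is 0).
import statistics

scores = [12, 15, 11, 18, 22, 14, 16, 13]   # invented data
mean = statistics.mean(scores)
sd = statistics.stdev(scores)

z_scores = [(x - mean) / sd for x in scores]

skew, se_skew = 0.89, 0.52        # hypothetical skewness and its standard error
z_skew = (skew - 0) / se_skew
print(abs(z_skew) > 1.96)         # → False: not significant at p < .05
```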
- When sample sizes are big, significant values arise from even small deviations from normality
- If the resulting z-score is greater than 1.96 (ignoring the minus sign), it is significant (p<0.05)
- Significance tests of skew and kurtosis should not be used in large samples (because they are
likely to be significant even when skew and kurtosis are not too different from normal)
- Split file function allows you to specify a grouping variable
- Descriptive statistics are always a good way of getting an instant picture of the distribution of
data
- Kolmogorov-Smirnov test / Shapiro-Wilk test – compare the scores in the sample to a normally
distributed set of scores with the same mean and standard deviation
o If test is non-significant (p>0.05), it tells us that the distribution of the sample is not
significantly different from a normal distribution
o If test is significant (p<0.05), then the distribution is different from a normal distribution
o They tell us in one easy procedure whether scores are normally distributed
o Limitations:
With large sample sizes it is very easy to get significant results from small
deviations from normality, so a significant test doesn’t necessarily tell us
whether the deviation from normality is enough to bias any statistical
procedures that we apply to data
- Q-Q plot – similar to a P-P plot, but it plots the quantiles of the data set instead of every individual score
in the data
o Quantiles – values that split a data set into equal portions
o Will have fewer plot points on it than a P-P plot
o Plots the values you would expect to get if the distribution were normal (expected values)
against the values actually seen in the data set (observed values)
o Expected values form a straight diagonal line, and the observed values (the dots)
should fall exactly on that line if the data are normal
o If they don’t fall on the straight line, you have deviation from normality
- Shapiro-Wilk test does the same thing, but has more power to detect differences from normality
- These tests should always be interpreted in conjunction with histograms, P-P plots, Q-Q plots,
and values of kurtosis and skewness
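The Kolmogorov-Smirnov idea of comparing the sample to a normal distribution with the same mean and SD reduces to one statistic: the largest gap between the two cumulative distributions. A bare-bones sketch with invented data (SPSS supplies the p-value for this statistic):

```python
# D = max distance between the empirical CDF and the reference normal CDF,
# checked just before and just after each jump of the empirical CDF.
from statistics import NormalDist, mean, stdev

scores = sorted([22, 25, 24, 27, 23, 26, 25, 24, 28, 21])  # invented data
n = len(scores)
ref = NormalDist(mean(scores), stdev(scores))  # same mean and SD as the sample

d = max(
    max(abs(i / n - ref.cdf(x)), abs((i - 1) / n - ref.cdf(x)))
    for i, x in enumerate(scores, start=1)
)
print(round(d, 3))  # small D means the sample tracks the normal curve closely
```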
- Homogeneity of variance – as you go through levels of one variable, the variance of the other
should not change
o Variances close together = homogeneity
- Levene’s Test – tests homogeneity
o Tests null hypothesis that the variances in different groups are equal
o An ANOVA conducted on the deviation scores – the absolute differences
between each score and the mean of the group from which it came
o If significant at p<0.05, we conclude that the null hypothesis is incorrect
and that the variances are significantly different (homogeneity of variance has been
violated)
o If Levene’s Test is non-significant (p>0.05), then the variances are roughly equal and the
assumption stays put
- Variance ratio – ratio of variances between the group with the biggest variance and the group
with the smallest variance
- Levene’s Test is denoted by the letter F
- Homogeneity of variance is the assumption that the spread of scores is roughly equal in
different groups of cases, or more generally that the spread of scores is roughly equal at
different points on the predictor variable
- When comparing groups, this assumption can be tested with Levene’s test and the variance
ratio
- If Levene’s test is significant (p in SPSS table is less than 0.05) then the variances are significantly
different in different groups
o Otherwise, homogeneity of variance can be assumed
- The variance ratio is the largest group variance divided by the smallest. This value needs to be
smaller than the critical values
- In large samples Levene’s test can be significant even when group variances are not very
different. Therefore, it should be interpreted in conjunction with the variance ratio.
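The variance ratio is simple enough to compute by hand; the two groups here are invented:

```python
# Largest group variance divided by the smallest: values near 1 suggest
# homogeneity of variance, large values suggest it is violated.
import statistics

groups = {
    "control":      [4, 5, 6, 5, 4, 6],   # invented scores
    "experimental": [3, 7, 2, 8, 4, 6],
}

variances = {name: statistics.variance(s) for name, s in groups.items()}
ratio = max(variances.values()) / min(variances.values())
print(round(ratio, 2))  # → 7.0, a clear warning sign for this toy data
```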
- Dealing with Outliers:
o Remove the case
Should only be done if you have a good enough reason to believe that this case
is not from the population that you intended to sample
o Transform the data
Apply transformations to data
o Change the score
Change to the next highest score plus one
Convert back from a z-score
The mean plus two standard deviations
- Transformations are useful for correcting problems with normality and the assumption of
homogeneity of variance
o You do something to every score to correct for distributional problems, outliers or
unequal variances
o Not cheating because you do same thing to all of the scores
o Won’t change relationships between variables but it WILL change the differences
between different variables
- If you are looking at differences between variables you must apply the same transformation to
ALL variables
o Click Transformations under Levene’s Test to operate
- Types of Transformations:
o Log
Can’t get a log value of zero or negative numbers
Corrects for positive skew, unequal variances
o Square root
Taking the square root reduces large values more than small values,
bringing large scores closer to the centre
Corrects for positive skew, unequal variances
Negative numbers don’t have a square root
o Reciprocal
Dividing 1 by each score, reduces impact of large scores
Reverse the scores: scores that were originally large in the data set become
small (close to zero) after the transformation
Scores that were originally small become big after transformation
You can avoid this by reversing the scores before the transformation
Corrects for positive skew, unequal variances
o Reverse Score
Subtract each score from the highest score obtained, or the highest score plus 1
(depending whether you want your lowest score to be 0 or 1)
Corrects for negative skews
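The four transformations above can each be applied in one pass; the positively skewed scores are invented, and note the domain restrictions from the notes (log needs positive values, square root needs non-negative ones):

```python
import math

scores = [1, 2, 2, 3, 4, 10, 25]   # invented, positively skewed

log_t        = [math.log(x) for x in scores]       # log: pulls in the right tail
sqrt_t       = [math.sqrt(x) for x in scores]      # square root: milder correction
reciprocal_t = [1 / x for x in scores]             # reciprocal: shrinks big scores
highest = max(scores)
reverse_t    = [highest + 1 - x for x in scores]   # reverse score: for negative skew

# The reciprocal reverses the ordering: the largest raw score becomes the smallest
print(reciprocal_t[-1] == min(reciprocal_t))       # → True
```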
- If a statistical model is still accurate even when its assumptions are broken, it is a robust test
- By transforming the data you change the hypothesis being tested (when using a log
transformation and comparing means you change from comparing arithmetic means to comparing
geometric means)
- In small samples it is tricky to determine normality one way or another
- Applying the wrong transformation is often worse than not applying one at all
- To do transformations:
o Transform -> Compute Variable
o Choose a category of function, make a new variable, and pick a function within the selected category
- Robust Methods – two concepts:
o Trimmed mean – a mean based on the distribution of scores after some percentage of
scores has been removed from each extreme of the distribution; produces accurate
results even when the distribution is not symmetrical, because trimming removes the
extreme scores from each end of the distribution
o Bootstrap – we may not know the shape of the population distribution, but normality in
the data allows us to infer that the sampling distribution is normal
Lack of normality prevents us from knowing the shape of the sampling
distribution unless we have big samples
Bootstrapping helps this problem by estimating the properties of the sampling
distribution from the sample data
Sample data is treated as a population from which smaller samples (called
bootstrap samples) are taken (putting the data back before a new sample is
drawn)
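Both robust ideas are easy to sketch in plain Python; the data set (with one deliberate outlier) and the 20% trim are invented choices:

```python
import random
import statistics

random.seed(2)
scores = [2, 3, 3, 4, 4, 5, 5, 6, 6, 40]   # invented data with one extreme outlier

def trimmed_mean(data, proportion):
    """Drop `proportion` of the scores from EACH end, then average the rest."""
    data = sorted(data)
    k = int(len(data) * proportion)
    return statistics.mean(data[k:len(data) - k])

print(statistics.mean(scores))       # 7.8, dragged upward by the outlier
print(trimmed_mean(scores, 0.2))     # 4.5, barely affected by it

# Bootstrap: resample WITH replacement from the sample itself, and use the
# spread of the resampled means to estimate the standard error.
boot_means = [
    statistics.mean(random.choices(scores, k=len(scores)))
    for _ in range(1000)
]
print(round(statistics.stdev(boot_means), 1))  # empirical standard error
```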
Chapter 6 – Comparing Two Means
- Two different ways of collecting data:
o Between-group/independent design – expose different people to different
manipulations
o Repeated measures design – take a single group of people and expose them to different
experimental manipulations at different points in time
Eliminates some extraneous variables and can give more sensitivity in the data
error bars do not reflect ‘true’ error around the means for repeated measures
designs
- To correct for repeated-measures error bars:
o Compute command
o Calculate average anxiety for each participant, so click mean function
o Transform -> compute variable
- Grand mean – mean of all scores (regardless of which condition the score comes from)
o if we take means already calculated that represent the average score for each
participant, and take the average of those mean scores, we will have the mean of all
participants
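The grand-mean point above can be verified directly; the anxiety scores are invented, and the equivalence holds when each participant contributes the same number of scores:

```python
import statistics

# three participants, each measured in two conditions (invented scores)
participant_scores = [[4, 6], [5, 7], [3, 5]]

per_participant_means = [statistics.mean(p) for p in participant_scores]
grand_via_participants = statistics.mean(per_participant_means)
grand_via_all_scores = statistics.mean(s for p in participant_scores for s in p)

print(grand_via_participants == grand_via_all_scores)  # → True
```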
- When error bars do not overlap, the samples are unlikely to have come from the same population
o When we plot error bars for repeated measures data, it shows the extra sensitivity that
this design has: the differences between conditions appear to be significant, whereas
when different participants are used, there does not appear to be a significant
difference
- More often than not, manipulation of the independent variable involves having an experimental
condition and a control group
- Independent-means t-test – used when there are two experimental conditions and different
participants were assigned to each condition (2 groups, 2 conditions) - Dependent-means t-test – used when there are two experimental conditions and the same
participants took part in both conditions of the experiment (1 groups, 2 conditions)
- Both tests have a similar rationale
o Two samples of data are collected and sample means calculated
o If samples come from same population, then we expect their means to be roughly equal
o We compare the difference between the sample means that we collected to the
difference between the sample means that we would expect to obtain if there were no
effect (if the null hypothesis were true)
o When the standard error is large, large differences in sample means are more likely. If the
difference between the samples collected is larger than what we would expect based on
the standard error, then one of two things is likely:
There is no effect and sample means in population fluctuate a lot
The two samples come from different populations but are typical of their
respective parent population
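That rationale is exactly what the independent t statistic computes: the observed difference in means divided by the standard error of that difference. A minimal sketch with invented groups, using the pooled-variance formula (which assumes homogeneity of variance, as required above):

```python
import math
import statistics

group1 = [5, 6, 7, 6, 5, 7]   # invented scores, experimental condition
group2 = [3, 4, 3, 5, 4, 3]   # invented scores, control condition

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled variance: weighted average of the two sample variances
pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled / n1 + pooled / n2)

t = (m1 - m2) / se_diff
print(round(t, 2))  # → 4.72, compared against the t-distribution with n1+n2-2 df
```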
- Most test statistics can be thought of as the ‘variance explained by the model’ divided by the
‘variance that the model can’t explain’
- Assumptions of t-test:
o Both independent t-test and dependent t-test are parametric based on normal
distribution
o Data are measured at the interval level
o The independent t-test (because it is used to test different groups of people) also assumes:
Variances in these populations are roughly equal (homogeneity of variance)
Scores are independent (because they come from different people)
- Standard error is the standard deviation of the sampling distribution
- If the population is normally distributed then so is the sampling distribution
o Sampling distributions of samples containing more than about 50 scores should be normally distributed
- A useful property of the sampling distribution is that its standard deviation is equal to the standard deviation of the
population divided by the square root of the sample size
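That relationship (standard error = population SD divided by the square root of N) is worth checking numerically; both numbers here are invented:

```python
import math

pop_sd = 12.0   # hypothetical population standard deviation
n = 36          # hypothetical sample size

se = pop_sd / math.sqrt(n)
print(se)   # → 2.0: quadrupling n would halve the standard error
```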
