
LS 280 - Midterm 2 Readings Review

Department: Sociology
Course: SOC 280
Professor: Owen Gallupe
Semester: Fall

Chapter 5 – Exploring Assumptions

- Assumptions are critical because, if they are broken, we can no longer draw accurate conclusions about reality
- Different statistical models assume different things
- Parametric test – based on the normal distribution; requires data from one of the large catalogue of distributions that statisticians have described; the assumptions NEED to be met for the data to be treated as parametric
- Most parametric tests have four assumptions that need to be met:
  o Normally distributed data
    - The data should be normally distributed, whether that means the sampling distribution, the errors in the model, or something else
    - If this is not met, the logic behind the hypothesis test is flawed
  o Homogeneity of variance
    - Variances should be the same throughout the data
    - In correlational designs, this means the variance of one variable should be stable at all levels of the other variable
    - In designs where several different groups are tested, each sample should come from a population with the same variance
  o Interval data
    - Data should be measured at the interval level
    - Ordinal data can sometimes be treated as interval if there are enough categories
  o Independence
    - Differs depending on the test
    - The behaviour of one participant/sample should not influence the behaviour of another
    - In repeated-measures designs (where participants are measured in more than one experimental condition), scores in the experimental conditions are expected to be non-independent for a given participant, but behaviour between different participants should be independent
- In many statistical tests we assume that the sampling distribution is normally distributed
- Central limit theorem – if the sample data are approximately normal, then the sampling distribution will be too; this is why people tend to look at their sample data to see whether they are normally distributed
  o In big samples the sampling distribution tends to be normal anyway, regardless of the shape of the data collected
  o The bigger the sample, the more confident we can be that the sampling distribution is normally distributed
- P-P plot (probability-probability plot) – checks whether a distribution is normal
  o Plots the cumulative probability of the variable against the cumulative probability of a particular distribution
  o The data are ranked and sorted
  o For each rank the corresponding z-score is calculated; this is the expected value the score should have in a normal distribution
  o The score itself is then converted to a z-score; the actual z-score and the expected z-score are plotted against each other
  o If the data are normally distributed, the actual z-scores equal the expected z-scores and the points form a diagonal line
  o In SPSS: Analyze -> Descriptive Statistics -> P-P Plots
- Histograms are subjective and open to abuse
- To check that a distribution of scores is normal, look at the values of skewness and kurtosis
  o Both values are 0 in a normal distribution
  o Positive values of skewness indicate a pile-up of scores on the left of the distribution (and vice versa)
  o Positive values of kurtosis indicate a pointy, heavy-tailed distribution (and vice versa)
  o The further a value is from 0, the more likely it is that the data are not normally distributed
- Converting the skew and kurtosis values to z-scores is useful because:
  o You can compare skew and kurtosis across samples that used different measures
  o You can see how likely the observed values of skew and kurtosis are to occur
- To convert a score to a z-score, subtract the mean of the distribution and divide by the standard deviation of the distribution, i.e. z = (score − mean) / SD; skew and kurtosis can be converted in the same way, by dividing them by their standard errors (see the sketch below)
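The readings do all of this through SPSS menus; as a rough Python illustration of the same logic (the sample data and the large-sample standard-error formulas for skew and kurtosis are my own assumptions, not from the notes):

    # Sketch: normality checks via skew/kurtosis z-scores (illustrative, not from the readings)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    scores = rng.lognormal(size=200)  # hypothetical, positively skewed sample

    # Convert raw scores to z-scores: subtract the mean, divide by the standard deviation
    z_scores = (scores - scores.mean()) / scores.std(ddof=1)

    # Skewness and (excess) kurtosis are both 0 in a normal distribution
    skew = stats.skew(scores)
    kurt = stats.kurtosis(scores)

    # Convert skew/kurtosis to z-scores by dividing by their standard errors;
    # these SEs are the usual large-sample approximations (an assumption here)
    n = len(scores)
    z_skew = skew / np.sqrt(6 / n)
    z_kurt = kurt / np.sqrt(24 / n)
    print(z_skew, z_kurt)  # |z| > 1.96 is significant at p < 0.05

scipy also ships stats.skewtest and stats.kurtosistest, which formalize the same idea.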
- When sample sizes are big, significant values arise from even small deviations from normality
- If the resulting z-score is greater than 1.96 (ignoring the minus sign), it is significant at p < 0.05
- Significance tests of skew and kurtosis should not be used in large samples (because they are likely to be significant even when skew and kurtosis are not far from normal)
- The split file function allows you to specify a grouping variable, so output is produced separately for each group
- Descriptive statistics are always a good way of getting an instant picture of the distribution of the data
- Kolmogorov-Smirnov test / Shapiro-Wilk test – compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation
  o If the test is non-significant (p > 0.05), the distribution of the sample is not significantly different from a normal distribution
  o If the test is significant (p < 0.05), the distribution is significantly different from a normal distribution
  o They tell us in one easy procedure whether scores are normally distributed
  o Limitation: with large sample sizes it is very easy to get significant results from small deviations from normality, so a significant test doesn't necessarily tell us whether the deviation from normality is big enough to bias the statistical procedures we apply to the data
- Q-Q plot – similar to a P-P plot, but it plots the quantiles of the data set instead of every individual score
  o Quantiles – values that split a data set into equal portions
  o Has fewer plotted points than a P-P plot
  o Plots the values you would expect if the distribution were normal (expected values) against the values actually seen in the data set (observed values)
  o The expected values form a straight diagonal line, and the observed values (the dots) should fall exactly on that line
  o Points that fall off the straight line indicate a deviation from normality
- The Shapiro-Wilk test does the same thing as the Kolmogorov-Smirnov test but has more power to detect deviations from normality
- These tests should always be interpreted in conjunction with histograms, P-P plots, Q-Q plots, and the values of skewness and kurtosis
- Homogeneity of variance – as you go through levels of one variable, the variance of the other should not change
  o Variances close together = homogeneity
- Levene's test – tests homogeneity of variance (see the sketch below)
  o Tests the null hypothesis that the variances in different groups are equal
  o It is an ANOVA conducted on the deviation scores – the absolute difference between each score and the mean of the group it came from
  o If significant at p < 0.05, conclude that the null hypothesis is incorrect and the variances are significantly different (homogeneity of variance has been violated)
  o If non-significant (p > 0.05), the variances are roughly equal and the assumption holds
  o The Levene's test statistic is denoted by the letter F
- Variance ratio – the variance of the group with the biggest variance divided by the variance of the group with the smallest variance
- Homogeneity of variance is the assumption that the spread of scores is roughly equal in different groups of cases or, more generally, that the spread of scores is roughly equal at different points on the predictor variable
- When comparing groups, this assumption can be tested with Levene's test and the variance ratio
- If Levene's test is significant (the p-value in the SPSS table is less than 0.05), the variances are significantly different across groups
  o Otherwise, homogeneity of variance can be assumed
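A hedged Python sketch of the tests just described (the groups and data are invented for illustration; the notes themselves use SPSS):

    # Sketch: normality and homogeneity-of-variance tests (illustrative data)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    group1 = rng.normal(10, 2, size=40)  # hypothetical groups
    group2 = rng.normal(10, 3, size=40)

    # Shapiro-Wilk: p > 0.05 means no significant deviation from normality detected
    w_stat, p_shapiro = stats.shapiro(group1)

    # Kolmogorov-Smirnov against a normal with the sample's own mean and SD
    d_stat, p_ks = stats.kstest(group1, 'norm',
                                args=(group1.mean(), group1.std(ddof=1)))

    # Levene's test; center='mean' matches the "ANOVA on absolute deviations
    # from the group mean" described above. p < 0.05 -> variances differ
    f_stat, p_levene = stats.levene(group1, group2, center='mean')

    # Variance ratio: largest group variance divided by the smallest
    v1, v2 = group1.var(ddof=1), group2.var(ddof=1)
    variance_ratio = max(v1, v2) / min(v1, v2)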
- The variance ratio is the largest group variance divided by the smallest; this value needs to be smaller than the critical value
- In large samples Levene's test can be significant even when group variances are not very different; it should therefore be interpreted in conjunction with the variance ratio
- Dealing with outliers:
  o Remove the case
    - Should only be done if you have good reason to believe the case is not from the population you intended to sample
  o Transform the data
    - Apply one of the transformations below
  o Change the score, e.g. to:
    - The next highest score plus one
    - The value converted back from a z-score
    - The mean plus two standard deviations
- Transformations are useful for correcting problems with normality and with the assumption of homogeneity of variance
  o You do something to every score to correct for distributional problems, outliers, or unequal variances
  o This is not cheating, because you do the same thing to all of the scores
  o It won't change the relationships between variables, but it WILL change the differences between different variables
- If you are looking at differences between variables, you must apply the same transformation to ALL of those variables
  o In SPSS, click Transformations under Levene's Test to apply one
- Types of transformations:
  o Log
    - You can't take the log of zero or of negative numbers
    - Corrects positive skew and unequal variances
  o Square root
    - Taking the square root has more of an effect on large values than on small values
    - Corrects positive skew and unequal variances
    - Negative numbers don't have a square root
  o Reciprocal
    - Divide 1 by each score; this reduces the impact of large scores
    - Reverses the scores: scores that were originally large become small (close to zero) after the transformation, and scores that were originally small become big
    - You can avoid this reversal by reversing the scores before the transformation
    - Corrects positive skew and unequal variances
  o Reverse score
    - Subtract each score from the highest score obtained, or from the highest score plus 1 (depending on whether you want your lowest score to be 0 or 1)
    - Corrects negative skew
- If a statistical model is still accurate even when its assumptions are broken, it is a robust test
- Transforming the data changes the hypothesis being tested (e.g., with a log transformation you go from comparing arithmetic means to comparing geometric means)
- In small samples it is tricky to determine normality one way or another
- Applying the wrong transformation is often worse than applying none at all
- To do transformations in SPSS: Transform -> Compute Variable; choose a category of functions, pick a function within the selected category, and create a new variable
- Robust methods – two key concepts (see the sketch after this list):
  o Trimmed mean – a mean based on the distribution of scores after some percentage has been removed from each extreme of the distribution; it produces accurate results even when the distribution is not symmetrical, because the extreme scores that bias the mean have been trimmed away
  o Bootstrap – normality in the data normally lets us infer that the sampling distribution is normal, but a lack of normality prevents us from knowing the shape of the sampling distribution unless we have big samples
    - Bootstrapping gets around this problem by estimating the properties of the sampling distribution from the sample data
    - The sample data are treated as a population from which smaller samples (called bootstrap samples) are drawn with replacement (each score is put back before the next sample is drawn)
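As a rough illustration of the transformations and robust methods above, a Python sketch (all data and constants are hypothetical; the readings apply these via SPSS menus):

    # Sketch: transformations, trimmed mean, and a basic bootstrap (illustrative)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    scores = rng.lognormal(size=100)  # hypothetical, positively skewed sample

    log_scores = np.log(scores)       # log: positive scores only (add a constant first if needed)
    sqrt_scores = np.sqrt(scores)     # square root: shrinks large values the most
    recip_scores = 1 / (scores + 1)   # reciprocal: reduces impact of large scores (reverses order)
    reversed_scores = scores.max() + 1 - scores  # reverse score: corrects negative skew

    # Trimmed mean: drop a percentage of scores from each extreme before averaging
    tm = stats.trim_mean(scores, proportiontocut=0.1)  # 10% trimmed from each tail

    # Bootstrap: treat the sample as a population and resample with replacement
    # to estimate the sampling distribution of the mean
    boot_means = np.array([rng.choice(scores, size=len(scores), replace=True).mean()
                           for _ in range(2000)])
    ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])  # 95% bootstrap CI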
Chapter 6 – Comparing Two Means

- Two different ways of collecting data:
  o Between-groups/independent design – expose different people to different experimental manipulations
  o Repeated-measures design – take a single group of people and expose them to different experimental manipulations at different points in time
    - Eliminates some extraneous variables and can give more sensitivity in the data
    - Error bars do not reflect the 'true' error around the means for repeated-measures designs
- To correct the error bars for repeated-measures designs in SPSS:
  o Transform -> Compute Variable
  o Use the mean function to calculate the average anxiety score for each participant
- Grand mean – the mean of all scores, regardless of which condition the scores come from
  o If we take the means already calculated to represent the average score for each participant, and take the average of those mean scores, we have the mean of all participants
- When error bars do not overlap, the samples have not come from the same population
  o Plotting error bars for repeated-measures data shows the extra sensitivity this design has: the differences between conditions appear significant, whereas when different participants are used there does not appear to be a significant difference
- More often than not, manipulating the independent variable involves having an experimental condition and a control group
- Independent-means t-test – used when there are two experimental conditions and different participants were assigned to each condition (2 groups, 2 conditions)
- Dependent-means t-test – used when there are two experimental conditions and the same participants took part in both conditions of the experiment (1 group, 2 conditions)
- Both tests have a similar rationale (see the sketch below):
  o Two samples of data are collected and the sample means calculated
  o If the samples come from the same population, we expect their means to be roughly equal
  o We compare the difference between the sample means we collected to the difference between sample means we would expect to obtain if there were no effect (i.e. if the null hypothesis were true)
  o When the standard error is large, large differences between sample means are likely even by chance; if the difference between the samples we collected is larger than we would expect based on the standard error, one of two things is true:
    - There is no effect, and sample means in our population fluctuate a lot, or
    - The two samples come from different populations, each sample being typical of its parent population
- Most test statistics can be thought of as the 'variance explained by the model' divided by the 'variance that the model can't explain'
- Assumptions of the t-test:
  o Both the independent t-test and the dependent t-test are parametric tests based on the normal distribution
  o Data are measured at the interval level
  o The independent t-test, because it is used to test different groups of people, also assumes:
    - Variances in these populations are roughly equal (homogeneity of variance)
    - Scores are independent (because they come from different people)
- The standard error is the standard deviation of the sampling distribution
- If the population is normally distributed, then so is the sampling distribution
  o Sampling distributions of samples containing more than 50 scores should be normally distributed
- A useful property of the sampling distribution is that its standard deviation is equal to the standard deviation of the population divided by the square root of the sample size (SE = σ/√N)
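A minimal Python sketch of the two t-tests and the standard error (the conditions and numbers are invented for illustration; the readings run these in SPSS):

    # Sketch: independent- and dependent-means t-tests (illustrative data)
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Independent-means t-test: different participants in each condition
    control = rng.normal(50, 10, size=30)
    experimental = rng.normal(55, 10, size=30)
    t_ind, p_ind = stats.ttest_ind(experimental, control)  # assumes equal variances

    # Dependent-means t-test: the same participants in both conditions
    before = rng.normal(50, 10, size=30)
    after = before + rng.normal(5, 5, size=30)
    t_dep, p_dep = stats.ttest_rel(after, before)

    # Standard error of the mean: the sampling distribution's SD equals the
    # population SD divided by the square root of the sample size
    se = control.std(ddof=1) / np.sqrt(len(control))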