Correlation Notes (From Prof)

Konstantine Zakzanis

Correlation: Measuring the degree of interdependence between two variables

This section is about different kinds of correlations. Correlations can be calculated when two sets of measurements are made on the same entities. If the sample consists of scores from subjects, then to get started, each subject's score on one condition has to be lined up with their score on the other condition. For example, we can look at the correlation between reading times for two kinds of sentence types. The data file has each person's scores side by side, in two columns.

Note that the scores we are correlating can be single token scores, or they can be totals or means from a larger sample. This makes no difference to the way in which the correlation is calculated, but if the data points are means of multiple tokens then they are more likely to be normally distributed (by the Central Limit Theorem). They will also be better point estimates, so the results of the correlations will be cleaner than if the correlations were based on only one token per person. In particular, the correlation is likely to be higher because there is less item-to-item variation. Remember that the s.d. of a sampling distribution gets smaller as n increases, so the numbers going into the correlations will be less variable if based on means of n scores rather than on n = 1.

As in anova, the notion of analyses by subjects vs. by items holds for correlations. In homework 3 I ask for correlations by subjects, using just the two columns of data with no other complications. In that assignment you also created an items file, with 16 items. However, in that case the items were all different, so there is no way to run a correlation: there is no way to match up pairs of sentences that are somehow ‘the same’. In the subjects analysis, though, you can ask whether participants show a correlation between how they responded to type 1 and type 2.

1. Eyeballing the data

Run descriptives and look at the correlation in terms of the dispersion of scores in a graph (a scattergram). Each subject is represented by one point on the graph, which represents their score on Type 1 and Type 2. At this stage you are looking to see how spread out the scores are from the trend line, which can slope either up or down. Be careful to look for extreme values, which can ruin everything, as I will show in class. One outlier can change the correlation a lot.

2. Calculate the covariance

Consider each data point on the scattergram. Each is really the intersection of two values, one on the x axis for variable 1 and the other on the y axis for variable 2. A single data point can therefore be examined in terms of how far its x value is from the mean of the x values and how far its y value is from the mean of the y values. These are simply two deviations from two means, and they are needed for the calculation of the correlation coefficient. So if the mean of x is 25 and the mean of y is 29, then a point with values of (22, 22) deviates from the x mean by 22 - 25 = -3 and from the y mean by 22 - 29 = -7. Notice that you start with the score and subtract the mean from it; these two deviations are therefore negative.

So we have two deviation scores per data point. The covariance is calculated by multiplying the two deviations for each data point together (e.g. -3 x -7 = +21), adding up the products across all subjects (or items, depending on what you're analyzing), and then taking an average product by dividing by n - 1 (which is close to n with large sample sizes). Notice that n is defined as the total number of subjects (or items), not the number of scores. The two scores have to come from the same subjects (or items), so df is defined just on the number of different people or items. If there are 32 subjects, and therefore 64 data points, then df = 32 - 1 = 31.

Notice that correlation is inherently a within-subjects design.
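The covariance calculation described above can be sketched in a few lines of Python. This is only an illustration: the four pairs of scores are made up (chosen so that the x mean is 25, the y mean is 29, and the first point is the (22, 22) example from the text), and the names `type1`, `type2`, `mean` and `covariance` are hypothetical, not part of the homework files.

```python
# Hypothetical paired scores, one pair per subject (invented for illustration).
type1 = [22.0, 24.0, 26.0, 28.0]  # x values; their mean is 25
type2 = [22.0, 27.0, 31.0, 36.0]  # y values; their mean is 29


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("each subject needs one x score and one y score")
    n = len(xs)  # number of subjects (pairs), not number of scores
    mx, my = mean(xs), mean(ys)
    # Score minus mean for x and for y, multiplied together per subject.
    products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# The worked example from the text: (22 - 25) * (22 - 29) = -3 * -7 = +21.
first_product = (type1[0] - mean(type1)) * (type2[0] - mean(type2))
cov = covariance(type1, type2)

print("product of deviations for the first subject:", first_product)
print("covariance:", cov)
```

With these made-up scores every product of deviations is positive, so the covariance is positive, which matches the upward trend in the data.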
If you had data from two groups of subjects there would be no way to line them up in the data file; would you put Mary’s number next to Juan’s or next to Jacqueline’s? Neither: there is no sense to the question of whether the two groups of subjects are correlated.

If two sets of scores are interdependent, such that one goes up along with the other, or one goes up while the other goes down, we would like to develop a statistic that is sensitive to this lockstep pattern. Of course we won't expect perfect correlations from samples, since there will also be error and other sources of variance, so this statistic has to be sensitive to the amount of variability ('error') in the experiment, and still be able to tell us whether the correlation is significant at a given level (e.g. p < .05 or better).

There are two steps. First, as I said above, we calculate the covariance: the products of each pair of x and y deviations, summed and divided by n - 1. I will expand on each part of this formula. First note that if there is a positive correlation, i.e. as x increases y increases, then the two deviations for a data point will generally have the same sign, so most of the products will be positive and their total will be greater than zero. If there is a negative correlation, i.e. as x increases y decreases or vice versa, then one deviation will usually be positive and the other negative, so most of the products will be negative and their total will be less than zero. But if there is no relationship between the variables, there will be no tendency either way. About half the time the product will be positive (multiplying two positive or two negative numbers) and the other half it will be negative (multiplying pos x neg or neg x pos), so the total will come out close to zero.

Note that on a graph, positive correlations have an upward slope toward the right, and negative correlations have a downward slope to the right. If slope = rise/run, then a rise of 2 over a run of 1 gives a positive slope of 2, while a rise of 2 over a run of -1 (or a rise of -2 over a run of 1) gives a slope of -2.

The only problem with the covariance as a test statistic is that it is scale-sensitive. For example, using the same data you would get different covariances if you changed from, say, centimetres to metres, or pounds to kilos, although the shape of the relationship would be the same no matter which scale you used. If we wanted to run a hypothesis test against tabled values of the covariance, to see if our observed covariance is bigger than some fairly improbable cutoff under the null hypothesis, then we would need a different set of tables for every combination of variables, and that is impossible because no statistician would work out an infinite number of tables.

Luckily, the product of the two s.d.s (the s.d. of x times the s.d. of y) comes out in the same units as the products of the deviations used in the covariance. This division is the second step: if you divide the covariance by this product you eliminate the units and get just a proportion. This proportion comes out to the same thing regardless of which of the equivalent units you used. In fact you could use weight in kilos and height in inches or angstroms or astronomical units or lightyears and still get exactly the same r! The bigger or smaller the units, the bigger or smaller the divisor.

r, otherwise known as the Pearson Product-Moment Correlation (or Pearson’s r), can range between -
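The scale-sensitivity argument above can be checked numerically with a short Python sketch. It is a sketch only: the heights and weights are invented for illustration, and the helper names (`covariance`, `std_dev`, `pearson_r`) are hypothetical. It computes the covariance and r twice, once with height in centimetres and once in metres.

```python
from math import sqrt

# Hypothetical measurements on five people (invented for illustration).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weights_kg = [52.0, 61.0, 63.0, 80.0, 84.0]
heights_m = [h / 100.0 for h in heights_cm]  # same heights, different unit


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def std_dev(values):
    """Sample s.d.: square root of the variance (n - 1 in the divisor)."""
    return sqrt(covariance(values, values))


def pearson_r(xs, ys):
    """Covariance divided by the product of the two s.d.s."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)
r_m = pearson_r(heights_m, weights_kg)

# The covariance shrinks by a factor of 100 when cm become m ...
print("covariance (cm, kg):", cov_cm)
print("covariance (m, kg): ", cov_m)
# ... but r is the same up to floating-point rounding, because the
# s.d. of the heights shrinks by the same factor.
print("r with cm:", r_cm)
print("r with m: ", r_m)
```

The covariance changes with the unit, while the ratio does not, which is why a single set of tables for r is enough.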