KIN 1G03: FINAL REVIEW
Theme 1: Correlation and Causation
 How do we relate two variables?
 How are their central tendencies and dispersion related?

Pearson r
A statistic that represents the extent to which the same individual or event occupies the same relative
position on two variables.
 The Pearson r statistic ranges from -1.00 to +1.00
 In a perfect negative correlation, r = -1.00
 In a perfect positive correlation, r = +1.00
 When there is no apparent relationship, r = 0.00
r provides some indication of the degree of relationship between the two variables.
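As a rough illustration of how r is obtained from paired scores, here is a minimal Python sketch using the raw-score (computational) formula. The function name `pearson_r` and the sample scores are made up for this example and are not from the course material.

```python
import math

def pearson_r(xs, ys):
    """Pearson r for paired scores, using the raw-score formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Numerator: how X and Y vary together
    numerator = sum_xy - sum_x * sum_y / n
    # Denominator: how much X and Y each vary on their own
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    if denominator == 0:
        raise ValueError("r is undefined when one variable has no variability")
    return numerator / denominator

# Hypothetical scores: Y rises in step with X, so r should be at the top of its range
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Reversing the order of one list gives a value at the bottom of the range instead.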
`Degrees of relationship`
To calculate this level of reliability (statistical significance):
1. Determine df: df = n_pairs - 2 (df compensates for a small n by requiring a larger absolute r)
2. Formulate hypotheses (H0 = no relationship, H1 = a true, reliable relationship)
 The sig value tells you how probable it is that the value you got is due to chance
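The df step can be sketched in a few lines of Python. The conversion of r to a t value shown here is not part of these notes; it is the standard identity t = r * sqrt(df / (1 - r^2)), included on the assumption that the result is then compared against a critical value from a table. The function names are hypothetical.

```python
import math

def correlation_df(n_pairs):
    """Degrees of freedom for a correlation: number of pairs minus 2."""
    return n_pairs - 2

def r_to_t(r, n_pairs):
    """Convert r to a t value (standard identity, assumed here, not from the notes)."""
    df = correlation_df(n_pairs)
    if df <= 0 or abs(r) >= 1:
        raise ValueError("need at least 3 pairs and |r| < 1")
    return r * math.sqrt(df / (1 - r ** 2))

# Hypothetical example: r = 0.60 from 18 pairs of scores
print(correlation_df(18))
print(r_to_t(0.60, 18))
```

With the same r, a smaller n gives a smaller df and a smaller t, which is why a small sample needs a larger absolute r to reach significance.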
Theme 2: Bivariate Regression
 One useful function of correlational analysis is that it can be used to predict the outcome of a
variable
 If the existing correlation is reasonably high we can predict what the value of y will be if we
know x
 1. Calculate r, r^2, k^2
 2. Regression line: two important characteristics:
o Intercept (a) and slope (b): b is the change in Y resulting from a one-unit change in X
o Regression line = Line of best fit
 Slope: b = [sumXY - (sumX)(sumY)/N] / [sumX^2 - (sumX)^2/N]
 Intercept: a = meanY - b(meanX)
 Y’ (predicted value) = a + bX
 The deviation around the regression line (the residual) is smaller than the deviation around the mean, and much smaller when r is high
 To draw the line, compute Y’ at x = 0 and at one other x score, then connect the two points
 Our prediction is only as good as the relationship between the two variables
 If r = 0, the best guess would be the mean
 If there is no relationship it doesn’t matter how much you know about one variable seeing as it
has no influence on the other
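The slope, intercept, and prediction formulas above can be sketched in Python as follows. This is a minimal illustration; the function names and the sample scores are invented for the example.

```python
def regression_line(xs, ys):
    """Return (a, b): intercept and slope of the line of best fit."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = [sumXY - (sumX)(sumY)/N] / [sumX^2 - (sumX)^2/N]
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # a = meanY - b * meanX
    a = sum_y / n - b * (sum_x / n)
    return a, b

def predict(a, b, x):
    """Predicted score Y' = a + bX."""
    return a + b * x

# Hypothetical scores that fall exactly on the line Y = 1 + 2X
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = regression_line(xs, ys)
print(a, b, predict(a, b, 6))
```

Calling `predict(a, b, 0)` and `predict(a, b, x)` for one other score gives the two points needed to draw the line.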
Summary:
 Regression represents two things:
 1) An attempt to predict Y from X (DV from IV)
 2) A statement of how confident we are in this prediction
 If k^2 = 0, we are totally confident
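A small Python sketch of the confidence statement, assuming k^2 is the coefficient of nondetermination (k^2 = 1 - r^2, the proportion of variance in Y not accounted for by X). That definition is the usual one but is not spelled out in these notes, and the function name is hypothetical.

```python
def determination(r):
    """Return (r^2, k^2), assuming k^2 = 1 - r^2 (coefficient of nondetermination)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    r2 = r ** 2
    k2 = 1 - r2
    return r2, k2

# Hypothetical r values: a perfect correlation leaves nothing unexplained
print(determination(1.0))
print(determination(0.8))
```

Under this assumption, k^2 = 0 only when r is -1.00 or +1.00, which is the case where every observed score falls on the regression line.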
Standard Error of the Estimate
 Recall that: Standard deviation s = sqrt [sum(Y - Ymean)^2 / N]
 The equivalent for regression is the standard error of the estimate
 Estimates variability about the regression line
 Sesty = sqrt [sum(Y - Y’)^2 / N]
 Table:
Y (Y Observed) | Y’ (Y Predicted) | (Y - Y’) | (Y - Y’)^2
 Sesty = variation around the regression line
 OR: Sesty = Sy * sqrt (1 - r^2)
 Sy = sqrt (SSy (sum of squares of y) / n)
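Both routes to the standard error of the estimate can be sketched in Python. The function names and the small data set are invented for illustration; the predicted scores come from the line of best fit for that data (a = 2.2, b = 0.6).

```python
import math

def see_from_residuals(observed, predicted):
    """Sesty = sqrt[sum(Y - Y')^2 / N], built from the residual table."""
    n = len(observed)
    squared_residuals = [(y - yp) ** 2 for y, yp in zip(observed, predicted)]
    return math.sqrt(sum(squared_residuals) / n)

def see_from_r(s_y, r):
    """Shortcut: Sesty = Sy * sqrt(1 - r^2), where Sy = sqrt(SSy / n)."""
    return s_y * math.sqrt(1 - r ** 2)

# Hypothetical data: X = 1..5, Y observed, and Y' from the fitted line
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

mean_y = sum(observed) / len(observed)
s_y = math.sqrt(sum((y - mean_y) ** 2 for y in observed) / len(observed))
r = 6 / math.sqrt(60)  # r for this data set, worked out from the deviation scores

print(see_from_residuals(observed, predicted))
print(see_from_r(s_y, r))
```

The two values agree as long as Sy is computed with N in the denominator, matching the formula for s given in these notes.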
 Thus, because we know that 68% of the time the true score will fall between z scores of -1.00 and
+1.00, we are now 68% certain that June's score lies within her predicted score (Y’) +/- Sesty (6.67)
 What are the chances she got a different mark (for example, 83)?
 Y’ = 91.38 (predicted score)
 Sesty= 6.67 (our margin)
 z for a score of 83 = (Y - Y’) / Sesty = (83 - 91.38) / 6.67 = -1.256
 To change the level of confidence (LOC) to 95%, multiply Sesty (6.67) by 1.96, then add that number to and subtract it from the predicted score
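The arithmetic for June's example can be checked with a short Python sketch. The numbers (Y’ = 91.38, Sesty = 6.67, observed score 83) are the ones given above; the function names are made up for this illustration.

```python
def z_score(observed, predicted, se_est):
    """z = (Y - Y') / Sesty."""
    return (observed - predicted) / se_est

def confidence_band(predicted, se_est, z_crit=1.0):
    """Return (lower, upper) = Y' -/+ z_crit * Sesty."""
    margin = z_crit * se_est
    return predicted - margin, predicted + margin

predicted, se_est = 91.38, 6.67

z = z_score(83, predicted, se_est)                   # about -1.26
band_68 = confidence_band(predicted, se_est)         # roughly 84.71 to 98.05
band_95 = confidence_band(predicted, se_est, 1.96)   # roughly 78.31 to 104.45

print(round(z, 3), band_68, band_95)
```

The 68% band uses z = 1.00 and the 95% band uses z = 1.96, so the 95% band is wider by a margin of 6.67 * 1.96 = 13.07 on each side.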
Testing For Significant Differences
One convenient and powerful method to test the relative difference between two means is the t-test
Student’s t-test
Allows us to test the likelihood that two means are “equal”
Tests the statistical difference between two means against the background of within-group variability
Difference can be affected
It does this by calculating a t value (t obtained)
This value is compared to a critical t value (t critical)
t critical represents the point
One-Tailed t-Test
 Given two variables, X1 and X2, find the mean of each
 In this case we have two independent groups, e.g., old style vs. new style
 The t-test compares the means of these groups to determine if they are different
 df = (n1 - 1) + (n2 - 1) (compare df = n_pairs - 2 in a correlation)
 Look at the overlap of one particular tail
 t = (meanX1 - meanX2) / sqrt (SEM1^2 + SEM2^2), where SEM is the standard error of the mean of each
variable
 Compare the obtained t to the critical t values in a standard t table
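The independent-groups t calculation can be sketched in Python as below. This is an illustration under one stated assumption: each standard error of the mean is computed from the sample variance with n - 1 in the denominator, which these notes do not specify. The function names and the scores are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sem(values):
    """Standard error of the mean (assumes the n - 1 sample variance)."""
    n = len(values)
    m = mean(values)
    variance = sum((v - m) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)

def independent_t(group1, group2):
    """Return (t obtained, df) for two independent groups."""
    # Difference between means over the combined standard error
    t = (mean(group1) - mean(group2)) / math.sqrt(sem(group1) ** 2 + sem(group2) ** 2)
    df = (len(group1) - 1) + (len(group2) - 1)
    return t, df

# Hypothetical scores for "new style" vs "old style" groups
t_obtained, df = independent_t([2, 4, 6], [1, 2, 3])
print(t_obtained, df)
```

The obtained t and df are then looked up in a t table; the test is significant only if t obtained exceeds t critical for that df.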
Non-Parametric Stats
Multi-Category Chi-Square