Correlation and Regression

The Scatter Diagram

Univariate distribution: involves only one variable for each individual under study

Bivariate distribution: involves two scores for each individual

Scatter diagram: is a picture of the relationship between two variables

•Bivariate distribution of scores

•Each point shows where a particular individual scored on both x and y

•Each point represents the performance of one person who has been assessed on two measures

•Useful for revealing when the relationship between x and y is not well described by a straight line

Correlation

In correlational analysis, we ask whether two variables covary. The analysis is designed mainly to examine linear relationships between variables

Correlation coefficient: describes the direction and magnitude of a relationship

•Positive correlation: high scores on y are associated with high scores on x, low scores on y are

associated with low scores on x

•Negative correlation: higher scores on y are associated with lower scores on x, lower scores on y

are associated with higher scores on x

•No correlation: variables are not related

•Calculation of the correlation coefficient involves pairs of observations

For each observation on one variable, there is an observation on one other variable for the same

person

www.notesolution.com

Regression

Regression: is used to make predictions about scores on one variable from knowledge of scores on another

variable

The Regression Line

Regression line: the best-fitting straight line through a set of points in a scatter diagram

•Found by using the principle of least squares, which minimizes the sum of the squared deviations around the regression line

•The mean is the point of least squares for any single variable, so the sum of the squared deviations

around the mean will be less than it is around any value other than the mean

•The line can be referred to as the running mean in two dimensions or in the space created by two

variables

•The least squares method in regression finds the straight line that comes as close as possible to as many of the y means as possible

•Thus, regression line = the line for which the squared deviations around the line are at a minimum
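The claim that the mean is the point of least squares for a single variable can be checked numerically. The following is a minimal sketch, not part of the original notes; the scores and the helper name sum_squared_deviations are made up for illustration.

```python
def sum_squared_deviations(scores, center):
    """Sum of squared deviations of the scores around a chosen center value."""
    return sum((s - center) ** 2 for s in scores)

# Hypothetical scores, made up for illustration only.
scores = [2, 4, 4, 5, 10]
mean = sum(scores) / len(scores)

# Sum of squared deviations around the mean.
ss_at_mean = sum_squared_deviations(scores, mean)

# Try a few other candidate centers; each should give a larger sum.
ss_elsewhere = {c: sum_squared_deviations(scores, c) for c in (3, 4, 4.5, 5.5, 6)}

print("around the mean:", ss_at_mean)
print("around other values:", ss_elsewhere)
```

Every other center tried gives a larger sum than the mean does. The regression line applies the same idea in two dimensions.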

Regression coefficient (b): is the slope of the regression line

•Can be expressed as the ratio of the sum of squares for the covariance to the sum of squares for x

•Sum of squares: the sum of squared deviations around the mean

For x = the sum of the squared deviations around the mean of x

•Covariance: expresses how much two measures vary together

Slope: how much change is expected in y each time x increases by one unit

•Regression coefficient is sometimes expressed in different notations

•Beta is used for a population estimate of the regression coefficient


Intercept: is the value of y when x is 0, or the point at which the regression line crosses the y axis
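The slope and intercept can be worked out by hand from the sums of squares. The sketch below assumes the standard least-squares formulas: b is the sum of squares for the covariance divided by the sum of squares for x, and the intercept is a = mean of y minus b times mean of x. The intercept formula and the data are not from the notes; the paired scores are hypothetical.

```python
# Hypothetical paired scores (x, y) for five people; made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sum of squares for the covariance: cross-products of deviations from each mean.
ss_cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# Sum of squares for x: squared deviations around the mean of x.
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = ss_cov / ss_x        # slope (regression coefficient)
a = mean_y - b * mean_x  # intercept: predicted y when x is 0


def predict(x_value):
    """Predicted score y' for a given x, using y' = a + b * x."""
    return a + b * x_value


print("slope b:", b)
print("intercept a:", a)
print("predicted y at x = 3:", predict(3))
```

With these made-up scores, the slope is about 0.6 and the intercept about 2.2, so each one-unit increase in x raises the predicted y by about 0.6.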

The Best-Fitting Line

The actual and predicted scores on y are rarely the same

•For example, a person may actually receive a score of 4 on y when the regression equation predicted a score of 4.3

The difference between the observed and predicted score (y-y’) is called the residual

•The best-fitting line keeps the residuals to a minimum by minimizing the deviation between

observed and predicted scores

•Since residuals can be (+) or (-) and will cancel to zero if averaged, each residual is squared

The best-fitting line is obtained by keeping the squared residuals as small as possible, which is known as

the principle of least squares
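The behaviour of the residuals can be illustrated with a short sketch. It is not from the original notes: the data are hypothetical, and fit_line and sum_squared_residuals are helper names invented for this example. It checks that the residuals around the least-squares line cancel to zero and that nudging the line makes the sum of squared residuals larger.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y' = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = ss_cov / ss_x
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared differences between observed y and predicted y'."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical paired scores; made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Residual = observed score minus predicted score (y - y').
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse_best = sum_squared_residuals(x, y, a, b)

# Nudge the intercept and slope; every other line tried should fit worse.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2), (0.3, -0.1)]
sse_other = [sum_squared_residuals(x, y, a + da, b + db) for da, db in nudges]

print("sum of residuals:", sum(residuals))
print("squared residuals, best line:", sse_best)
print("squared residuals, other lines:", sse_other)
```

The raw residuals sum to zero (up to rounding), which is why they are squared before being compared across lines.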

Correlation is a special case of regression in which the scores of both variables are standardized, or in z

units

•Having the scores in z units eliminates the need to find the intercept

•In correlation, the intercept is always zero

•The standardized unit allows ease in interpreting the slope in correlation

•When calculating the correlation coefficient, we can avoid the step of changing all the scores into z units

Pearson product moment correlation coefficient: a ratio used to determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable

The correlation coefficient can take on any value from -1.0 to 1.0
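The link between correlation and regression on standardized scores can be checked with a small sketch. It assumes the usual raw-score formula for the Pearson coefficient, r = (sum of squares for the covariance) / sqrt(SS for x times SS for y), which the notes do not spell out. The data and the helper names z_scores and slope_and_intercept are hypothetical.

```python
import math


def z_scores(values):
    """Convert raw scores to z units (mean 0, standard deviation 1)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def slope_and_intercept(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    ss_cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = ss_cov / ss_x
    return slope, mean_y - slope * mean_x


# Hypothetical paired scores; made up for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Pearson r computed directly from raw scores, without converting to z units.
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
ss_cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
r = ss_cov / math.sqrt(ss_x * ss_y)

# Regression on standardized scores: the slope should equal r, the intercept 0.
slope_z, intercept_z = slope_and_intercept(z_scores(x), z_scores(y))

print("r from raw scores:", r)
print("slope on z scores:", slope_z)
print("intercept on z scores:", intercept_z)
```

The slope of the line fitted to the z scores matches r, and the intercept is zero up to rounding, which is the sense in which correlation is regression on standardized scores.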

