PSY248 Lecture Notes - Lecture 4: Error Bar, Multicollinearity, Linear Regression

PSY248: Week 4 multiple regression
What we have done so far is simple linear regression, where we fit a straight line to
describe the relationship between a predictor variable and an outcome variable
Y = a + bX + e
It is possible to have multiple predictor variables (so multiple X’s)
Y = dependent variable
You will still go through the 3 stages: univariate, bivariate, perform regression
& check assumptions
Only difference is we have more than one predictor
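As a sketch, the simple regression equation extends to two predictors as shown here. The subscripted symbols b_1, b_2, X_1 and X_2 are notation assumed for illustration, following the a, b, X and e of the simple model:

```latex
Y = a + b_1 X_1 + b_2 X_2 + e
```

Each b is the slope for its own predictor with the other predictor held constant.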
Multiple regression for mental impairment
Outcome (DV) is a measure of Mental Impairment, general psychiatric
symptoms
Possible IVs are the two predictor variables:
o Life events score
o Socioeconomic status
We start to recognize that our variable of primary interest may not be the only
relevant IV
DV is a measure of general mental impairment (psychiatric symptoms
including depression and anxiety)
The two IVs used here are X1 = life events and X2 = socioeconomic status
Life events refers to score on a life events index, including both number of life
events and severity of events experienced in the past 3 years
Life events is our IV of primary interest. The research question is whether
more frequent (and severe) life events predict higher mental impairment
Steps in doing this multiple regression
1. Recognise problem as a multiple regression
2. Remember RQ
3. Univariate data description: graphical and numerical
4. Bivariate graphical data description
5. Produce correlation matrix (Pearson’s r)
6. Fit full model (if appropriate to step 2)
7. Reduce Full Model (if appropriate)
8. Fit Final Model and Report
Steps 1-2:
Recognize problem as a multiple regression
o 1 numeric DV, 2 numeric IVs
Consider theory: more frequent and severe life events should generate more
psychiatric symptoms
Write (draw) RQ: do life events and socioeconomic status together predict
mental impairment? If so, are both predictors required?
Y = mental impairment
X1 = life events
X2 = SES
Understand population and data
Understand sampling population: Florida adults
Understand unit of analysis: general community members
Check all IVs are numeric: yes
Ordinal variables checked before upgrading: no ordinal variables
Consider the cases-to-predictor (N-to-IV) ratio rules of thumb:
o Take the number of IVs (p) and multiply it by each of two rule-of-thumb
constants (5 and 10)
o N > 5*p is bare minimum
o N > 10*p more desirable
o Here we have a sample of N = 40, well above 10*2 = 20
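The two thresholds can be checked mechanically. This is a minimal sketch; the function name `sample_size_check` is hypothetical, and only the 5*p and 10*p rules from the notes are applied:

```python
def sample_size_check(n, p):
    """Apply the two rule-of-thumb thresholds to n cases and p IVs."""
    return {
        "bare_minimum": n > 5 * p,   # N > 5*p
        "desirable": n > 10 * p,     # N > 10*p
    }

# The example in these notes: 40 cases, 2 IVs.
result = sample_size_check(n=40, p=2)
print(result)
```

With 40 cases and 2 IVs, both thresholds (10 and 20) are exceeded.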
3. Univariate data description
Produce graphical summaries (histogram, error bar plot)
Comment on distributions for BOTH IVs (central tendency, variability, skew,
kurtosis etc.)
Summarise with appropriate numerical values
Write global summary statement of what you have found
SPSS menu: Graphs > Legacy Dialogs > Histogram (tick Display normal curve)
Descriptive statistics
o You can see the means of the three variables and the sample sizes (N)
o Listwise N = number of cases that have valid values for all variables in
the table
The three variables were approximately normally distributed. The dependent
variable, Mental Impairment, ranged from 17 to 41 (mean 27, SD 5.5). For
the two predictors, life events ranged from 3 to 97 and SES
ranged from 3 to 96
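The difference between the per-variable N and the listwise N in the descriptive statistics table can be illustrated with a small sketch. The records here are hypothetical toy values, not the lecture data, and the variable names are made up for the example:

```python
# Hypothetical toy records (NOT the lecture data); None marks a missing value.
cases = [
    {"impair": 17, "life": 46, "ses": 84},
    {"impair": 19, "life": None, "ses": 97},
    {"impair": 20, "life": 33, "ses": 24},
    {"impair": 25, "life": 9, "ses": None},
]
variables = ["impair", "life", "ses"]

# Per-variable N: cases with a valid value on that one variable.
n_valid = {v: sum(c[v] is not None for c in cases) for v in variables}

# Listwise N: cases with valid values on every variable in the table.
listwise_n = sum(all(c[v] is not None for v in variables) for c in cases)

print(n_valid)
print(listwise_n)
```

A case that is missing on any one variable is dropped from the listwise count, so the listwise N can never exceed the smallest per-variable N.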
4. Bivariate data description
Plot DV against each IV
Comment on scatterplots (7 points)
Consider outliers
Write global summary statement of what you have found
The scatterplots for mental impairment against life events and SES show a
positive and a negative linear relationship respectively. Both relationships
appear to be of low-to-moderate strength. The graphs show no
unusual characteristics, although there may be one outlier in the relationship
between MI and SES
5. Produce correlation matrix
Consider collinearity, multicollinearity
Consider statistical significance of DV correlation with each IV
Consider correlations between the two IVs
Write summary statement on what you found
Collinearity (multicollinearity) occurs when two or more IVs are so correlated that
one can be predicted (almost) exactly from one or more of the others
Made possible by a combination of
o Multiple IVs
o Non-orthogonality due to observational design
To occur requires strong correlations between IVs
Multicollinearity exists whenever an IV can be exactly/nearly calculated from
a linear combination of other IVs
Indicators
o Large correlations among IVs
o Large changes in coefficients and/or SE, when a new IV is added to
the model
Scatterplot of IVs to look at correlation/collinearity
The scatterplot of the two IVs confirms a weak correlation (look at diagram), thus no
collinearity
This plot and the earlier correlation work for 2 IVs, but if there are >2 IVs there may be a
more complex pattern of IV correlations
Correlation table:
You can see that mental impairment (DV) is positively correlated with life
events
But negatively correlated with SES
Correlation between life events and SES = positive but very small
Correlation of each X with Y (IV and DV) = both moderate and significant
(at the 0.05 level)
Thus, the chance of obtaining a Pearson correlation at least as extreme as .372 in a sample
of 40, when the null hypothesis is true, is 1.8% (p = .018). Ask yourself: is that
sufficiently unlikely?
You can define "sufficiently unlikely" with whatever significance level you
choose (conventionally 5%)
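The 1.8% figure can be checked by hand. This is a sketch, assuming the usual test of a Pearson correlation against zero, t = r*sqrt(N-2)/sqrt(1-r^2) with N-2 degrees of freedom. SPSS reports this p-value directly; the function names and the Simpson's-rule integration are choices made here only to keep the example free of external libraries:

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def correlation_p_value(r, n, steps=2000):
    """Two-tailed p-value for H0: rho = 0, via t = r*sqrt(n-2)/sqrt(1-r^2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1 - r * r)
    # Simpson's rule (steps must be even) for the area under the density on [0, t].
    h = t / steps
    area = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    # Each tail holds 0.5 minus the area between 0 and t.
    return t, 2 * (0.5 - area)

t_stat, p = correlation_p_value(r=0.372, n=40)
print(round(t_stat, 2), round(p, 3))
```

For r = .372 and N = 40 the result should land close to the 1.8% quoted in the notes; the decision is then made by comparing p with the chosen significance level.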
Rule of thumb: if the correlation between two predictors is above 0.7 in absolute value, collinearity
is possible. If the correlation between the two is above 0.8, we have definite collinearity
(we find these statistics in the Pearson Correlation table)
A change in the sample will help change stat to collinearity (e.g. changing
from 40 to 40,000)
But… bivariate correlations may not be sufficient to identify collinearity
One IV may be a non-obvious linear combination of several other IVs
SPSS calculates tolerance and the variance inflation factor (VIF); these are
found in the Coefficients table under ‘Collinearity Statistics’. They show
the degree to which each IV is correlated with the other IVs (how much of its
variance is shared with the other predictors in the model)
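To see why bivariate correlations can miss collinearity, here is a small sketch with hypothetical toy IVs (not the lecture data). A fourth IV is built as the exact sum of three mutually uncorrelated IVs, so no single pairwise correlation reaches the 0.7 rule of thumb, yet the fourth IV is perfectly predictable from the other three. The shortcut for R^2 used here (sum of squared correlations) is valid only because the three predictors are uncorrelated with each other by construction:

```python
import math
from itertools import product

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy IVs: every combination of -1/+1, so x1, x2 and x3
# are mutually uncorrelated by construction.
rows = list(product([-1, 1], repeat=3))
x1, x2, x3 = ([row[i] for row in rows] for i in range(3))
x4 = [a + b + c for a, b, c in zip(x1, x2, x3)]  # exact linear combination

pairwise = [pearson_r(x, x4) for x in (x1, x2, x3)]
# With mutually uncorrelated predictors, R^2 is the sum of squared correlations.
r_squared = sum(r ** 2 for r in pairwise)
tolerance = max(0.0, 1 - r_squared)  # guard against tiny negative rounding error

print([round(r, 3) for r in pairwise])
print(round(r_squared, 3), round(tolerance, 3))
```

Each pairwise correlation stays below 0.7 while the tolerance of x4 is essentially zero, which is the situation tolerance and VIF are designed to detect.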
Summary of step 5 (correlation)
Both predictors are statistically significantly correlated with the DV. Life
events is positively correlated with Mental impairment (r = 0.37; p = 0.018)
while SES has a correlation of similar degree, but negative, with Mental
impairment (r = -0.40, p = 0.011). The two predictors are very weakly
correlated (r = 0.12, p = 0.45) and thus give us no reason to expect
collinearity.
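With only two IVs, tolerance and VIF follow directly from the correlation between them, because the R^2 from regressing one IV on the other is just r^2. The sketch uses the standard definitions tolerance = 1 - R^2 and VIF = 1 / tolerance; the function name is hypothetical, and the values SPSS prints may differ slightly because r = 0.12 is rounded:

```python
def tolerance_and_vif(r_between_ivs):
    """Tolerance and VIF when the model has exactly two IVs."""
    r_squared = r_between_ivs ** 2   # R^2 from regressing one IV on the other
    tolerance = 1 - r_squared
    vif = 1 / tolerance
    return tolerance, vif

tol, vif = tolerance_and_vif(0.12)   # r between life events and SES
print(round(tol, 4), round(vif, 4))
```

A tolerance near 1 and a VIF near 1 agree with the conclusion that collinearity is not a concern for these two predictors.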