PSY248 Lecture Notes - Lecture 3: Simple Linear Regression, F-Test, Explained Variation
CORRELATION & REGRESSION
Before statistics:
• Step 1: understand the research question
• Step 2: how are DV and IV measured?
• Step 3: choose method of analysis (correlation/regression)
Conduct correlation/regression analysis:
• Part 1: univariate
• Part 2: bivariate
• Part 3: regression and check assumptions
Step 1 – understand the research question:
• Research – studying carers of people on home hemodialysis
• Research question – how much distress do home hemodialysis caregivers experience,
is this distress related to age of the carer, and can this distress be predicted from
information about the age of the carer?
→ Correlation asks whether there is a RELATIONSHIP between two variables
→ Regression is about PREDICTION: using the IV to predict scores on the DV
Research question in a diagram:
• IV = age of carer
• DV = carer distress
• 1. What is the level of carer distress? Does carer distress vary? (univariate)
o Produce descriptive statistics
• 2. Is carer distress (linearly) related to age of carer? (bivariate)
o Produce correlational statistics
• 3. Does knowledge about age help predict level of carer distress? (regression)
o Produce regression
• In this RQ, we want to know whether a carer’s age is useful in predicting their distress
• This is NON-EXPERIMENTAL research
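As a rough illustration of question 3 (prediction), the sketch below fits a simple least-squares line predicting distress from age, then reports the explained variation (R²) and the regression F ratio. It is a minimal sketch assuming plain Python lists; the `age` and `distress` values are made-up toy numbers, not the carer data, and the function names are hypothetical. In the lecture this output comes from SPSS.

```python
"""Minimal simple-linear-regression sketch (least squares), standard library only.
The data below are made-up toy values, NOT the home hemodialysis carer data."""


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def explained_variation(x, y, intercept, slope):
    """Return (R squared, F) for a model with one predictor."""
    n = len(y)
    mean_y = sum(y) / n
    predicted = [intercept + slope * xi for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_regression = ss_total - ss_residual
    r_squared = ss_regression / ss_total
    # F = (SS_regression / df_regression) / (SS_residual / df_residual),
    # with df_regression = 1 and df_residual = n - 2 for one predictor.
    f_ratio = (ss_regression / 1) / (ss_residual / (n - 2))
    return r_squared, f_ratio


if __name__ == "__main__":
    age = [1, 2, 3, 4, 5]        # hypothetical age-category codes (IV)
    distress = [2, 4, 5, 4, 5]   # hypothetical distress scores (DV)
    b0, b1 = fit_simple_regression(age, distress)
    r2, f = explained_variation(age, distress, b0, b1)
    print(f"predicted distress = {b0:.2f} + {b1:.2f} * age")
    print(f"R^2 = {r2:.2f}, F(1, {len(age) - 2}) = {f:.2f}")
```

With these toy values the fitted line is about 2.20 + 0.60 × age, with R² = 0.60 and F(1, 3) = 4.50; the numbers only show the mechanics and say nothing about real carers.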
Step 2: how are the DV/IV measured?
• We look at
o How we intended to measure (questionnaire)
o What we actually ended up with (results – data)
• Decide level of measurement – categorical, ordinal or interval
• The level of measurement then determines which statistical analysis to use
• Correlation/regression is appropriate when both the IV and DV are continuous
(interval/numeric) and approximately normally distributed
• For instance:
o Age: categorical (young, middle, old), distress: categorical (low, medium,
high) = chi-square appropriate
o Age: categorical (young, middle, old), distress: numerical = one-way ANOVA
o Age: numerical, distress: numerical = correlation/regression
• Distress was measured with the Specific Health Questionnaire, which consisted of 15 items
o Age was recorded as one of 9 categories, but we are going to treat it as a
continuous (numeric) variable
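The choice of analysis in the examples above can be written as a small lookup. This is a hypothetical helper (`choose_analysis` is not an SPSS command) that covers only the three cases listed in these notes; an ordinal variable such as the 9 age categories must first be treated as either categorical or numeric.

```python
"""Hypothetical helper encoding the level-of-measurement examples in these notes."""

# (IV level, DV level) -> analysis; only the three cases listed in the notes.
# Correlation/regression additionally assumes the numeric variables are roughly normal.
ANALYSIS_BY_LEVEL = {
    ("categorical", "categorical"): "chi-square",
    ("categorical", "numeric"): "one-way ANOVA",
    ("numeric", "numeric"): "correlation/regression",
}


def choose_analysis(iv_level: str, dv_level: str) -> str:
    """Return the suggested analysis, or raise ValueError for cases not covered."""
    key = (iv_level.lower(), dv_level.lower())
    if key not in ANALYSIS_BY_LEVEL:
        raise ValueError(f"no rule in these notes for IV={iv_level!r}, DV={dv_level!r}")
    return ANALYSIS_BY_LEVEL[key]


if __name__ == "__main__":
    # Age treated as numeric, distress numeric, as in this study.
    print(choose_analysis("numeric", "numeric"))
```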
Step 3: Choose method of analysis
• DV – numeric
• IV – ordinal
find more resources at oneclass.com
• Treat the IV as an interval (numeric) variable, then choose correlation and regression
• OR treat the IV as a 9-level categorical variable, then choose one-way ANOVA
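To make the second option concrete, the sketch below computes a one-way ANOVA F ratio with the age codes used as group labels rather than as numbers. It is a minimal sketch of the standard between-groups/within-groups sums-of-squares calculation; the data are hypothetical (three age groups instead of nine, to keep it short), the function name is made up, and SPSS would normally be used instead.

```python
"""One-way ANOVA F ratio from scratch (standard library only).
Toy data: age codes are treated as group labels, NOT as numbers."""
from collections import defaultdict


def one_way_anova_f(groups, scores):
    """Return (F, df_between, df_within) for scores grouped by label."""
    by_group = defaultdict(list)
    for g, s in zip(groups, scores):
        by_group[g].append(s)
    n = len(scores)
    k = len(by_group)
    grand_mean = sum(scores) / n
    # Between groups: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    # Within groups: how far each score sits from its own group mean.
    ss_within = sum(
        (s - sum(vals) / len(vals)) ** 2
        for vals in by_group.values()
        for s in vals
    )
    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    age_group = [1, 1, 2, 2, 3, 3]   # hypothetical age-category codes used as labels
    distress = [2, 3, 4, 5, 5, 6]    # hypothetical distress scores
    f, df_b, df_w = one_way_anova_f(age_group, distress)
    print(f"F({df_b}, {df_w}) = {f:.2f}")
```

Treating age as categorical costs more degrees of freedom (k - 1 instead of 1) but makes no assumption that distress changes linearly across the age codes.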
UNIVARIATE
Univariate analysis tells us whether the DV (in particular) and the IV are normally distributed
o We’re more interested in the DV than the IV
• Graphical: histogram
o Distress is measured on a continuous scale, so a histogram is appropriate
o Central tendency
▪ Typical or average score (the centre or peak of the distribution): does a
clear one exist?
o Variability
▪ Do all the cases tend to score at about the same point or are they widely
scattered – width of distribution
o Skewness
▪ Symmetry vs. lopsidedness of distribution
▪ Positive (right) skew, negative (left) skew, symmetric distributions
have no skew
o Kurtosis
▪ Flatness or peakedness of a distribution: platykurtic (flat), leptokurtic
(very peaked) and mesokurtic (like a normal distribution)
o Modal characteristics (modality)
▪ Frequency of peaks as unimodal, bimodal or multimodal
▪ A distribution with no mode is a uniform or rectangular distribution
▪ In general, more than one frequency peak (mode) in a distribution suggests
that the data may come from several relatively homogeneous subgroups
within the larger sample being studied
▪ Ideally the distribution is unimodal, which indicates a homogeneous sample
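Several of these characteristics can be checked numerically as well as by eye. A minimal sketch, assuming integer scores in a plain Python list (the values are made up and the function names are hypothetical), prints summaries for central tendency, variability and the most frequent value(s), plus a text histogram whose shape shows peaks, width and lopsidedness.

```python
"""Descriptive summaries and a text histogram (standard library only).
The scores are made-up toy values, not the carer distress data."""
import statistics
from collections import Counter


def describe(scores):
    """Central tendency, variability and most frequent value(s) of a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "sd": statistics.stdev(scores),          # sample SD (n - 1 denominator)
        "modes": statistics.multimode(scores),   # most frequent raw value(s); the
                                                 # histogram peaks are the better guide to modality
    }


def text_histogram(scores, bin_width=2):
    """Print one row of '*' per bin. Assumes integer scores; empty bins are
    printed too, so gaps between peaks are not hidden."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    first, last = min(counts), max(counts)
    for start in range(first, last + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        print(f"{label:>7} | {'*' * counts.get(start, 0)}")


if __name__ == "__main__":
    distress = [3, 5, 5, 6, 7, 7, 7, 8, 9, 12]   # hypothetical scores
    print(describe(distress))
    text_histogram(distress)
```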
USING SPSS
• Use the frequencies command to produce graphical and numeric summaries of all five
possible DVs
• Analyze → Descriptive Statistics → Frequencies
o Request statistics and charts for all five histogram characteristics
• The pasted syntax is created by filling out the point-and-click dialogue boxes
and clicking Paste
• Check each of the five characteristics in the output
• Skewness of SHTOT: divide the skewness statistic by its standard error. If the
ratio lies within the range -2 to +2, we treat the distribution as unskewed
(symmetric)
• The same rule of thumb applies to kurtosis: if kurtosis divided by its standard
error lies within -2 to +2, the distribution is mesokurtic; if it falls outside,
it is platykurtic (below -2) or leptokurtic (above +2)
• Standard deviation tells you about the variability
• We can produce the same descriptive statistics for age, although normality
matters less for the IV than for the DV
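The ±2 rule of thumb can also be checked by hand. The sketch below uses the bias-adjusted skewness and excess-kurtosis statistics with their usual standard errors (the same formulas as Excel's SKEW and KURT); that these match the SPSS Frequencies output is an assumption, so compare against your own output table. The data and function names are hypothetical.

```python
"""Skewness and kurtosis divided by their standard errors (standard library only).
Assumption: these bias-adjusted formulas match SPSS Frequencies output."""
import math


def skew_kurt_ratios(x):
    """Return (skewness / SE, kurtosis / SE) for a list of at least 4 scores."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 scores")
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        raise ValueError("all scores are identical; shape is undefined")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    g1 = m3 / m2 ** 1.5        # population skewness
    g2 = m4 / m2 ** 2 - 3      # population excess kurtosis (normal = 0)
    skew = g1 * math.sqrt(n * (n - 1)) / (n - 2)
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def describe_shape(x):
    """Apply the +/-2 rule of thumb from the notes to both ratios."""
    z_skew, z_kurt = skew_kurt_ratios(x)
    if -2 <= z_skew <= 2:
        skew_label = "unskewed"
    elif z_skew > 2:
        skew_label = "positively skewed"
    else:
        skew_label = "negatively skewed"
    if -2 <= z_kurt <= 2:
        kurt_label = "mesokurtic"
    elif z_kurt > 2:
        kurt_label = "leptokurtic"
    else:
        kurt_label = "platykurtic"
    return skew_label, kurt_label


if __name__ == "__main__":
    scores = [1, 1, 1, 1, 1, 1, 1, 1, 1, 20]   # hypothetical, one extreme high score
    print(skew_kurt_ratios(scores))
    print(describe_shape(scores))
```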
BIVARIATE
• Graphical = scatterplot (describe 7 features)
• Numeric = Pearson correlation (if appropriate)
• We are now moving from univariate stats to bivariate stats
• E.g. is distress related to age?
Use SPSS point-and-click to produce a scatterplot:
• Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter → Define
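Alongside the scatterplot, the numeric bivariate summary is Pearson's r. A minimal sketch of the textbook formula, assuming two equal-length lists of paired scores (toy values; `pearson_r` is a hypothetical name). In the lecture this number comes from SPSS.

```python
"""Pearson correlation coefficient from scratch (standard library only)."""
import math


def pearson_r(x, y):
    """r = sum of cross-products / sqrt(product of the two sums of squares)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variance")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    age = [1, 2, 3, 4, 5]          # hypothetical age-category codes
    distress = [2, 4, 5, 4, 5]     # hypothetical distress scores
    r = pearson_r(age, distress)
    print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

In simple regression, r squared equals R², the proportion of variation in the DV explained by the IV.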