SOC202

PROF SCHIEMAN

MAR 20 2012

TG. SP.

Today’s topic: Regression Analysis

•Imagine you collected a great data set

•You have all these data, and a large sample

•Recent classes have been about ‘which test do you use?’, e.g. why would you use a t-test?

•It’s all about patterns

•The formulas are all different, but in the end we are always trying to say something about social patterns

Regression serves a variety of purposes

•Predicting future values of a variable, based on known values

•Describing the patterns in complicated data in a simple way

•Evaluating and refining theories about how changes in one variable cause changes in another

oDoes a one-unit increase in education really matter? Say average income increases by 10,000 with it. If you knew that, you would know education pays off.

Regression: The Basics (#1)

Q: “As the level of X increases, _____________?”

A: “As the level of X increases, what happens to levels of Y?”

•How much better does the regression line fit the data than the mean alone? E.g. how well did you do on the research paper? The prof summarized everyone with one mean (average).

Regression: The Basics (#2)

•People get more income because of their education, but people also go back to school to get more income. In that second case, income is the one affecting education first

•We regress the dependent variable on the independent variable <- keeping this in mind will orient you when you look at a regression table

Regression: The Basics

•Regression takes ANOVA and expands it out (e.g. more education = more income)

•How far is the regression line from the grand mean? Imagine a scatter where height had nothing to do with weight. How would you draw a best-fitting straight line? It would just be a flat line at the mean of y.
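A minimal sketch of the ‘no relationship’ case, using made-up height and weight numbers (not from the lecture) chosen so that x and y have exactly zero covariance. The helper name fit_line is just for illustration. The least-squares slope comes out to 0, and the fitted line sits at the mean of y.

```python
# Hypothetical data: the x and y deviations are built to cancel out exactly.
heights = [1, 2, 3, 4, 5]   # x (made-up units)
weights = [2, 4, 3, 4, 2]   # y (made-up units)

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(heights, weights)
mean_weight = sum(weights) / len(weights)
print(intercept, slope, mean_weight)  # flat line: slope 0, intercept = mean of y
```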

Assuming linearity, the regression line goes through the mean of y at every value of x.

•At each value of x, the y scores have a mean and a spread.

What is the null hypothesis? The formal way to say it: the regression coefficient (slope) equals 0 in the population

Informal: there is no relationship between X and Y in the population

Example:

If you get this example, you can plug it into anything

Imagine you are interested in this research question: how does eating peeps affect aggression?

•Let’s say you observed kids eating peeps and then observed them at recess.

•Aggression is the dependent variable. We are regressing aggression on

peeps.

•Null hypothesis: Formally the slope is 0 in the population

oImagine the 265 kids (number of observations) were randomly selected from a large pool of kids. Those 265 carry a burden: they represent all the kids that you are interested in

oIf you scooped up a sample and, by mistake, picked up mostly rough kids, those 265 don’t represent the population. Then everything else is crap (garbage in, garbage out: it doesn’t mean anything)

oIf the measure of peeps or the measure of aggression is bad, everything falls apart too

•What is the regression equation?

oPredicted aggression = 33.17 + 3.08 (peeps)

Other variables can be added, such as the number of hours of sleep the kid got last night, age, gender, etc. (the model then becomes multivariable)

•Interpret the slope

oThe slope is 3.08

oThe way to interpret it: with each one-unit increase in peeps, the score on the aggression index increases by 3.08

oHow the slope is computed: take one of the kids in front of you. He ate 4 peeps. He slapped 6 times. Take how far he is from the mean on peeps and how far he is from the mean on slaps, and multiply those together. Do that for all the kids, add it up, and divide by the overall spread in the x-variable.
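The slope recipe above can be sketched step by step. The five kids below are made up for illustration (one of them is the kid with 4 peeps and 6 slaps); they are not the lecture’s 265 observations, so the slope here will not be 3.08.

```python
# Hypothetical (peeps eaten, slaps at recess) for five kids.
peeps = [0, 1, 4, 5, 5]
slaps = [2, 3, 6, 6, 8]

n = len(peeps)
mean_peeps = sum(peeps) / n
mean_slaps = sum(slaps) / n

# For each kid: (distance from mean peeps) * (distance from mean slaps).
cross_products = [(x - mean_peeps) * (y - mean_slaps)
                  for x, y in zip(peeps, slaps)]

# Overall spread in the x-variable: sum of squared distances from mean peeps.
spread_in_x = sum((x - mean_peeps) ** 2 for x in peeps)

slope = sum(cross_products) / spread_in_x
intercept = mean_slaps - slope * mean_peeps
print(slope, intercept)
```

With these made-up numbers, each extra peep goes with one extra slap on average.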

•What would be the score on the aggression index when peeps = 0? 1? 5?

10?

oAggression = 33.17 + (3.08)0 = 33.17

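oAggression = 33.17 + (3.08)1 = 36.25 (as you move up one peep, predicted aggression moves up by 3.08). If the slope were 0, then as you move up in x, nothing would happen in y. There can be a negative association as well.

oAggression = 33.17 + (3.08)5 = 48.57

oAggression = 33.17 + (3.08)10 = 63.97: aggression is through the roof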
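The plug-in arithmetic can be written as a tiny function. The intercept and slope are the ones from the lecture example; the function name predicted_aggression is just an illustrative choice.

```python
INTERCEPT = 33.17  # predicted aggression when peeps = 0 (lecture example)
SLOPE = 3.08       # change in predicted aggression per extra peep

def predicted_aggression(peeps):
    """Plug a number of peeps into the regression equation."""
    return INTERCEPT + SLOPE * peeps

# The four values asked about in the notes.
predictions = {p: round(predicted_aggression(p), 2) for p in (0, 1, 5, 10)}
for p, score in predictions.items():
    print(p, score)
```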

•What is the t-statistic?

oThe t-statistic is the slope divided by its standard error

oStandard error is very straightforward: it is sample-to-sample-to-sample variability. Not exciting, but there’s no other way to say it. If the standard error is large, you could scoop another 265 kids and find that the slope is zero. It basically tells you how much error there is in the estimate, i.e. how accurate it is

•How do you obtain it?

•Slope divided by standard error: 3.08/.52 = 5.92

o5.92 is very unusual.

oE.g. Gwen Stefani is so different. 5.92 is like Gwen Stefani

Reflect on how things would look in terms of the coefficient and the standard error

A t-statistic of 1.5 would not allow you to reject the null hypothesis.

It would mean that as kids moved up in levels of peep consumption, not much happened in terms of aggression.

It gets much worse if there is a lot of residual variation, with kids all over the place.
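A quick sketch of how the t-statistic reacts to the standard error. The first call uses the lecture’s numbers (3.08 and .52). The second standard error (2.08) is a made-up value, chosen only to show how a noisier estimate of the same slope lands near 1.5.

```python
def t_statistic(slope, standard_error):
    """t = slope divided by its standard error."""
    return slope / standard_error

t_lecture = t_statistic(3.08, 0.52)  # numbers from the lecture example
t_noisy = t_statistic(3.08, 2.08)    # hypothetical: same slope, 4x the standard error

print(round(t_lecture, 2), round(t_noisy, 2))
```

Same slope, but with kids all over the place the t-statistic shrinks to roughly 1.5, which is not enough to reject the null.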

•Standard error of a slope

oKeep in mind that the t-statistic is the slope divided by its standard error, and you need a large t-statistic to reject the null hypothesis, so a small standard error helps

oThe whole idea is deviation from some kind of value

oThe denominator (the standard error) is like averaging out how the residuals look overall
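A sketch of where the standard error of a slope comes from, using the usual simple-regression formula: the ‘average’ squared residual (divided by n - 2) over the spread in x, then the square root. The five data points are made up for illustration and are not the lecture’s data.

```python
import math

# Hypothetical (peeps, slaps) data, made up for illustration only.
peeps = [0, 1, 4, 5, 5]
slaps = [2, 3, 6, 6, 8]

n = len(peeps)
mean_x = sum(peeps) / n
mean_y = sum(slaps) / n
sxx = sum((x - mean_x) ** 2 for x in peeps)  # spread in x
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(peeps, slaps)) / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(peeps, slaps)]

# "Average" squared residual; n - 2 because two numbers
# (intercept and slope) were estimated from the data.
mean_sq_residual = sum(r ** 2 for r in residuals) / (n - 2)

standard_error = math.sqrt(mean_sq_residual / sxx)
t_stat = slope / standard_error
print(round(standard_error, 3), round(t_stat, 2))
```

Bigger residuals push the standard error up and the t-statistic down; more spread in x does the opposite.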

•What is the p-value?

oComputer will give you the precise value

op < 0.001

•The t-statistic is 5.92 and the associated p-value is less than 0.001. Reject the null.

oAbout 1.96 is the size of t-statistic you need to obtain to reject the null hypothesis at the 0.05 level (two-tailed test, large sample).

oThe best way to think this through is to write yourself a brief essay: pick a few t-statistics and think about sample-to-sample variation

oIf the t-statistic were 1.97, we could reject the null hypothesis, but only barely. If we drew another whole sample of 265, we could possibly fail to reject the null hypothesis. Then we’d have to say ‘the association isn’t strong enough’
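A rough sketch of how a t-statistic maps to a p-value. As a simplifying assumption it uses the standard normal curve as a large-sample stand-in for the t distribution (reasonable with 265 kids, but not exactly what the computer reports). The function name two_sided_p is illustrative.

```python
import math

def two_sided_p(t):
    """Approximate two-sided p-value, using the standard normal
    as a large-sample stand-in for the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

# A few t-statistics to compare, as the notes suggest.
results = {}
for t in (1.5, 1.97, 5.92):
    p = two_sided_p(t)
    results[t] = p
    print(t, p, "reject null" if p < 0.05 else "fail to reject null")
```

Under this approximation 1.97 only just gets under 0.05, which is why a fresh sample could easily flip the decision, while 5.92 is nowhere near the line.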

•Confidence interval

oLower confidence limit (LCL) = 2.06, upper confidence limit (UCL) = 4.10

oThe confidence interval doesn’t contain 0, so we reject the null. We are 95% confident that the true population slope falls somewhere between 2.06 and 4.10
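The limits can be reproduced from the slope and standard error given earlier (3.08 and .52). The multiplier 1.96 is an assumption here: it is the usual large-sample value for a 95% interval, and the notes do not say which value was used.

```python
def confidence_interval(slope, standard_error, critical=1.96):
    """Slope plus or minus (critical value * standard error).
    critical=1.96 is the assumed large-sample 95% multiplier."""
    margin = critical * standard_error
    return slope - margin, slope + margin

lcl, ucl = confidence_interval(3.08, 0.52)
contains_zero = lcl <= 0 <= ucl
print(round(lcl, 2), round(ucl, 2), contains_zero)
```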
