York University, Sociology, SOCI 4600, Rebecca Jubis, Summer: Midterm
Final Exam 3H06 notes:
1. J. Lewis: Chapters 1 & 10
Chapter 1:
Why the Social Researcher Uses Statistics
The Nature of Social Research
- Social scientists attempt to explain and predict human behavior
- take "educated guesses" about the nature of social reality
- examine characteristics of human behavior called variables: characteristics that differ or
vary from one individual to another (ex. age, social class, attitude) or from one point in time to
another (ex. unemployment, crime rate, population)
- must also determine the unit of observation, usually collect data on individual persons but
sometimes focus their research on aggregates: the way in which measures vary across entire
collections of people
researcher might conduct interviews to determine if the elderly are victimized by
crime more often than younger respondents (unit of observation = individual
respondent)
researcher might study the relationship between the average age of the population
and the crime rate in various metropolitan areas (unit of observation = metropolitan
areas)
- most useful research methods employed by social researchers for testing their hypotheses
are: experiment, survey, content analysis, participant observation and secondary analysis
The Experiment
- distinguished by the degree of control a researcher is able to apply to the research situation
- researchers manipulate one or more of the independent variables to which their subjects are
exposed; this occurs when the experimenter assigns the independent variable to one group of
people (experimental group) and withholds it from another group of people (control group)
- have a direct hand in creating the effect that they seek to achieve
The Survey
- retrospective: the effects of independent variables are recorded after they have occurred;
researchers seek to reconstruct these influences and consequences by means of verbal reports from their
respondents in self-administered questionnaires, face-to-face interviews or phone interviews
- lack tight control: variables are not manipulated and subjects are not randomly assigned to groups,
therefore it is much more difficult to establish cause and effect
- can investigate a much larger number of important independent variables in relation to any
dependent variable
- results can be generalized to a broader range of people, more representative
Content Analysis
- research method whereby a researcher seeks objectively to describe the content of
previously produced messages
- no need directly to observe behavior or to question a sample of respondents, instead they
study the content of books, magazines, newspapers, films, radio broadcasts, photos, cartoons,
letters etc.
Participant Observation
- researcher participates in the daily life of the people under study, either openly in the role
of researcher or covertly in some disguised role, observing things that happen, listening to
what is said and questioning people over some length of time
Secondary Analysis
- researcher takes advantage of data sets previously collected or assembled by others, called
archival data, and is therefore not the primary analyzer
- comes from government, private agencies, colleges and universities
- relatively quick and easy, yet still exploits data that may have been gathered in a scientifically
sophisticated manner
- researcher is limited to what is available and has no say to how variables are defined or
measured
The Stages of Social Research
1. Problem to be studied is reduced to a testable hypothesis ex. One-parent families generate
more delinquency than two-parent families
2. An appropriate set of instruments is developed ex. Questionnaire or an interview schedule
3. The data are collected
4. The data are analyzed for their bearing on the initial hypothesis
raw data is tabulated, calculated, counted, summarized, rearranged, compared or
organized
5. Results of the analysis are interpreted and communicated to an audience ex. Through
lecture, journal article, press release
Using Series of Numbers to Do Social Research
- when a characteristic is measured, researchers are able to assign it a series of numbers
according to a set of rules
- developed measures of a wide range of phenomena
- numbers have 3 important functions depending on the particular level of measurement they
employ
1. Classify or categorize at the nominal level of measurement
2. Rank or order at the ordinal level of measurement
3. Assign a score at the interval level of measurement
- The Nominal Level
involves naming or labeling, places cases into categories and counting their
frequency of occurrence
ex. To indicate whether each respondent is prejudiced or tolerant toward Latinos: 5
can be regarded as (1) prejudiced, 5 can be regarded as (2) tolerant
Attitudes of 10 College Students Toward Latinos: Nominal Data
Attitude Toward Latinos    Frequency
1 = prejudiced             5
2 = tolerant               5
Total                      10
- other examples are sex (male versus female), welfare status (recipient versus
nonrecipient), political party (Republican, Democrat, Libertarian), social character, mode
of adaptation and time orientation (present, past, future)
- every case must be placed in one, and only one, category
- categories must be mutually exclusive (non-overlapping)
- categories must be exhaustive (must be a place for every case that arises)
- not graded, ranked or scaled for qualities
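The nominal-level counting described above can be sketched in a few lines of Python. The list of coded responses below is hypothetical, made up only so that it reproduces the 5/5 split in the table; the codes 1 and 2 act purely as labels.

```python
from collections import Counter

# Hypothetical coded responses for 10 students: 1 = prejudiced, 2 = tolerant.
# The codes are labels only; they carry no order or magnitude.
responses = [1, 2, 1, 1, 2, 2, 1, 2, 1, 2]
labels = {1: "prejudiced", 2: "tolerant"}

# Every case must fall in exactly one known category
# (categories are mutually exclusive and exhaustive).
assert all(code in labels for code in responses)

# Count the frequency of occurrence of each category.
frequencies = Counter(labels[code] for code in responses)
for category, count in frequencies.items():
    print(f"{category}: {count}")
print(f"Total: {sum(frequencies.values())}")
```

Counting frequencies is essentially all that nominal data support; averaging the codes 1 and 2 would not be meaningful.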
- The Ordinal Level
seeks to order cases in terms of the degree to which they have any given characteristic
nature of the relationship among ordinal categories depends on the characteristic the
researcher seeks to measure
yields information about the ordering of categories but does not indicate magnitude
of differences between numbers because the intervals between the points or ranks are
not known – impossible to assign scores
Attitudes of 5 College Students Towards Latinos: Ordinal Data
Student Rank
Joyce 1 – most prejudiced
Paul 2 – second
Cathy 3 – third
Mike 4 – fourth
Judy 5 – least prejudiced
- The Interval/Ratio Level
indicates the ordering of categories and the exact distance between them
employ constant units of measure ex. Dollars or cents, Fahrenheit or Celsius, yards or
feet, minutes or seconds, which yield equal intervals between points on the scale
Satisfaction Scores of Four Jobs
Job Satisfaction Score
Clergy 3.79
Teachers 3.61
Authors 3.61
Psychologists 3.59
makes assumption that our measure of job satisfaction uses a constant unit of
measurement (one satisfaction point)
ratio level is the same but in addition presumes the existence of an absolute or true
zero (interval may have an artificial zero value or none at all) ex. Age – 0 represents
birth or complete absence of age (ratio), Fahrenheit – artificial 0 because 0 degrees
does not represent total absence of heat
Different Ways to Measure the Same Variable
- ex. Measuring pain
Nominal – Are you in pain? Yes or No
Ordinal – How bad is the pain? None, Mild, Moderate, Severe
Interval – 0-10 numerical scale, visual analog scale (no pain – worst pain), pain faces
scale
Treating Some Ordinal Variables as Interval
- Job satisfaction example
- assume distance between “very dissatisfied” and “a little dissatisfied” is roughly the same as
the distance between “a little dissatisfied” and “moderately satisfied” etc.
*if unable to make assumption of equal intervals between the points on the scale then the
satisfaction measure should be treated as ordinal
Attitudes of 5 College Students Towards Latinos: Interval Data
Student Score* (higher scores indicate greater prejudice against Latinos)
Joyce 98
Paul 96
Cathy 95
Mike 94
Judy 22
- treating ordinal variables that have nearly evenly spaced values as if they were
interval allows researchers to use more powerful statistical procedures
Further Measurement Issues
- discrete data take on only certain specific values ex. Family size, either 1, 2, 3 or
more, represents a discrete interval-level measure
- continuous variables present an infinite range of possible values, but the manner in
which we measure them makes them appear discrete ex. Body weight can take fractional values
such as 141.12 pounds, but some scales measure to the nearest whole pound and some to the
nearest half pound
The Functions of Statistics
- when researchers quantify their data at different levels of measurement, they are
likely to employ statistics as a tool of description or decision making
- Description
describe and summarize mass of data generated from projects
ex. Grades of 80 students can be rearranged in consecutive order and
grouped into smaller number of categories (grouped frequency
distribution) – presents the grades within broader categories along with the
number (frequency (f)) of students whose grades fall into these categories
ex. Grades of 80 students can be rearranged graphically, grades placed along
one axis (horizontal base line) and frequencies along other axis (vertical) –
easily visualized graphic representation
ex. Grades of 80 students put into an arithmetic average (mean) to get an
overall group tendency or class performance
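A rough sketch of the descriptive steps above (grouped frequency distribution and mean), assuming a small made-up set of grades. The notes' example uses 80 students; only 10 hypothetical grades are used here, and the interval width of 10 points is also an assumption.

```python
from collections import Counter
from statistics import mean

# Hypothetical exam grades (10 values stand in for the 80 in the notes).
grades = [52, 67, 71, 74, 78, 81, 85, 88, 93, 96]

def interval_label(grade, width=10):
    """Return the class interval (e.g. '70-79') that a grade falls into."""
    lower = (grade // width) * width
    return f"{lower}-{lower + width - 1}"

# Grouped frequency distribution: interval -> frequency (f).
grouped = Counter(interval_label(g) for g in grades)
for interval in sorted(grouped):
    print(f"{interval}: f = {grouped[interval]}")

# Arithmetic average (mean) as a summary of overall class performance.
print(f"Mean grade: {mean(grades):.1f}")
```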
- Decision Making
make inferences, decisions based on data collected on only a small portion
or sample of the larger group we want to study
when a hypothesis is tested on a sample they must decide whether it is
accurate to generalize the findings to the entire population that they were
drawn from (can result in error)
Chapter 10:
Controlling For Third Variables
Partial Correlation
- does our interpretation of the relationship between two variables change in any way when
looking at the broader context of other related factors?
- to see this most easily focus on scatter plots: visually displays all the information contained
in a correlation coefficient – both its direction (by the trend underlying the points) and its
strength (by the closeness of the points to a straight line)
ex. a social psychologist sees a correlation between height of an individual and salary
– the taller a person is, the higher their salary. might be misleading, need to look
at other factors such as gender – males tend to be taller and tend to be paid more
therefore, gender could explain either all or part of the correlation between height and
salary: when we control for gender, the height-salary correlation disappears
- genuine relationship: if one observes an outcome where controlling for a variable does not
alter the X-Y relationship, one develops confidence in interpreting the association as causal
- conditional relationship: there is a strong relationship between X and Y for one group but
no relationship for the other – if the grouping variable is ignored, the correlation between X
and Y is misrepresentative
- spurious relationship: (misleading correlation) within both subgroups X and Y are
unrelated - group 1 tends to be higher on both variables so as a result ignoring the subgroup
distinction makes it appear as if X and Y are related (height and salary example is a spurious
correlation)
- changed relationship: an original positive association between X and Y becomes negative
within the two subgroups
- how do you handle an interval level control variable like age for example?
Simple method for adjusting a correlation between two variables for the influence of a
third variable when all 3 are interval level
The partial correlation coefficient is the correlation between 2 variables after
removing (or partialing out) the common effects of a third variable
Partial correlations can range from -1 to +1 and are interpreted the same way as a
simple correlation
The formula for the partial correlation of X and Y controlling for Z is:
$$r_{XY.Z} = \frac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{1 - r_{XZ}^2}\,\sqrt{1 - r_{YZ}^2}}$$
The variables before the period (XY) are those being correlated; the variable after
the period (Z) is the control variable
- partial correlations can be smaller, equal to or greater than the two-variable simple
correlation
- partial correlation coefficient is a very useful statistic for finding spurious relationships
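The partial correlation of X and Y controlling for Z can be computed directly from the three simple correlations. The sketch below assumes hypothetical values for those correlations; the function name `partial_r` is made up for illustration.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y, controlling for Z.

    Numerator: r_XY minus the product r_XZ * r_YZ.
    Denominator: sqrt(1 - r_XZ^2) times sqrt(1 - r_YZ^2).
    """
    return (r_xy - r_xz * r_yz) / (sqrt(1 - r_xz ** 2) * sqrt(1 - r_yz ** 2))

# Hypothetical simple correlations (not taken from the textbook):
r_xy, r_xz, r_yz = 0.60, 0.70, 0.80

print(f"simple r_XY    = {r_xy:.2f}")
print(f"partial r_XY.Z = {partial_r(r_xy, r_xz, r_yz):.2f}")
```

With these made-up numbers the simple correlation of 0.60 shrinks to roughly 0.09 once Z is partialed out, which is the pattern one would expect from a largely spurious relationship. If Z is unrelated to both X and Y, the partial correlation equals the simple one.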
Summary
- in correlation, the social researcher is interested in the degree of association between two
variables
- with the aid of the correlation coefficient known as Pearson's r, it is possible to obtain a
precise measure of both the strength (from 0.0 to 1.0) and direction (positive vs. negative) of
a relationship between two variables that have been measured at the interval level
- if a researcher has taken a random sample of scores, they may also compute a t ratio to
determine whether the obtained relationship between X and Y exists in the population and is
not due to sampling error
- the partial correlation coefficient allows the researcher to control a two-variable relationship
for the impact of a third variable
Terms to Remember
- variable
- correlation
strength
direction (positive vs. negative)
curvilinear vs. straight line
- correlation coefficient
- scatter plot
- Pearson's correlation coefficient
- partial correlation coefficient
2. Beatrice first 15 pages of chapter 11:
Socio 3H06 Chapter 11: Regression Analysis
Why study Regression Analysis?
Gives ability to quantify precisely the importance of any proposed factor or variable
The Regression Model:
Regression model is closely allied with correlation: both are interested in the strength of
association between 2 variables
o Also interested in specifying nature of this relationship
o Variable is either independent or dependent (i.e. variable influencing another)
Example: length of sentence (dependent variable) vs. # of prior
convictions (independent variable)
Regression analysis equation:
Y = a + bX + e
o Where:
Y = dependent variable
X = independent variable
a = Y-intercept
Expected level of Y when X = 0 which is base-line amount
b = slope or regression coefficient for X
Represents amount that Y is expected to change (increase or
decrease) for each increase of 1 unit in X
e = error term or disturbance term
Represents amount that’s not accounted for by a and bX
o Equation used to predict value of the dependent variable on basis of
independent variable
Regression and correlation have very similar calculations
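As a small illustration of using the regression equation for prediction, the sketch below plugs in a hypothetical intercept and slope for the sentence-length example. The values of `a` and `b`, and the observed case, are assumptions made up for this example, not estimates from the textbook.

```python
def predict(x, a, b):
    """Predicted value of Y for a given X under Y = a + bX (error term omitted)."""
    return a + b * x

# Hypothetical coefficients: baseline sentence of 14 months when X = 0,
# plus 3 additional months for each prior conviction.
a, b = 14.0, 3.0

priors = 4                       # X: number of prior convictions
observed_sentence = 25.0         # Y: hypothetical observed sentence (months)

predicted_sentence = predict(priors, a, b)
error = observed_sentence - predicted_sentence   # e: part not accounted for by a + bX

print(f"Predicted sentence: {predicted_sentence} months")
print(f"Error term e: {error}")
```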
Requirements for Regression:
Assumptions of regression same as Pearson’s r:
1. Assumes both variables are measured at interval level
2. Assumes straight-line relationship. If not, there are various transformations
used to make relationship into a straight line. Extreme cases observed in scatter
plot should be removed from analysis
3. Sample members must be chosen randomly to use tests of significance
4. To test significance of regression line, must assume normality for both variables
or else have a large sample
Interpreting the Regression Line:
Y-intercept: expected/predicted value of Y when X = 0
b: increase or decrease in Y expected with each unit increase in X
Regression line not always as direct and meaningful, especially the Y-intercept
o Still important but not as substantive as the slope
Regression equation used to project impact of independent variable (X) beyond its
range in the sample
Prediction Errors:
If correlation is perfect (i.e. r = +1 or -1), all points lie on regression line and all Y values
can be predicted perfectly on basis of X
Stronger correlation means stronger fit of points to the line
Difference between points (observed data) and regression line (predicted values) is
error or disturbance term (e):
e = Y – Ŷ
o ^ = predicted
Negative prediction errors occur when data points lie below regression line
o On the basis of the X value, the Y value is overpredicted
Predictive value of regression line can be assessed by magnitude of these error terms
o Larger the error, poorer the regression line as prediction device
Negative and positive errors cancel, meaning:
o $\sum e = 0$
o This is why we don’t add the error terms to get a measure of predictive ability
To prevent this, square the errors before the sum
o Error sum of squares or residual sum of squares, denoted by $SS_{error}$, is:
$$SS_{error} = \sum e^2 = \sum (Y - \hat{Y})^2$$
Total sum of squares: sum of squared prediction errors or deviations without using
X:
$$SS_{total} = \sum (Y - \bar{Y})^2$$
Predictive value of regression equation is in its ability to reduce prediction error, that is, the
extent to which $SS_{error}$ is smaller than $SS_{total}$
Regression sum of squares or explained sum of squares: difference between the 2 is
the sum of squares X can explain
o Regression sum of squares: $SS_{reg} = SS_{total} - SS_{error}$
o Proportionate reduction in error (PRE): ability of regression line to make
predictions, proportion of prediction error that can be reduced by knowing independent
variable
$$PRE = \frac{SS_{total} - SS_{error}}{SS_{total}}$$
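The error terms, the sums of squares, and PRE can be checked on a tiny data set. The five (X, Y) pairs below are hypothetical, and the coefficients a = 1.4 and b = 0.8 are the least-squares line for those particular points (supplied here rather than derived).

```python
# Hypothetical data; a and b are the least-squares intercept and slope for it.
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]
a, b = 1.4, 0.8

mean_y = sum(y) / len(y)
predicted = [a + b * xi for xi in x]

# Prediction errors e = Y - Y_hat; positive and negative errors cancel.
errors = [yi - yh for yi, yh in zip(y, predicted)]
print(f"Sum of errors: {sum(errors):.10f}")

# Squaring before summing prevents the cancellation.
ss_error = sum(e ** 2 for e in errors)           # sum of (Y - Y_hat)^2
ss_total = sum((yi - mean_y) ** 2 for yi in y)   # sum of (Y - Y_bar)^2
ss_reg = ss_total - ss_error                     # sum of squares X can explain

pre = (ss_total - ss_error) / ss_total
print(f"SS_total = {ss_total:.2f}, SS_error = {ss_error:.2f}, SS_reg = {ss_reg:.2f}")
print(f"PRE = {pre:.2f}")
```

For these made-up points, knowing X reduces the prediction error by 64 percent compared with predicting the mean of Y for everyone.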
Regression and Pearson’s Correlation:
If X and Y are uncorrelated, or if Pearson’s r = 0, $SS_{total}$ and $SS_{error}$ will be the same
o X will not help predict Y
o Larger value of r, smaller the $SS_{error}$ relative to $SS_{total}$
o Or, PRE is square of Pearson’s r:
$$r^2 = \frac{SS_{total} - SS_{error}}{SS_{total}}$$
Where:
$r^2$ = coefficient of determination, proportion of variance in Y determined by X
Range of values for $r^2$ is from 0-1
o Always positive because negative correlation becomes positive when squared
Coefficient of nondetermination, $1 - r^2$: proportion of variance in Y that’s not
explained by X:
$$1 - r^2 = \frac{SS_{error}}{SS_{total}}$$
Regression and Analysis of Variance:
Regression can decompose total sum of squares into regression sum of squares ($SS_{reg}$)
and error sum of squares ($SS_{error}$)
Don’t have to calculate predicted value for every respondent
o Using coefficients of determination and nondetermination as proportions of
explained and unexplained variation
o Can decompose total sum of squares ($SS_{total}$ or $SS_Y$) using this formula:
$$SS_{reg} = r^2\, SS_{total}$$
$$SS_{error} = (1 - r^2)\, SS_{total}$$
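The shortcut above, splitting the total sum of squares using r squared instead of computing a predicted value for every case, might look like this in Python. The values of the total sum of squares and r are hypothetical.

```python
# Hypothetical values: total sum of squares and Pearson's r for some sample.
ss_total = 10.0
r = 0.8

r_squared = r ** 2                       # coefficient of determination
ss_reg = r_squared * ss_total            # explained (regression) sum of squares
ss_error = (1 - r_squared) * ss_total    # unexplained (error) sum of squares

print(f"SS_reg = {ss_reg:.2f}, SS_error = {ss_error:.2f}")

# The explained and unexplained parts add back up to the total.
assert abs((ss_reg + ss_error) - ss_total) < 1e-9
```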
3. Yemina – chapter 11 other half-
Step by step Regression Analysis:
Step 1) Calculate the mean of X and the mean of Y
Step 2) Calculate SSx, SSy, and SP
Step 3) Determine the regression line
Step 4) Determine correlation and coefficients of determination and
nondetermination
Step 5) Calculate SStotal, SSreg, and SSerror
Step 6) Calculate regression mean square and error mean square
Step 7) Calculate F and compare with the critical value from the F table
Step 8) Construct an analysis of variance summary table
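The eight steps can be traced in a short Python sketch. The data are hypothetical, and the degrees of freedom used (1 for regression, N - 2 for error) are the standard ones for simple regression, stated here as an assumption since the notes do not list them. The critical value of F is not computed; it would still be looked up in a table.

```python
from math import sqrt

# Hypothetical data: X = independent variable, Y = dependent variable.
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 4]
n = len(x)

# Step 1: means of X and Y
mean_x, mean_y = sum(x) / n, sum(y) / n

# Step 2: SSx, SSy, and SP (sum of products of deviations)
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Step 3: regression line Y_hat = a + bX
b = sp / ss_x
a = mean_y - b * mean_x

# Step 4: correlation, coefficients of determination and nondetermination
r = sp / sqrt(ss_x * ss_y)
r_squared = r ** 2
non_determination = 1 - r_squared

# Step 5: SStotal, SSreg, SSerror
ss_total = ss_y
ss_reg = r_squared * ss_total
ss_error = non_determination * ss_total

# Step 6: mean squares (assumed df: 1 for regression, N - 2 for error)
df_reg, df_error = 1, n - 2
ms_reg, ms_error = ss_reg / df_reg, ss_error / df_error

# Step 7: F ratio (compare with the tabled critical value for df_reg, df_error)
f_ratio = ms_reg / ms_error

# Step 8: analysis of variance summary table
print(f"Regression line: Y_hat = {a:.2f} + {b:.2f}X, r = {r:.2f}")
print(f"{'Source':<12}{'SS':>8}{'df':>5}{'MS':>8}{'F':>8}")
print(f"{'Regression':<12}{ss_reg:>8.2f}{df_reg:>5}{ms_reg:>8.2f}{f_ratio:>8.2f}")
print(f"{'Error':<12}{ss_error:>8.2f}{df_error:>5}{ms_error:>8.2f}")
print(f"{'Total':<12}{ss_total:>8.2f}{n - 1:>5}")
```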
Simple regression: one dependent variable one independent variable
Like Pearson’s correlation, often called simple correlation
Multiple (linear) Regression
A generalization of simple regression when one uses two or more predictors.
There are obviously multiple relevant factors in analyzing data
Not just one
Even if a large % (eg 72) of the variance is explained
We may want to test with other variables, eg. to test prison sentence
lengths with other variables like age of the defendant, or whether they pled
guilty or not
Maybe, in doing so we can account for even more of the variance (remaining
28%) that prior convictions alone could not
Let’s consider 2 predictors, prior convictions (X) and age (Z), of
sentence (Y)
Look at page 393 for table
Look at page 394 for table with means
We can’t use simple regression here
This is because age and priors overlap in their abilities to explain sentence
Older defendants accumulate more priors in lifetime
Priors and age must overlap
