Final Exam 3H06 notes.docx

Department: Sociology
Course: SOCI 4600
Professor: Rebecca Jubis
Semester: Summer

Description
Final Exam 3H06 notes:

1. J. Lewis - Chapters 1 & 10

Chapter 1: Why the Social Researcher Uses Statistics

The Nature of Social Research
- social scientists attempt to explain and predict human behavior
- take "educated guesses" about the nature of social reality
- examine characteristics of human behavior called variables: characteristics that differ or vary from one individual to another (ex. age, social class, attitude) or from one point in time to another (ex. unemployment, crime rate, population)
- must also determine the unit of observation: researchers usually collect data on individual persons but sometimes focus on aggregates (the way in which measures vary across entire collections of people)
   • a researcher might conduct interviews to determine whether the elderly are victimized by crime more often than younger respondents (unit of observation = individual respondent)
   • a researcher might study the relationship between the average age of the population and the crime rate in various metropolitan areas (unit of observation = metropolitan area)
- the most useful research methods for testing hypotheses are: experiment, survey, content analysis, participant observation and secondary analysis

The Experiment
- distinguished by the degree of control a researcher is able to apply to the research situation
- researchers manipulate one or more of the independent variables to which their subjects are exposed
   • occurs when the experimenter assigns the independent variable to one group of people (experimental group) and withholds it from another group (control group)
- researchers have a direct hand in creating the effect that they seek to achieve

The Survey
- retrospective: the effects of independent variables are recorded after they have occurred
   • researchers seek to reconstruct these influences and consequences by means of verbal reports from respondents in self-administered questionnaires, face-to-face interviews or phone interviews
- lacks tight control: variables are not manipulated and subjects are not randomly assigned to groups, so it is much more difficult to establish cause and effect
- can investigate a much larger number of important independent variables in relation to any dependent variable
- results can be generalized to a broader range of people because samples are more representative

Content Analysis
- research method whereby a researcher seeks to describe objectively the content of previously produced messages
- no need to observe behavior directly or to question a sample of respondents; instead researchers study the content of books, magazines, newspapers, films, radio broadcasts, photos, cartoons, letters etc.

Participant Observation
- researcher participates in the daily life of the people under study, either openly in the role of researcher or covertly in some disguised role, observing things that happen, listening to what is said and questioning people over some length of time

Secondary Analysis
- researcher takes advantage of data sets previously collected or assembled by others (archival data), and is therefore not the primary analyzer
- data come from government, private agencies, colleges and universities
- relatively quick and easy, yet exploits data that may have been gathered in a scientifically sophisticated manner
- researcher is limited to what is available and has no say in how variables are defined or measured

The Stages of Social Research
1. The problem to be studied is reduced to a testable hypothesis, ex. one-parent families generate more delinquency than two-parent families
2. An appropriate set of instruments is developed, ex. a questionnaire or an interview schedule
3. The data are collected
4. The data are analyzed for their bearing on the initial hypothesis
   • raw data are tabulated, calculated, counted, summarized, rearranged, compared or organized
5. Results of the analysis are interpreted and communicated to an audience, ex.
through a lecture, journal article or press release

Using Series of Numbers to Do Social Research
- when a characteristic is measured, researchers are able to assign it a series of numbers according to a set of rules
- researchers have developed measures of a wide range of phenomena
- numbers have 3 important functions depending on the level of measurement employed
   1. Classify or categorize at the nominal level of measurement
   2. Rank or order at the ordinal level of measurement
   3. Assign a score at the interval level of measurement
- The Nominal Level
   • involves naming or labeling: places cases into categories and counts their frequency of occurrence
   • ex. to indicate whether each respondent is prejudiced or tolerant toward Latinos: 5 can be regarded as (1) prejudiced and 5 as (2) tolerant

   Attitudes of 10 College Students Toward Latinos: Nominal Data
   Attitude Toward Latinos    Frequency
   1 = prejudiced             5
   2 = tolerant               5
   Total                      10

- other examples are sex (male versus female), welfare status (recipient versus non-recipient), political party (Republican, Democrat, Libertarian), social character, mode of adaptation and time orientation (present, past, future)
- every case must be placed in one, and only one, category
- categories must be mutually exclusive (non-overlapping)
- categories must be exhaustive (there must be a place for every case that arises)
- categories are not graded, ranked or scaled for qualities
- The Ordinal Level
   • seeks to order cases in terms of the degree to which they have any given characteristic
   • nature of the relationship among ordinal categories depends on the characteristic the researcher seeks to measure
   • yields information about the ordering of categories but does not indicate the magnitude of differences between numbers, because the intervals between the points or ranks are not known (impossible to assign scores)

   Attitudes of 5 College Students Toward Latinos: Ordinal Data
   Student    Rank
   Joyce      1 - most prejudiced
   Paul       2 - second
   Cathy      3 - third
   Mike       4 - fourth
   Judy       5 - least prejudiced

- The Interval/Ratio Level
   • indicates the ordering of categories and the exact distance between them
   • employs constant units of measure (ex. dollars or cents, Fahrenheit or Celsius, yards or feet, minutes or seconds), which yield equal intervals between points on the scale

   Satisfaction Scores of Four Jobs
   Job              Satisfaction Score
   Clergy           3.79
   Teachers         3.61
   Authors          3.61
   Psychologists    3.59

   • makes the assumption that our measure of job satisfaction uses a constant unit of measurement (one satisfaction point)
   • ratio level is the same but in addition presumes the existence of an absolute or true zero (an interval scale may have an artificial zero value or none at all), ex. age: 0 represents birth, the complete absence of age (ratio); Fahrenheit: artificial 0, because 0 degrees does not represent the total absence of heat (interval)

Different Ways to Measure the Same Variable
- ex. measuring pain
   • Nominal: Are you in pain? Yes or No
   • Ordinal: How bad is the pain? None, Mild, Moderate, Severe
   • Interval: 0-10 numerical scale, visual analog scale (no pain to worst pain), pain faces scale

Treating Some Ordinal Variables as Interval
- job satisfaction example
- assume the distance between "very dissatisfied" and "a little dissatisfied" is roughly the same as the distance between "a little dissatisfied" and "moderately satisfied" etc.
   * if unable to make the assumption of equal intervals between the points on the scale, then the satisfaction measure should be treated as ordinal

   Attitudes of 5 College Students Toward Latinos: Interval Data
   Student    Score*
   Joyce      98
   Paul       96
   Cathy      95
   Mike       94
   Judy       22
   * higher scores indicate greater prejudice against Latinos

- treating ordinal variables that have nearly evenly spaced values as if they were interval allows researchers to use more powerful statistical procedures

Further Measurement Issues
- discrete data take on only certain specific values, ex.
family size (either 1, 2, 3 or more) represents a discrete interval-level measure
- continuous variables present an infinite range of possible values, but the manner in which we measure them makes them appear discrete, ex. body weight can take fractional values such as 141.12 pounds, yet some scales measure to the nearest whole pound and some to the nearest half pound

The Functions of Statistics
- when researchers quantify their data at different levels of measurement, they are likely to employ statistics as a tool of description or decision making
- Description
   • describe and summarize the mass of data generated from projects
   • ex. grades of 80 students can be rearranged in consecutive order and grouped into a smaller number of categories (grouped frequency distribution), which presents the grades within broader categories along with the number, or frequency (f), of students whose grades fall into each category
   • ex. grades of 80 students can be presented graphically, with grades placed along one axis (horizontal base line) and frequencies along the other axis (vertical), giving an easily visualized graphic representation
   • ex. grades of 80 students can be summarized by an arithmetic average (mean) to get an overall group tendency or class performance
- Decision Making
   • make inferences and decisions based on data collected on only a small portion, or sample, of the larger group we want to study
   • when a hypothesis is tested on a sample, researchers must decide whether it is accurate to generalize the findings to the entire population from which the sample was drawn (this can result in error)

Chapter 10: Controlling for Third Variables

Partial Correlation
- does our interpretation of the relationship between two variables change in any way when looking at the broader context of other related factors?
- to see this most easily, focus on scatter plots: a scatter plot visually displays all the information contained in a correlation coefficient, both its direction (by the trend underlying the points) and its strength (by the closeness of the points to a straight line)
   • ex. a social psychologist sees a correlation between the height of an individual and salary: the taller a person is, the higher their salary
   • this might be misleading; we need to look at other factors such as gender, since males tend to be taller and tend to be paid more
   • therefore, gender could explain either all or part of the correlation between height and salary: when we control for gender, the height-salary correlation disappears
- genuine relationship: if controlling for a variable does not alter the X-Y relationship, one develops confidence in interpreting the association as causal
- conditional relationship: there is a strong relationship between X and Y for one group but no relationship for the other; if the grouping variable is ignored, the correlation between X and Y is misrepresentative
- spurious relationship (misleading correlation): within both subgroups X and Y are unrelated, but group 1 tends to be higher on both variables, so ignoring the subgroup distinction makes it appear as if X and Y are related (the height and salary example is a spurious correlation)
- changed relationship: an original positive association between X and Y becomes negative within the two subgroups
- how do you handle an interval-level control variable, like age for example?
   • the partial correlation is a simple method for adjusting a correlation between two variables for the influence of a third variable when all 3 are interval level
   • the partial correlation coefficient is the correlation between 2 variables after removing (or partialing out) the common effects of a third variable
   • partial correlations can range from -1 to +1 and are interpreted the same way as a simple correlation
   • the formula for the partial correlation of X and Y controlling for Z is:

     $r_{XY.Z} = \dfrac{r_{XY} - r_{XZ}\, r_{YZ}}{\sqrt{1 - r_{XZ}^{2}}\ \sqrt{1 - r_{YZ}^{2}}}$

   • the variables before the period (X and Y) are those being correlated; the variable after the period (Z) is the control variable
- partial correlations can be smaller than, equal to or greater than the two-variable simple correlation
- the partial correlation coefficient is a very useful statistic for finding spurious relationships

Summary
- in correlation, the social researcher is interested in the degree of association between two variables
- with the aid of the correlation coefficient known as Pearson's r, it is possible to obtain a precise measure of both the strength (from 0.0 to 1.0) and direction (positive vs. negative) of a relationship between two variables that have been measured at the interval level
- if a researcher has taken a random sample of scores, they may also compute a t ratio to determine whether the obtained relationship between X and Y exists in the population and is not due to sampling error
- the partial correlation coefficient allows the researcher to control a two-variable relationship for the impact of a third variable

Terms to Remember
- variable
- correlation
   • strength
   • direction (positive vs. negative)
   • curvilinear vs. straight line
- correlation coefficient
- scatter plot
- Pearson's correlation coefficient
- partial correlation coefficient

2. Beatrice - first 15 pages of Chapter 11

Socio 3H06
Chapter 11: Regression Analysis

Why Study Regression Analysis?
• Gives the ability to quantify precisely the importance of any proposed factor or variable

The Regression Model:
• The regression model is closely allied with correlation: both are interested in the strength of association between 2 variables
   o Regression is also interested in specifying the nature of this relationship
   o Each variable is either independent or dependent (i.e. one variable influences the other)
      - Example: length of sentence (dependent variable) vs. number of prior convictions (independent variable)
• Regression analysis equation: $Y = a + bX + e$
   o Where:
      - Y = dependent variable
      - X = independent variable
      - a = Y-intercept: the expected level of Y when X = 0, which is the base-line amount
      - b = slope or regression coefficient for X: the amount that Y is expected to change (increase or decrease) for each increase of 1 unit in X
      - e = error term or disturbance term: the amount of Y that is not accounted for by a and bX
   o The equation is used to predict the value of the dependent variable on the basis of the independent variable
• Regression and correlation have very similar calculations

Requirements for Regression:
• The assumptions of regression are the same as for Pearson's r:
   1. Both variables are assumed to be measured at the interval level
   2. A straight-line relationship is assumed. If the relationship is not linear, various transformations can be used to make it into a straight line. Extreme cases observed in the scatter plot should be removed from the analysis
   3. Sample members must be chosen randomly in order to use tests of significance
   4. To test the significance of the regression line, assume normality for both variables, or else have a large sample

Interpreting the Regression Line:
• Y-intercept: the expected/predicted value of Y when X = 0
• b: the increase or decrease in Y expected with each unit increase in X
• The regression line is not always as direct and meaningful, especially the Y-intercept
   o It is still important but not as substantive as the slope
• The regression equation can be used to project the impact of the independent variable (X) beyond its range in the sample

Prediction Errors:
• If the correlation is perfect (i.e. r = +1 or -1), all points lie on the regression line and all Y values can be predicted perfectly on the basis of X
• A stronger correlation means a closer fit of the points to the line
• The difference between the points (observed data) and the regression line (predicted values) is the error or disturbance term (e): $e = Y - \hat{Y}$
   o the hat (^) denotes a predicted value
• Negative prediction errors occur when data points lie below the regression line
   o On the basis of the X value, the Y value is overpredicted
• The predictive value of the regression line can be assessed by the magnitude of these error terms
   o The larger the error, the poorer the regression line as a prediction device
• Negative and positive errors cancel, meaning:
   o $\sum e = 0$
   o This is why we don't simply add the error terms to get a measure of predictive ability
• To prevent this cancellation, square the errors before summing
   o The error sum of squares (or residual sum of squares), denoted $SS_{error}$, is: $SS_{error} = \sum e^{2} = \sum (Y - \hat{Y})^{2}$
• Total sum of squares: the sum of squared prediction errors, or deviations, without using X: $SS_{total} = \sum (Y - \bar{Y})^{2}$
• The predictive value of the regression equation lies in its ability to reduce prediction error, i.e. the extent to which $SS_{error}$ is smaller than $SS_{total}$
• Regression sum of squares (or explained sum of squares): the difference between the 2 is the sum of squares that X can explain
   o Regression sum of squares: $SS_{reg} = SS_{total} - SS_{error}$
   o Proportionate reduction in error (PRE): the ability of the regression line to make predictions, i.e. the proportion of prediction error that can be reduced by knowing the independent variable:

     $PRE = \dfrac{SS_{total} - SS_{error}}{SS_{total}}$

Regression and Pearson's Correlation:
• If X and Y are uncorrelated, i.e. if Pearson's r = 0, $SS_{total}$ and $SS_{error}$ will be the same
   o X will not help predict Y
   o The larger the value of r, the smaller $SS_{error}$ is relative to $SS_{total}$
   o PRE is the square of Pearson's r:

     $r^{2} = \dfrac{SS_{total} - SS_{error}}{SS_{total}}$

      - Where $r^{2}$ = coefficient of determination, the proportion of variance in Y determined by X
• The range of values for $r^{2}$ is from 0 to 1
   o Always positive, because a negative correlation becomes positive when squared
• Coefficient of nondetermination, $1 - r^{2}$: the proportion of variance in Y that is not explained by X:

     $1 - r^{2} = \dfrac{SS_{error}}{SS_{total}}$

Regression and Analysis of Variance:
• Regression can decompose the total sum of squares into the regression sum of squares ($SS_{reg}$) and the error sum of squares ($SS_{error}$)
• We don't have to calculate the predicted value for every respondent
   o Use the coefficients of determination and nondetermination as proportions of explained and unexplained variation
   o The total sum of squares ($SS_{total}$, or $SS_{Y}$) can be decomposed using these formulas: $SS_{reg} = r^{2}\, SS_{total}$ and $SS_{error} = (1 - r^{2})\, SS_{total}$

3. Yemina - Chapter 11, other half

Step by Step Regression Analysis:
Step 1) Calculate the mean of X and the mean of Y
Step 2) Calculate SSx, SSy, and SP
Step 3) Determine the regression line
Step 4) Determine the correlation and the coefficients of determination and nondetermination
Step 5) Calculate SStotal, SSreg, and SSerror
Step 6) Calculate the regression mean square and the error mean square
Step 7) Calculate F and compare with the critical value from the F table
Step 8) Construct an analysis of variance summary table
• Simple regression: one dependent variable, one independent variable
• Like Pearson's correlation, which is often called simple correlation

Multiple (Linear) Regression
A generalization of simple regression when one uses two or more predictors.
• There are obviously multiple relevant factors in analyzing data, not just one
• Even if a large % (e.g. 72) of the variance is explained, we may want to test other variables, e.g. relate prison sentence length to other variables such as the age of the defendant or whether they pled guilty
• In doing so we may account for even more of the variance (the remaining 28%) that a single predictor could not explain
• Consider 2 predictors, prior convictions (X) and age (Z), with sentence length (Y) as the dependent variable
   o Look at page 393 for the table
   o Look at page 394 for the table with means
• We can't just run a separate simple regression for each predictor here
   o This is because age and priors overlap in their ability to explain sentence length
   o Older defendants have had more time to accumulate priors over their lifetime
   o So priors and age must overlap
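
Worked sketch (not part of the original notes): the step-by-step regression procedure from Chapter 11 and the partial correlation formula from Chapter 10 can be tried out in a few lines of Python. Everything below is an illustration under stated assumptions. The function names are made up for this sketch, and the priors/age/sentence numbers are hypothetical; they are not the values from the textbook table on page 393. The F ratio uses 1 degree of freedom for the regression and N - 2 for the error, as in a simple regression with one predictor.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def sum_of_products(xs, ys):
    """SP: sum of cross-products of deviations from the two means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def pearson_r(xs, ys):
    return sum_of_products(xs, ys) / math.sqrt(
        sum_of_squares(xs) * sum_of_squares(ys))

def simple_regression(xs, ys):
    """Steps 1-7 of the notes for Y = a + bX + e (one predictor)."""
    n = len(xs)
    ss_x = sum_of_squares(xs)               # step 2
    ss_y = sum_of_squares(ys)
    sp = sum_of_products(xs, ys)
    b = sp / ss_x                            # step 3: slope
    a = mean(ys) - b * mean(xs)              # step 3: Y-intercept
    r2 = sp ** 2 / (ss_x * ss_y)             # step 4: coefficient of determination
    ss_total = ss_y                          # step 5
    ss_reg = r2 * ss_total
    ss_error = (1 - r2) * ss_total
    ms_reg = ss_reg / 1                      # step 6: df = 1 predictor
    ms_error = ss_error / (n - 2)            # df = N - 2
    f_ratio = ms_reg / ms_error              # step 7 (undefined if the fit is perfect)
    return {"a": a, "b": b, "r2": r2, "ss_total": ss_total,
            "ss_reg": ss_reg, "ss_error": ss_error, "F": f_ratio}

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y after partialing out Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: 8 defendants (NOT the textbook table).
priors = [0, 1, 2, 3, 4, 6, 8, 10]           # X: prior convictions
age = [19, 22, 24, 30, 28, 35, 41, 45]       # Z: age in years
sentence = [6, 10, 12, 20, 18, 30, 36, 48]   # Y: sentence in months

fit = simple_regression(priors, sentence)
print("a =", round(fit["a"], 2), " b =", round(fit["b"], 2))
print("r^2 =", round(fit["r2"], 3), " F =", round(fit["F"], 2))

# How much of the priors-sentence correlation survives controlling for age?
r_xy = pearson_r(priors, sentence)
r_xz = pearson_r(priors, age)
r_yz = pearson_r(sentence, age)
print("r_XY =", round(r_xy, 3), " r_XY.Z =", round(partial_r(r_xy, r_xz, r_yz), 3))
```

Computing SSreg and SSerror from r squared, rather than from each case's predicted value, follows the shortcut described under Regression and Analysis of Variance. Comparing r_XY with r_XY.Z gives a rough feel for how much priors and age overlap in explaining sentence length.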