University of Toronto St. George
Sociology
SOC202H1
Scott Schieman
CH2 – ORGANIZING DATA TO MINIMIZE STATISTICAL ERROR
Accuracy of scientific ideas is tested by empirical predictions
Errors – known degrees of imprecision
  o Error reduction relies on understanding predictive relationships among variables
  o STATISTICAL ERROR – known degrees of imprecision in the procedures used to gather and process information

Controlling Sampling Error
Statistical analysis involves sampling
  o Sampling Error – inaccuracy in prediction about a population that results from the fact that we do not observe every subject in the population
Observation – a measurement of a single person
Summary Calculation – summing up a group of measurements, based on a set of observations
  o Research interest is usually in summary statements about the group
    - Usually a small # of subjects is used to draw conclusions about a larger pop.
A POPULATION – a large group of people (or objects) of particular interest that we desire to study and understand
A SAMPLE – a small group of the population; the sample is observed and measured and then used to draw conclusions about the population
A PARAMETER – a summary calculation of measurements made on all subjects in a population
  o To determine the true parameter, we would need to survey the entire population
A STATISTIC – a summary calculation of measurements made on a sample to estimate a parameter of the population
  o An estimate: a tool to draw conclusions about a population in general
    - Conclusions from a sample are not absolutely correct, only estimates
    - Degree of error/confidence in predictions can be determined w/ logic
  o Statistical Generalization – conclusions about a population made w/ proper statistical procedures
  o Statistical Estimate – report of a summary measurement that is:
    1. Based on systematic sampling & precise measurements
    2. Reported w/ known degrees of error & confidence
  o Guesstimate – a report of a summary measurement based on limited and usually subjective personal experiences, anecdotal evidence, or hasty casual observation
    - Stereotype – a false generalization, guided by feelings
Probability Theory – the analysis and understanding of chance occurrences
  o Allows computing degrees of confidence/accuracy for conclusions about a pop.
  o Allows comput

Unit of Measure – a set interval or distance between quantities
LEVEL OF MEASUREMENT OF A VARIABLE – identifies the variable's measurement properties, which determine the kinds of mathematical operations (addition, etc.) that can appropriately be used with it and the statistical formulas that can be used w/ it in testing theoretical hypotheses
  o NOMINAL VARIABLES – name categories; codes merely indicate a difference in category, class, quality, or kind
    - No meaningful rank in magnitude; numbers are arbitrarily chosen
    - No sense of degree w/ nominal variables
    - Dichotomous Variable – a variable with only two categories
  o ORDINAL VARIABLES – like nominal, but the categories can be ranked (high to low)
    - Can be named categories or numerical scores
    - Ex. Social class (upper to lower), education level (senior, junior)
    - Ex. Likert scoring of survey questions
  o INTERVAL VARIABLES – numerical scores w/ a defined unit of measure
    - Allow adding, subtracting, multiplying, and dividing scores, and computing averages
    - Express differences in amount, quantity, or degree numerically
    - Ex. Fahrenheit temperature: the interval between degrees is the same
    - Vs. ordinal: interval has a set unit of measure
    - Subtraction btwn ordinal scores = difference in rank, not distance btwn scores
  o RATIO VARIABLES – interval w/ a true zero point; zero means none
    - Ex. Weight, height, GPA, distance, population size
    - An interval variable may have a zero, but it is arbitrary
    - To tell ratio from interval, check if a ratio is meaningful
    - Ratio – the amount of one observation in relation to another
    - Ex. Weight: 40 g to 20 g is a 2:1 ratio
    - Ex. 1st, 2nd & 3rd: 3rd isn't 3 times worse than 1st (ranks have no meaningful ratio)
Dummy Coding – changing nominal/ordinal variables into interval/ratio ones w/ artificial numerical scores – ex. index coding
Unit of measure – for interval/ratio variables only
  o Fixes a set interval for the numerical values used as scores for an interval/ratio variable – ex. kg, cm, mL
  o Different from level of measurement

Coding and Counting Observations
Codebook – a concise description of the symbols that signify each score of each variable
  o Use number symbols for categories b/c they make counting & sorting easier
  o Coding responses may introduce measurement error; requires precision
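Parameter vs. statistic vs. sampling error, as a minimal Python sketch. It assumes a tiny invented population of ten scores. The numbers and the helper `sample_statistic` are hypothetical and do not come from the course. The mean of every subject is the parameter. The mean of a random sample is a statistic. Their difference is the sampling error. Observing every subject (a census) removes that error.

```python
import random
from statistics import mean

# Hypothetical population of 10 scores (invented for illustration only).
population = [2, 4, 4, 5, 6, 7, 8, 9, 10, 15]

# PARAMETER: summary calculation over ALL subjects in the population.
parameter = mean(population)


def sample_statistic(pop, n, seed):
    """Hypothetical helper: mean of a simple random sample of size n (no replacement)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return mean(rng.sample(pop, n))


# STATISTIC: the same calculation on a sample, used to estimate the parameter.
statistic = sample_statistic(population, n=4, seed=1)

# SAMPLING ERROR: arises because not every subject was observed.
sampling_error = statistic - parameter

# Observing every subject (a census) leaves no sampling error.
census = sample_statistic(population, n=len(population), seed=1)

print(f"parameter      = {parameter}")
print(f"statistic      = {statistic}")
print(f"sampling error = {sampling_error:+.2f}")
print(f"census mean    = {census}")
```

Changing the seed or `n` gives a different statistic, and usually a different sampling error. Only the census always matches the parameter.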
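Probability theory is what lets a statistical estimate be reported w/ known degrees of error & confidence. The sketch below is only a rough stand-in: a simulation, not the probability-theory calculation itself. It draws many random samples from a small invented population and checks how often the estimate lands within a chosen tolerance of the parameter. The population, sample size (`n`), number of trials, and tolerance are all assumptions.

```python
import random
from statistics import mean

# Hypothetical population (invented numbers).
population = [2, 4, 4, 5, 6, 7, 8, 9, 10, 15]
parameter = mean(population)

rng = random.Random(42)  # seeded for reproducibility
n, trials, tolerance = 4, 10_000, 2.0

# Draw many samples and record each sampling error (statistic - parameter).
errors = [mean(rng.sample(population, n)) - parameter for _ in range(trials)]

# Share of samples whose estimate lands within +/- tolerance of the parameter.
within = sum(abs(e) <= tolerance for e in errors) / trials

print(f"average error: {mean(errors):+.3f}")
print(f"share within +/-{tolerance}: {within:.1%}")
```

The average error should sit close to zero. The share within tolerance plays the role that a confidence level plays in a formal estimate.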
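The notes say to tell ratio from interval by checking whether a ratio is meaningful. One way to sketch that check treats a ratio as meaningful if it stays the same when the unit of measure changes. This framing is an illustration, not the course's stated procedure. Grams to kilograms only rescales and keeps the true zero, so 40 g : 20 g stays 2:1. Fahrenheit to Celsius also shifts the arbitrary zero, so "80 F is twice 40 F" does not survive the conversion. Gaps between temperatures do survive. The function names are hypothetical.

```python
import math


def grams_to_kg(g):
    """Ratio-scale conversion: multiply only, so zero still means 'no weight'."""
    return g / 1000


def fahrenheit_to_celsius(f):
    """Interval-scale conversion: multiply AND shift, because 0 F is an arbitrary zero."""
    return (f - 32) * 5 / 9


# Weight (ratio variable): 40 g vs 20 g
weight_ratio_g = 40 / 20
weight_ratio_kg = grams_to_kg(40) / grams_to_kg(20)

# Temperature (interval variable): 80 F vs 40 F
temp_ratio_f = 80 / 40
temp_ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Differences (intervals) remain comparable: the 80-60 gap equals the 60-40 gap.
gap_ratio_f = (80 - 60) / (60 - 40)
gap_ratio_c = (fahrenheit_to_celsius(80) - fahrenheit_to_celsius(60)) / (
    fahrenheit_to_celsius(60) - fahrenheit_to_celsius(40)
)

weight_ratio_preserved = math.isclose(weight_ratio_g, weight_ratio_kg)
temp_ratio_preserved = math.isclose(temp_ratio_f, temp_ratio_c)

print(f"weight: {weight_ratio_g:.2f} in g, {weight_ratio_kg:.2f} in kg")
print(f"temp:   {temp_ratio_f:.2f} in F, {temp_ratio_c:.2f} in C")
print(f"gaps:   {gap_ratio_f:.2f} in F, {gap_ratio_c:.2f} in C")
```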
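Subtracting ordinal scores gives a difference in rank, not a distance between scores. Here is a small sketch with an invented race; the names and times are hypothetical. The places 1st, 2nd, and 3rd are each one rank apart, yet the underlying time gaps are very unequal. 3rd place is also not "3 times worse" than 1st.

```python
# Hypothetical race: finishing times in seconds (a ratio variable).
finish_times = {"Ana": 10.0, "Ben": 10.1, "Cai": 12.0}

# Derive the ordinal variable "place" (1 = fastest) by ranking the times.
ranked = sorted(finish_times, key=finish_times.get)
place = {name: i + 1 for i, name in enumerate(ranked)}

# Subtracting places gives a difference in RANK only...
rank_gap_1_to_2 = place["Ben"] - place["Ana"]
rank_gap_2_to_3 = place["Cai"] - place["Ben"]

# ...while the underlying distances between scores are very unequal.
time_gap_1_to_2 = finish_times["Ben"] - finish_times["Ana"]
time_gap_2_to_3 = finish_times["Cai"] - finish_times["Ben"]

# "3rd is 3 times worse than 1st" reads a ratio into ranks; the times say otherwise.
place_ratio = place["Cai"] / place["Ana"]
time_ratio = finish_times["Cai"] / finish_times["Ana"]

print(place)
print(f"rank gaps: {rank_gap_1_to_2}, {rank_gap_2_to_3}")
print(f"time gaps: {time_gap_1_to_2:.1f} s, {time_gap_2_to_3:.1f} s")
print(f"place ratio {place_ratio:.1f} vs time ratio {time_ratio:.1f}")
```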
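Dummy coding, as described in the notes, gives a nominal variable artificial numerical scores. A common form scores a dichotomous variable 1 for one category and 0 for the other; assuming this is the form meant here, the sketch below applies it. The helper `dummy_code` and the responses are hypothetical. Once the categories are scored 0/1, ordinary arithmetic applies. The mean of the scores is then the proportion of cases in the category scored 1.

```python
# Hypothetical responses for a dichotomous (two-category) nominal variable.
employment = ["employed", "not employed", "employed", "employed", "not employed"]


def dummy_code(values, category):
    """Hypothetical helper: score 1 if the value is `category`, otherwise 0."""
    return [1 if v == category else 0 for v in values]


employed = dummy_code(employment, "employed")

# With 0/1 scores, averaging works: the mean is the proportion "employed".
proportion_employed = sum(employed) / len(employed)

print(employed)
print(proportion_employed)
```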
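Codebook plus coding and counting observations, as a sketch. It assumes a single hypothetical variable `marital_status` with invented codes and data. Storing the codebook as a dictionary lets each raw code be checked against the defined symbols before counting. This is one simple guard against the measurement error that miscoded responses introduce. The helper `split_valid` is also hypothetical.

```python
from collections import Counter

# Hypothetical codebook: each score (number symbol) of each variable and its meaning.
CODEBOOK = {
    "marital_status": {1: "never married", 2: "married", 3: "divorced", 4: "widowed"},
}

# Invented raw data; 7 is a keying error because it is not a defined code.
raw_codes = [2, 1, 2, 3, 2, 4, 1, 7]


def split_valid(variable, codes):
    """Separate codes defined in the codebook from codes that are not."""
    allowed = CODEBOOK[variable]
    valid = [c for c in codes if c in allowed]
    invalid = [c for c in codes if c not in allowed]
    return valid, invalid


valid, invalid = split_valid("marital_status", raw_codes)
counts = Counter(valid)  # numeric codes make counting and sorting easy

for code in sorted(counts):
    label = CODEBOOK["marital_status"][code]
    print(f"{code} {label:<14} {counts[code]}")
print("codes to recheck:", invalid)
```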

