# SOC202H1 Chapter Notes - Guesstimate, Systematic Sampling, Rank

CH2 – ORGANIZING DATA TO MINIMIZE STATISTICAL ERROR

Accuracy of scientific ideas tested by empirical predictions

Errors – known degrees of imprecision

o Error reduction relies on understanding predictive relationships among variables

o STATISTICAL ERROR – Known degrees of imprecision in the procedures used to gather and process information

Controlling Sampling Error

Statistical analysis involves sampling

o Sampling Error – inaccuracy in prediction about a population that results from the fact that we do not observe every subject in the population

Observation – a measurement of a single person

Summary Calculation – summing up a group of measurements, based on a set of observations

o Research interest usually lies w/ summary statements about the group

Usually a small # of subjects is used to draw conclusions about a larger pop.

A POPULATION – a large group of people (or objects) of particular interest that we desire to study and understand

A SAMPLE – a small group of the population; the sample is observed and measured and then used to draw conclusions on the population

A PARAMETER – a summary calculation of measurements made on all subjects in a population

o Determining the true parameter requires surveying the entire population

A STATISTIC – a summary calculation of measurements made on a sample to estimate a parameter of the population

o An estimate, a tool to draw conclusions about a population in general

Conclusions from a sample are not absolutely correct, only estimates
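The parameter/statistic distinction above can be made concrete with a small simulation. This is a minimal sketch only: the population of ages is made-up data, and the names `population`, `parameter`, `statistic`, and `sampling_error` are illustrative assumptions, not terms from the chapter.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 people (made-up data)
population = [random.randint(18, 90) for _ in range(10_000)]

# PARAMETER: summary calculation made on every subject in the population
parameter = statistics.mean(population)

# STATISTIC: the same calculation made on a sample of 200 subjects
sample = random.sample(population, 200)
statistic = statistics.mean(sample)

# The gap between the two is the sampling error for this particular sample
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f}  statistic={statistic:.2f}  error={sampling_error:+.2f}")
```

In real research the parameter is usually unknown, which is why the statistic is reported as an estimate rather than as the true value.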

Degree of error/confidence in predictions determinable w/ logic

o Statistical Generalization – conclusions about a population made w/ proper statistical procedures

o Statistical Estimate – report of a summary measurement based on:

1. Systematic sampling & precise measurements

2. Reported w/ known degrees of error & confidence

o Guesstimate – a report of a summary measurement based on limited and usually subjective personal experiences, anecdotal evidence, or hasty casual observation

Stereotype – a false generalization, guided by feelings

Probability Theory – the analysis and understanding of chance occurrences

o Allows computing degrees of confidence/accuracy for conclusions about a pop.

o Allows computing error: how often a statistic will incorrectly predict the parameter

Sample Size – the number of cases or observations in a sample

o The number of persons or objects observed

o Larger sample = smaller sampling error, measured as a +/- value
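The link between sample size and sampling error can be checked by repeated sampling. The sketch below assumes a made-up population of scores and a hypothetical helper, `typical_error`, that averages the absolute gap between sample mean and parameter over many samples. It is one way to illustrate the idea, not the chapter's own procedure.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 20,000 scores (made-up data)
population = [random.gauss(50, 10) for _ in range(20_000)]
parameter = statistics.mean(population)

def typical_error(sample_size, repeats=300):
    """Average absolute gap between the sample mean and the parameter."""
    gaps = []
    for _ in range(repeats):
        sample = random.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - parameter))
    return statistics.mean(gaps)

errors = {n: typical_error(n) for n in (25, 100, 400)}
for n, err in errors.items():
    print(f"n={n:>3}: typical error is about +/- {err:.2f}")
```

The printed +/- values should shrink as the sample size grows, which is the pattern the note describes.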

REPRESENTATIVE SAMPLE – a sample in which all segments of the population are included in the sample in their correct proportions in the population

o More representative sample = smaller sampling error

o Non-Representative Sample – some segments of the population are overrepresented or underrepresented in the sample

o SIMPLE RANDOM SAMPLE – every person (or object) in population has the same chance of being selected for the sample
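A simple random sample and a check of its representativeness might be sketched as follows. The population segments and their proportions are invented for illustration, and `proportions` is a hypothetical helper.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 60% urban, 30% suburban, 10% rural (made-up figures)
population = ["urban"] * 6000 + ["suburban"] * 3000 + ["rural"] * 1000

def proportions(group):
    """Share of each segment within a group of subjects."""
    counts = Counter(group)
    return {segment: counts[segment] / len(group) for segment in sorted(counts)}

# random.sample draws without replacement; every subject is equally likely to be picked
sample = random.sample(population, 500)

pop_props = proportions(population)
sample_props = proportions(sample)
for segment in pop_props:
    print(f"{segment:<9} population={pop_props[segment]:.3f}  "
          f"sample={sample_props.get(segment, 0.0):.3f}")
```

The closer the sample proportions sit to the population proportions, the more representative the sample is.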

Controlling Measurement Error

Observation & measurement key in research

o Measurement Error – inaccuracy in research, which derives from imprecise measurement instruments, difficulties in the classification of observations and the need to round numbers.

Operational Definition – set of procedures/operations for measuring a variable

o Formulation guided by identifying common measurement errors & doing everything possible to minimize them

Levels of Measurement: Careful selection of Statistical Procedures

Measurement – the assignment of symbols, either names or numbers, to the differences observed in a variable

o Score – The measurement of a particular sample subject on a single variable – ex. Subject A’s Age, GPA, Gender

o Unit of Measure – a set interval or distance between quantities

LEVEL OF MEASURE OF A VARIABLE – Identifies the variable’s measurement properties, which determine the kind of mathematical operations (addition, etc.) that can be appropriately used with it and the statistical formulas that can be used w/ it in testing theoretical hypotheses

o NOMINAL VARIABLES – named categories; codes merely indicate a difference in category, class, quality, or kind

No meaningful rank in magnitude, numbers arbitrarily chosen

No sense of degree w/ nominal variables

Dichotomous Variable – variable with only two categories

o ORDINAL VARIABLES – Nominal properties plus ranking (high to low)

Can be named categories or numerical scores

Ex. Social class (upper-lower), education level (senior, junior)

Ex. Likert scoring of survey questions

o INTERVAL VARIABLES – Numerical scores w/ defined unit of measure

Allow add, subtract, multiply, divide scores, compute averages

Differences in amount, quantity, degree numerically

Ex. Fahrenheit temperature: the interval between degrees is the same

Vs. ordinal: interval variables have a set unit of measure

Subtraction btwn ordinal scores = difference in rank, not distance btwn scores

o RATIO VARIABLES – Interval w/ true zero point, zero means none

Ex. Weight, height, GPA, distance, population size

Interval may have a zero, but it is arbitrary

To check ratio vs. interval, check whether a ratio is meaningful

Ratio – the amount of one observation in relation to another

Ex. Weight, 40g to 20g has 2:1 ratio

Ex. 1st, 2nd & 3rd, 3rd isn’t 3 times worse than 1st
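The "is a ratio meaningful?" check can be worked through numerically. The sketch below reuses the 40 g to 20 g weight example and adds a Fahrenheit comparison as an assumption of its own. The idea is that a ratio on a ratio-level variable survives a change of unit, while a ratio on an interval-level variable does not.

```python
def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

GRAMS_PER_OUNCE = 28.349523125

# Ratio variable (weight): 40 g vs. 20 g stays 2:1 when re-expressed in ounces
ratio_grams = 40 / 20
ratio_ounces = (40 / GRAMS_PER_OUNCE) / (20 / GRAMS_PER_OUNCE)

# Interval variable (temperature): 40 F vs. 20 F looks like 2:1,
# but the "ratio" changes once the same temperatures are written in Celsius
ratio_fahrenheit = 40 / 20
ratio_celsius = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)

print(f"weight:      grams={ratio_grams:.2f}  ounces={ratio_ounces:.2f}")
print(f"temperature: F={ratio_fahrenheit:.2f}  C={ratio_celsius:.2f}")
```

Because the temperature ratio depends on where the arbitrary zero sits, saying 40 F is "twice as hot" as 20 F is not meaningful, whereas 40 g really is twice as heavy as 20 g.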

Dummy Coding – change nominal/ordinal into interval/ratio w/ artificial numerical scores – ex. Index coding
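As a rough illustration of dummy coding, the sketch below turns a dichotomous nominal variable into 0/1 scores. The 0/1 convention and the survey answers are assumptions made for the example; the notes do not specify a particular coding scheme.

```python
import statistics

# Hypothetical responses to a dichotomous nominal variable (made-up data)
employed = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no"]

# Dummy code: 1 = has the attribute, 0 = does not (assumed convention)
dummy = [1 if answer == "yes" else 0 for answer in employed]

# With 0/1 scores, the mean of the dummy variable equals the proportion coded 1
proportion_employed = statistics.mean(dummy)
print(dummy, proportion_employed)
```

Once coded this way, the variable can be used in calculations that require numerical scores.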

Unit of measure – for interval/ratio variables only

o Fixes set interval for the numerical values used as scores for an interval/ratio variable – ex. Kg, cm, mL

o Different from level of measurement

Coding and Counting Observations

Codebook – a concise description of the symbols that signify each score of each variable

o Use number symbols for categories b/c easier counting & sorting

o Response coding may introduce measurement error, req. precision

All coded variables follow 2 principles:

o Principle of Inclusiveness – for every given variable, there must be a score/code for every observation made (exhaustive)

Ex. Race: white, black, Asian, other (residual category)

Req. supplying codes for missing data (Missing Values)

Disregarded when computing a statistic for a variable

o Principle of Exclusiveness – for a given variable, every observation can be assigned one and only one score

Two scores cannot overlap
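The two coding principles can be sketched as a small codebook plus an encoding function. The categories come from the example above; the numeric codes, the names `CODEBOOK`, `RESIDUAL_CODE`, `MISSING_CODE`, and `encode`, and the sample responses are all hypothetical.

```python
# Hypothetical codebook for one variable; the numeric codes are arbitrary assumptions
CODEBOOK = {
    1: "white",
    2: "black",
    3: "asian",
    4: "other",    # residual category keeps the scheme inclusive
    9: "missing",  # missing-value code
}
RESIDUAL_CODE = 4
MISSING_CODE = 9

# Reverse lookup: each label maps to exactly one code (exclusiveness)
LABEL_TO_CODE = {label: code for code, label in CODEBOOK.items()}

def encode(response):
    """Return exactly one code for any response, including blank ones."""
    if response is None or not response.strip():
        return MISSING_CODE
    return LABEL_TO_CODE.get(response.strip().lower(), RESIDUAL_CODE)

responses = ["White", "asian", "", "Black", "Pacific Islander", None]  # made-up data
codes = [encode(r) for r in responses]

# Missing values are set aside before computing any statistic for the variable
valid_codes = [c for c in codes if c != MISSING_CODE]
print(codes, valid_codes)
```

Every response receives a code (inclusiveness) and never more than one (exclusiveness), while blank answers are kept out of later calculations.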

Frequency Distributions

Frequency Distribution – a list of all observed scores of a variable and the frequency for each score (or category)

o Frequency (f) – # of observations or cases for each value

o PROPORTIONAL FREQUENCY DISTRIBUTION – a list of the proportion of responses for each category or score of a variable

o PERCENTAGE FREQUENCY DISTRIBUTION – a list of the percentage of responses for each category or score of a variable
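The three kinds of distribution defined above can be computed in a few lines. This is a minimal sketch using made-up letter-grade scores; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical observed scores for one variable (made-up data)
scores = ["B", "A", "C", "B", "B", "A", "D", "C", "B", "A"]

freq = Counter(scores)   # frequency (f) for each score
n = sum(freq.values())   # total number of cases

proportional = {score: f / n for score, f in freq.items()}
percentage = {score: 100 * p for score, p in proportional.items()}

print(f"{'score':<6}{'f':>3}{'prop':>7}{'pct':>7}")
for score in sorted(freq):
    print(f"{score:<6}{freq[score]:>3}{proportional[score]:>7.2f}{percentage[score]:>6.1f}%")
```

Proportions should sum to 1 and percentages to 100, which gives a quick check that every case was counted exactly once.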

Coding and Counting Interval/Ratio Data

Precise Measurement – one in which the degree of measurement error is sufficiently small for the task at hand, specified by rounding error

o Specification: how much measurement error can be tolerated w/o encountering practical problems or drawing faulty scientific concl.
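One way to see how a rounding unit specifies measurement error: rounding to the nearest unit can move a value by at most half that unit. The sketch below assumes hypothetical height measurements and two illustrative helpers, `round_to_unit` and `max_rounding_error`.

```python
import math

def round_to_unit(value, unit):
    """Round value to the nearest multiple of unit (halves round up)."""
    return math.floor(value / unit + 0.5) * unit

def max_rounding_error(unit):
    """Largest possible gap between a value and its rounded score."""
    return unit / 2

heights_cm = [172.38, 165.94, 180.5]  # hypothetical raw measurements

for unit in (1, 10):
    rounded = [round_to_unit(h, unit) for h in heights_cm]
    print(f"nearest {unit} cm: {rounded}  (error up to +/- {max_rounding_error(unit)} cm)")
```

Choosing the rounding unit is then a matter of deciding how large a +/- error the task at hand can tolerate.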