SOC222H5
Weiguo Zhang
Chapter 2

Sociology

Chapter 2: Basic Descriptive Statistics
2.1 Percentages and Proportions
Percentages and proportions supply a frame of reference for reporting research results in the
sense that they standardize the raw data: percentages to the base 100 and proportions to the
base 1.00
Percentages and proportions are easier to read and comprehend than frequencies
o Particularly obvious when attempting to compare groups of different sizes
Computing percentages eliminates the difference in size of the two groups by standardizing
both distributions to the base of 100
Guidelines on use of percentages and proportions:
1. When working with a small number of cases (i.e. fewer than 20), it is preferable to report the
actual frequencies rather than percentages or proportions
Percentages can change drastically with relatively minor changes in data
2. Always report number of observations along with proportions and percentages
Permits reader to judge adequacy of the sample size
3. Percentages and proportions can be calculated for variables at ordinal and nominal levels of
measurement, even though they require division
The division is not applied to the scores of the variable (as would be the case in
computing the average score on a test, for example); instead, the number of cases in a
particular category (f) of the variable is divided by the total number of cases in the
sample (n)
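The f / n arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the two group counts are made up for the example and do not come from the text.

```python
def proportion(f, n):
    """Proportion: cases in a category (f) divided by total cases (n); base 1.00."""
    return f / n


def percentage(f, n):
    """Percentage: the same comparison standardized to the base 100."""
    return 100 * f / n


# Hypothetical counts for two groups of different sizes.
group_a = {"f": 30, "n": 120}
group_b = {"f": 45, "n": 300}

for name, g in (("Group A", group_a), ("Group B", group_b)):
    # Report n alongside the percentage (guideline 2).
    print(f"{name}: {percentage(g['f'], g['n']):.1f}% (n = {g['n']})")
```

Group B has more cases in the category (45 vs. 30), but once both are standardized to the base 100 the category turns out to be relatively more common in Group A.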
2.2 Ratios and Rates
Provide some additional ways of summarizing results simply and clearly
Ratios are especially useful for comparing categories of a variable in terms of relative frequency
o Determine ratios by dividing frequency of one category by frequency in another
o Express relative size of categories: they tell us exactly how much one category
outnumbers the other
o Often multiplied by some power of 10 to eliminate decimal points
Rates are another way of summarizing distribution of a single variable
o Defined as number of actual occurrences of some phenomenon divided by number of
possible occurrences per some unit of time
o Usually multiplied by some power of 10 to eliminate decimal points
o Often multiplied by 100,000 when number of actual occurrences of some phenomenon is
extremely small relative to size of population (e.g. homicides in Canada)
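A minimal Python sketch of the two definitions above. The function names, the default multiplier, and all of the counts are hypothetical values chosen for illustration; they are not real statistics.

```python
def ratio(f1, f2):
    """Ratio: frequency of one category divided by frequency of another."""
    return f1 / f2


def rate(actual, possible, per=100_000):
    """Rate: actual occurrences divided by possible occurrences (for some
    unit of time), multiplied by a power of 10 to remove decimal points."""
    return actual * per / possible


# Hypothetical category counts: 150 cases in one category, 100 in another.
print(ratio(150, 100))        # 1.5 cases in the first category per case in the second

# Hypothetical rare event: 600 occurrences in a population of 30,000,000 in one year.
print(rate(600, 30_000_000))  # 2.0 occurrences per 100,000 population
```

Without the multiplier the second result would be 0.00002, which is much harder to read and compare than 2 per 100,000.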
2.3 Frequency Distributions Introduction
Frequency distribution is a table that summarizes distribution of a variable by reporting number
of cases contained in each category of the variable
very helpful and commonly used way of organizing and working with data
construction of a frequency table is almost always the first step in any statistical analysis
one general rule that applies to all frequency distributions is that the categories of frequency
distribution must be exhaustive and mutually exclusive
o categories must be stated in a way that permits each case to be counted in one and only
one category
o this principle applies to construction of frequency distributions for variables
measured at all three levels of measurement
2.4 Frequency Distributions for variables measured at nominal and ordinal levels
Nominal Level Variables- for each category of the variable being displayed, the occurrences are
counted and the subtotals, along with total number of cases (n), are reported
o table has descriptive title, clearly labeled categories and a report of total number of
cases at bottom of frequency column
these elements must be included in all tables regardless of variable or level of measurement
o when categories are collapsed (e.g. non-medical doctors could include a counselor or a
psychologist), information and detail will be lost
Ordinal-level Variables- frequency distributions constructed following same routines used for
nominal-level variables
o A column of percentages by category is often added to the table (percentages increase
the clarity of the table and are common adjuncts to the basic frequency distribution for
variables measured at all levels)
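The routine described in this section (count the cases in each category, report n, add a percentage column) might look like this in Python. The variable, its categories, and the scores are invented for the example, and frequency_table is a hypothetical helper, not something from the text.

```python
from collections import Counter


def frequency_table(values, order=None):
    """Return (rows, n), where each row is (category, frequency, percentage).

    `order` lists the categories in display order; this matters for ordinal
    variables, whose categories have a natural ranking.
    """
    counts = Counter(values)
    n = len(values)
    categories = order if order is not None else sorted(counts)
    rows = [(c, counts[c], 100 * counts[c] / n) for c in categories]
    return rows, n


# Hypothetical ordinal variable: level of satisfaction for 20 respondents.
scores = ["low"] * 4 + ["medium"] * 10 + ["high"] * 6
rows, n = frequency_table(scores, order=["low", "medium", "high"])

print("Level of Satisfaction (hypothetical data)")
for category, f, pct in rows:
    print(f"{category:<8}{f:>4}{pct:>8.1f}%")
print(f"{'n =':<8}{n:>4}")
```

Because every score falls into exactly one listed category, the frequencies sum to n and the percentages sum to 100.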
2.5 Frequency distributions for variables measured at the interval-ratio level
In general, construction of frequency distributions for variables measured at interval-ratio level
is more complex than for nominal and ordinal variables
Interval-ratio variables usually have a large number of possible scores
o Large number of scores requires some collapsing or grouping of categories to produce
reasonably compact frequency distributions
o Must decide how many categories to use and how wide these categories should be
These decisions always involve a trade-off between more detail (greater number of narrow
categories) and more compactness (small number of wide categories)
Constructing the Frequency Distribution
Categories often called intervals when working with interval-ratio data
Frequency distribution constructed by listing categories in order (e.g. smallest to largest,
youngest to oldest), counting the number of times each score occurs and then totaling the
number of scores for each category
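As a rough sketch of this procedure, the Python below groups whole-number scores into equal-width intervals listed from smallest to largest. The ages, the starting point, and the interval width are hypothetical choices, and the helper assumes integer scores that are all at or above the starting point.

```python
def grouped_frequencies(scores, start, width):
    """Count integer scores in intervals start..start+width-1, and so on.

    Returns a dict mapping (lower stated limit, upper stated limit) to the
    number of scores in that interval, in order from smallest to largest.
    Assumes every score is an integer >= start.
    """
    table = {}
    lower = start
    while lower <= max(scores):
        upper = lower + width - 1
        table[(lower, upper)] = sum(lower <= s <= upper for s in scores)
        lower += width
    return table


# Hypothetical ages of 10 respondents.
ages = [18, 19, 19, 20, 21, 21, 21, 23, 24, 26]
table = grouped_frequencies(ages, start=18, width=3)

for (lower, upper), f in table.items():
    print(f"{lower}-{upper}: {f}")
print(f"n = {sum(table.values())}")
```

A smaller width would keep more detail but produce a longer table; a larger width would give a more compact table with less detail.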
Midpoints- need to be used when constructing or interpreting certain graphs like the frequency
polygon
o Midpoints defined as the points exactly halfway between upper and lower limits and
can be found for any interval by dividing the sum of the upper and lower limits by two
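The midpoint rule is simple enough to check by hand, but a short Python sketch makes it concrete. The intervals used here are hypothetical stated limits chosen for illustration.

```python
def midpoint(lower, upper):
    """Midpoint: the sum of the lower and upper limits divided by two."""
    return (lower + upper) / 2


# Hypothetical intervals given as (lower limit, upper limit).
intervals = [(18, 20), (21, 23), (24, 26)]
midpoints = [midpoint(lo, hi) for lo, hi in intervals]

for (lo, hi), m in zip(intervals, midpoints):
    print(f"{lo}-{hi}: midpoint {m}")
```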
Real Limits- for certain purposes, you must eliminate the gap between intervals and treat a
distribution as a continuous series of categories that border each other
o Necessary in constructing some graphs such as the histogram
Stated Limits (intervals of a frequency distribution when stated as discrete categories)- organize
scores of the variable into a series of discrete, non-overlapping intervals
To treat the variables as continuous, we must use real limits
o To find the real limit of any interval, divide the distance between the stated limits (the
gap) in half and add the result to all upper stated limits and subtract it from all lower

