Lecture 4 05/07/2013
Topic 6: Statistics describing variables.
Descriptive versus Inferential Statistics
Descriptive: used to describe characteristics of a population or a sample.
Inferential: used to generalize from a sample to the population from which the sample was drawn, that is, to use a sample to make inferences about the population.
Univariate: used to describe (descriptive) or make inferences (inferential) about the values of a single variable:
The distribution → how many cases take each value?
The central tendency → which is the most typical value?
The dispersion → how much do the values vary? The more dispersed the data, the less representative the measure of central tendency (CT) will be.
Bivariate: used to describe (descriptive) or make inferences (inferential) about the relationship between the values of two variables.
Multivariate: used to describe (descriptive) or make inferences (inferential) about the relationship among the values of three or more variables.
Describing a distribution:
Frequency distribution: a list of the number of observations in each category of the variable, showing how often each possible value occurs.
These counts are called absolute frequencies or raw frequencies.
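A minimal Python sketch of building a frequency distribution: counting is the only operation needed. The function name `frequency_distribution` and the religious-affiliation data are hypothetical, for illustration only.

```python
from collections import Counter


def frequency_distribution(values):
    """Count how many cases fall in each category (absolute/raw frequencies).

    Returns the counts as a dict plus N, the total number of cases.
    """
    counts = Counter(values)      # category -> number of cases
    n = sum(counts.values())      # N: total number of observations
    return dict(counts), n


# Hypothetical nominal data: religious affiliation of 10 respondents.
religion = ["Catholic", "Protestant", "Catholic", "None", "Catholic",
            "Jewish", "None", "Protestant", "Catholic", "None"]

freqs, n = frequency_distribution(religion)
# Print categories from most to least frequent, then the total N.
for category, count in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{category:<12}{count:>3}")
print(f"{'N':<12}{n:>3}")
```

The table ends with the total N, so a reader can always see how many cases the counts are based on.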
Relative frequencies are simply percentages: each category's count divided by the total number of cases, multiplied by 100.
Percentages are not desirable when you have a small number of cases, since a single case can shift them substantially.
The custom in social research is to round percentages, because reporting very precise values gives a false sense of accuracy that readers may wrongly apply to the whole population.
For both absolute and relative frequencies, always report the total number of cases, as this is your N.
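A minimal Python sketch of turning raw counts into rounded relative frequencies while keeping N alongside them. The function name `relative_frequencies`, its `decimals` parameter, and the data are hypothetical.

```python
from collections import Counter


def relative_frequencies(values, decimals=0):
    """Convert raw counts to percentages of N, rounded to `decimals` places.

    Note: Python's round() rounds exact halves to the nearest even digit.
    """
    counts = Counter(values)
    n = sum(counts.values())
    percentages = {cat: round(100 * count / n, decimals)
                   for cat, count in counts.items()}
    return percentages, n


# Hypothetical nominal data: religious affiliation of 10 respondents.
religion = ["Catholic", "Protestant", "Catholic", "None", "Catholic",
            "Jewish", "None", "Protestant", "Catholic", "None"]

pct, n = relative_frequencies(religion)
for category, value in pct.items():
    print(f"{category:<12}{value:>5.0f}%")
print(f"(N = {n})")  # always report N with the percentages
```

Rounding to whole numbers follows the convention above; with other data, rounded percentages may not sum to exactly 100.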
Central Tendency vs. Dispersion
Measure of central tendency: indicates the most typical value, the one value that best represents the entire distribution. What is the centre of the data?
Measure of dispersion: tells us how typical that value is by indicating the extent to which observations are concentrated in a few categories of the variable or spread out among all categories. How average is the average value?
It shows how good a job the CT does; it is a check on the CT.
It also describes variation: how much do the values vary?
Measuring Central Tendency (Nominal Data):
The mode is the most frequently occurring value—the category of the variable that contains
the greatest number of cases. The only operation required is counting.
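A minimal Python sketch of finding the mode by counting. The function name `modes` and the data are hypothetical. It returns a list so that ties (a bimodal distribution) are not hidden. The standard library's `statistics.multimode` (Python 3.8+) does the same job.

```python
from collections import Counter


def modes(values):
    """Return every category tied for the highest frequency.

    Usually one value; more than one means the distribution is multimodal.
    """
    counts = Counter(values)
    highest = max(counts.values())
    return [cat for cat, count in counts.items() if count == highest]


# Hypothetical nominal data: religious affiliation of 10 respondents.
religion = ["Catholic", "Protestant", "Catholic", "None", "Catholic",
            "Jewish", "None", "Protestant", "Catholic", "None"]

print(modes(religion))                   # the single modal category
print(modes(["A", "B", "A", "B", "C"]))  # a bimodal case: two categories tie
```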
Problem: the mode disregards all the other values, so it can misrepresent the data set as a whole.
Measuring Dispersion (Nominal Data):
The proportion of cases that do not fall in the modal category tells us just how typical the modal value is; the textbook calls this the variation ratio. In the lecture example, 60% of the observations do not fall into the modal category, so the variation ratio is 0.60.
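A minimal Python sketch of the variation ratio, VR = 1 - f_modal / N, where f_modal is the frequency of the modal category. The function name `variation_ratio` and both data sets are hypothetical. They are not the lecture's table.

```python
from collections import Counter


def variation_ratio(values):
    """Proportion of cases NOT in the modal category: 1 - f_modal / N."""
    counts = Counter(values)
    n = sum(counts.values())
    f_modal = max(counts.values())  # frequency of the modal category
    return 1 - f_modal / n


# Hypothetical data: same mode ("Catholic"), very different dispersion.
spread_out = ["Catholic", "Protestant", "Catholic", "None", "Catholic",
              "Jewish", "None", "Protestant", "Catholic", "None"]
concentrated = ["Catholic"] * 9 + ["None"]

print(f"{variation_ratio(spread_out):.2f}")    # most cases outside the mode
print(f"{variation_ratio(concentrated):.2f}")  # mode is very typical
```

Both data sets share the same mode, but the much lower ratio for `concentrated` shows the mode is far more typical there. This is the "check on the CT" that a measure of dispersion provides.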
Measuring CT (Ordinal Data):
The median is the value taken by the middle case in a distribution once the cases are ranked: it has the same number of cases above it as below it.
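A minimal Python sketch of the median for ordinal data: rank the cases by category order, then take the middle case. The five-point scale `ORDER`, the function `ordinal_median`, and the responses are hypothetical. With an even N, this sketch returns both middle categories when they differ. It does not average them, since averaging ordinal categories is not meaningful.

```python
# Hypothetical ordinal scale, from lowest to highest.
ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def ordinal_median(responses, order):
    """Return the category of the middle case after ranking all cases.

    For an even number of cases, return a (lower, upper) pair if the two
    middle cases fall in different categories.
    """
    rank = {cat: i for i, cat in enumerate(order)}
    ranked = sorted(responses, key=lambda c: rank[c])  # order the cases
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]  # the single middle case
    lower, upper = ranked[n // 2 - 1], ranked[n // 2]
    return lower if lower == upper else (lower, upper)


odd = ["agree", "neutral", "disagree", "agree", "strongly agree"]
even = ["disagree", "agree", "agree", "neutral"]
print(ordinal_median(odd, ORDER))
print(ordinal_median(even, ORDER))
```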