Chapter 5: Measurement Concepts
Reliability of Measures
•Reliability refers to the consistency or stability of a measure of behavior. A reliable
measure of a psychological variable such as intelligence will yield the same result
each time you administer the intelligence test to the same person. The test would be
unreliable if it measured the same person as average one week, low the next and
bright the next. Put simply, a reliable measure does not fluctuate from one reading
to the next.
•A more formal way of understanding reliability is to use the concepts of true score
and measurement error. Any measure that you make can be thought of as
comprising two components: 1) a true score, which is the real score on the
variable and 2) measurement error. An unreliable measure of intelligence
contains considerable measurement error and so does not provide an accurate
indication of an individual’s true intelligence.
•In contrast, a reliable measure of intelligence – one that contains little measurement
error – will yield an identical (or nearly identical) intelligence score each time the
same individual is measured.
•To illustrate the concept of reliability further, imagine that you know someone
whose “true” intelligence score is 100. Now suppose that you administer an
unreliable intelligence test to this person each week for a year. Now suppose that
you test another friend who also has a true intelligence score of 100; however, this
time you administer a highly reliable test. What might your data look like? In each
case, the average score is 100. However, scores on the unreliable test range from 85
to 115, whereas scores on the reliable test range from 97 to 103. The measurement
error in the unreliable test is revealed in the greater variability shown by the person
who took the unreliable test.
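
The thought experiment above can be sketched as a small simulation in which each weekly score is the true score plus random measurement error. This is only an illustration under stated assumptions: the function name `simulate_weekly_scores`, the normally distributed error, and the error sizes (5 points versus 1 point) are hypothetical choices, not values from the chapter.

```python
import random
import statistics

def simulate_weekly_scores(true_score, error_sd, weeks=52, seed=0):
    """Return one observed score per week: true score plus random error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(weeks)]

# Hypothetical error sizes, chosen for illustration only.
unreliable = simulate_weekly_scores(true_score=100, error_sd=5.0)
reliable = simulate_weekly_scores(true_score=100, error_sd=1.0)

for label, scores in (("unreliable", unreliable), ("reliable", reliable)):
    print(
        f"{label:>10}: mean={statistics.mean(scores):6.1f} "
        f"min={min(scores):6.1f} max={max(scores):6.1f} "
        f"sd={statistics.stdev(scores):4.1f}"
    )
```

Both sets of scores center near the true score, but the test with more measurement error spreads its scores over a wider range.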
•Researchers cannot use unreliable measures to systematically study variables or the
relationships among variables. Trying to study behavior using unreliable measures
is a waste of time because the results will be unstable and unable to be replicated.
•Reliability is most likely to be achieved when researchers use careful measurement
procedures. It might mean paying close attention to the way questions are phrased
or the way recording electrodes are placed on the body to measure physiological
•We can assess the stability of measures using correlation coefficients. There are
several ways of calculating correlation coefficients; the most common correlation
coefficient when discussing reliability is the Pearson product-moment
correlation coefficient. The Pearson correlation coefficient (symbolized as r) can
range from -1.00 to +1.00.
•A correlation of 0.00 tells us that the two variables are not related at all. The closer
a correlation is to 1.00, either +1.00 or -1.00, the stronger the relationship. The
positive and negative signs provide information about the direction of the relationship.
When the correlation coefficient is positive, there is a positive linear relationship. A
negative linear relationship is indicated by a minus sign.
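
As a rough sketch of how the Pearson coefficient described above is computed, the function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the data values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return cross / math.sqrt(ss_x * ss_y)

# Made-up values for illustration.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print(pearson_r(hours, rising))   # positive linear relationship
print(pearson_r(hours, falling))  # negative linear relationship
```

The sign of the result gives the direction of the relationship, and its distance from 0.00 gives the strength.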
•To assess the reliability of a measure, we will need to obtain at least two scores on
the measure from many individuals. If the measure is reliable, the two scores should
be very similar; a Pearson correlation coefficient that relates the two scores should
be a high positive correlation. When you read about reliability, the correlation will
usually be called a reliability coefficient. Let’s examine specific methods of assessing
reliability.
•Test-retest reliability is assessed by measuring the same individuals at two points
in time. For example, the reliability of a test of intelligence could be assessed by
giving the measure to a group of people on one day and again a week later. We
would then have two scores for each person, and a correlation coefficient could be
calculated to determine the relationship between the first test score and the retest
score.
•It is difficult to say how high the correlation should be before we accept the measure
as reliable, but for most measures the reliability coefficient should probably be at
•Given that test-retest reliability involves administering the same test twice, the
correlation might be artificially high because the individuals remember how they
responded the first time. Alternate forms reliability is sometimes used to avoid this
problem. Alternate forms reliability involves administering two different forms of
the same test to the same individuals at two points in time.
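
The test-retest procedure (two scores per person, then a correlation between them) might be sketched as follows. The scores are invented, and the function name `reliability_coefficient` is a hypothetical label for an ordinary Pearson correlation. The same calculation would apply to alternate forms, with the second list holding scores from the second form.

```python
import math

def reliability_coefficient(first, second):
    """Pearson r between two sets of scores from the same individuals."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores from at least two individuals")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("r is undefined when one set of scores does not vary")
    return cross / math.sqrt(ss_1 * ss_2)

# Made-up scores: each position is one person, tested a week apart.
week_1 = [95, 110, 102, 88, 120, 100]
week_2 = [97, 108, 104, 90, 118, 99]

r = reliability_coefficient(week_1, week_2)
print(f"test-retest reliability coefficient: r = {r:.2f}")
```

Because each person's two scores are close and people keep roughly the same rank order, the coefficient for this invented data set comes out as a high positive value.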
•Intelligence is a variable that can be expected to stay relatively constant over time;
thus, we expect the test-retest reliability for intelligence to be very high. However,
some variables may be expected to change from one test period to the next. For
example, a mood scale designed to measure a person’s current mood state is a