Chapter 5: Measurement Concepts

Reliability of Measures

•Reliability refers to the consistency or stability of a measure of behavior. A reliable

measure of a psychological variable such as intelligence will yield the same result

each time you administer the intelligence test to the same person. The test would be

unreliable if it measured the same person as average one week, low the next, and

bright the next. Put simply, a reliable measure does not fluctuate from one reading

to the next.

•A more formal way of understanding reliability is to use the concepts of true score

and measurement error. Any measure that you make can be thought of as

comprising two components: 1) a true score, which is the real score on the

variable and 2) measurement error. An unreliable measure of intelligence

contains considerable measurement error and so does not provide an accurate

indication of an individual’s true intelligence.

•In contrast, a reliable measure of intelligence – one that contains little measurement

error – will yield an identical (or nearly identical) intelligence score each time the

same individual is measured.

•To illustrate the concept of reliability further, imagine that you know someone

whose “true” intelligence score is 100. Now suppose that you administer an

unreliable intelligence test to this person each week for a year. Next, suppose that

you test another friend who also has a true intelligence score of 100; however, this

time you administer a highly reliable test. What might your data look like? In each

case, the average score is 100. However, scores on the unreliable test range from 85

to 115, whereas scores on the reliable test range from 97 to 103. The measurement

error in the unreliable test is revealed in the greater variability shown by the person

who took the unreliable test.
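
The example above can be sketched as a small simulation in which each observed score is the true score plus a random measurement error. This is only an illustration: the uniform error distribution, the fixed seed, and the names simulate_scores, TRUE_SCORE, and WEEKS are assumptions, not part of the notes. The error limits of 15 and 3 points are chosen to match the 85 to 115 and 97 to 103 ranges in the example.

```python
import random
import statistics

TRUE_SCORE = 100  # the "true" intelligence score from the example
WEEKS = 52        # one test per week for a year


def simulate_scores(max_error, rng):
    """Return weekly observed scores: true score plus a random error.

    Errors are drawn uniformly from [-max_error, +max_error]; the uniform
    distribution is an assumption made only for this illustration.
    """
    return [TRUE_SCORE + rng.uniform(-max_error, max_error) for _ in range(WEEKS)]


rng = random.Random(0)  # fixed seed so the run is repeatable
unreliable = simulate_scores(15, rng)  # scores can fall anywhere in 85-115
reliable = simulate_scores(3, rng)     # scores can fall anywhere in 97-103

for label, scores in (("unreliable", unreliable), ("reliable", reliable)):
    print(
        f"{label:>10}: mean={statistics.mean(scores):.1f} "
        f"min={min(scores):.1f} max={max(scores):.1f} "
        f"sd={statistics.stdev(scores):.1f}"
    )
```

Both simulated averages should land near 100, but the unreliable scores spread much more widely, which is the greater variability the example describes.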

•Researchers cannot use unreliable measures to systematically study variables or the

relationships among variables. Trying to study behavior using unreliable measures

is a waste of time because the results will be unstable and cannot be replicated.

•Reliability is most likely to be achieved when researchers use careful measurement

procedures. It might mean paying close attention to the way questions are phrased

or the way recording electrodes are placed on the body to measure physiological

reactions.

www.notesolution.com

•We can assess the stability of measures using correlation coefficients. There are

several ways of calculating correlation coefficients; the most common correlation

coefficient when discussing reliability is the Pearson product-moment

correlation coefficient. The Pearson correlation coefficient (symbolized as r) can

range from -1.00 to +1.00.

•A correlation of 0.00 tells us that the two variables are not related at all. The closer

a correlation is to 1.00, either +1.00 or -1.00, the stronger the relationship. The

positive and negative signs provide information about the direction of the relationship.

When the correlation coefficient is positive, there is a positive linear relationship. A

negative linear relationship is indicated by a minus sign.
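
To make the sign and size of r concrete, here is a minimal sketch of the Pearson correlation computed from its standard definition (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the small data sets are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("r is undefined when one variable has no variation")
    return sum_xy / math.sqrt(sum_xx * sum_yy)


positive = pearson_r([1, 2, 3], [2, 4, 6])            # y rises as x rises
negative = pearson_r([1, 2, 3], [6, 4, 2])            # y falls as x rises
unrelated = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])   # no linear relationship

print(f"{positive:+.2f} {negative:+.2f} {unrelated:+.2f}")
```

The three made-up data sets give a perfect positive correlation, a perfect negative correlation, and a correlation of zero, in that order.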

•To assess the reliability of a measure, we will need to obtain at least two scores on

the measure from many individuals. If the measure is reliable, the two scores should

be very similar; a Pearson correlation coefficient that relates the two scores should

be a high positive correlation. When you read about reliability, the correlation will

usually be called a reliability coefficient. Let’s examine specific methods of assessing

reliability.

Test-Retest Reliability

•Test-retest reliability is assessed by measuring the same individuals at two points

in time. For example, the reliability of a test of intelligence could be assessed by

giving the measure to a group of people on one day and again a week later. We

would then have two scores for each person, and a correlation coefficient could be

calculated to determine the relationship between the first test score and the retest

score.

•It is difficult to say how high the correlation should be before we accept the measure

as reliable, but for most measures the reliability coefficient should probably be at

least .80.

•Given that test-retest reliability involves administering the same test twice, the

correlation might be artificially high because the individuals remember how they

responded the first time. Alternate forms reliability is sometimes used to avoid this

problem. Alternate forms reliability involves administering two different forms of

the same test to the same individuals at two points in time.

•Intelligence is a variable that can be expected to stay relatively constant over time;

thus, we expect the test-retest reliability for intelligence to be very high. However,

some variables may be expected to change from one test period to the next. For

example, a mood scale designed to measure a person’s current mood state is a
