Reliability and Validity
The Problem of Random and Systematic Error
- Measuring your height: 165 cm (accurate?). Maybe:
- There could be a random error in the measure (misread the number? You
- Systematic error (or bias) (was everyone wearing shoes when measured?)
Observed Score = True Score ± Random Error ± Systematic Error (bias)
- We must minimize random error and systematic error (so observed score =
true score). But how?
- By maximizing the reliability and validity of measures
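The observed-score equation above can be sketched as a small simulation. This is only an illustration: the function name `measure_height`, the 1 cm spread of the random error, and the 2 cm "shoes" bias are assumptions made up for the example, not values from these notes.

```python
import random
import statistics

def measure_height(true_cm, random_sd, bias_cm, rng):
    """Return one observed score: true score + random error + systematic error."""
    random_error = rng.gauss(0.0, random_sd)  # differs on every measurement
    return true_cm + random_error + bias_cm   # bias is the same on every measurement

rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_height = 165.0

# 1,000 measurements without shoes (no bias) and with shoes (assumed +2 cm bias)
barefoot = [measure_height(true_height, 1.0, 0.0, rng) for _ in range(1000)]
in_shoes = [measure_height(true_height, 1.0, 2.0, rng) for _ in range(1000)]

mean_barefoot = statistics.mean(barefoot)
mean_in_shoes = statistics.mean(in_shoes)
print(round(mean_barefoot, 1), round(mean_in_shoes, 1))
```

Under these assumptions the barefoot average lands close to the true score, while the in-shoes average stays about 2 cm too high. Repeating a measurement averages away random error but leaves systematic error untouched.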
Part 1: Reliability (Minimizing Random Error)
A. The “more is better” rule (random error will cancel out over repeated
Example 1: Beating Vince Carter in basketball. Playing 1 on 1, we start with the ball.
You can choose to play either until someone scores 20 baskets or until someone
scores the first basket. Choosing "first basket wins" lets us use random error to
our advantage: one lucky shot could win the match. If we chose to play to 20
baskets instead, it would be very unlikely that we would get ‘lucky’ 20 times.
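The basketball example can be put into rough numbers. The sketch below assumes that each basket is scored independently and that we score any given basket with a fixed probability of 0.2. Both the independence and the 0.2 are illustrative assumptions, and the model ignores who has possession. The helper name `win_probability` is hypothetical.

```python
from math import comb

def win_probability(p, target):
    """Chance of reaching `target` baskets first when each basket is ours with probability p."""
    # We win if the opponent has scored k < target baskets when we score our last one.
    return sum(
        comb(target - 1 + k, k) * p**target * (1 - p)**k
        for k in range(target)
    )

p_basket = 0.2  # assumed chance that we, not Vince Carter, score any given basket
first_basket_wins = win_probability(p_basket, 1)
first_to_twenty = win_probability(p_basket, 20)
print(first_basket_wins, first_to_twenty)
```

With one basket the weaker player wins as often as they score a single basket. With 20 baskets the chance of an upset becomes tiny, because luck has to repeat many times.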
Example 2: One grandfather “can’t believe” that he has 1 grandchild who happens
to be a boy. Another grandfather “can’t believe” that he has 10 grandchildren
who are all boys. Only the second reaction is justified: one boy is a coin flip,
but 10 male grandchildren in a row is a very unusual outcome to arise through
randomness alone.
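The arithmetic behind the grandfather example is short enough to check directly. The sketch assumes each birth is independent and a boy with probability 0.5, which is a simplification. The function name `all_boys_probability` is hypothetical.

```python
def all_boys_probability(n_children, p_boy=0.5):
    """Chance that n independent births are all boys (p_boy = 0.5 is an assumption)."""
    return p_boy ** n_children

one_grandchild = all_boys_probability(1)      # 1 in 2
ten_grandchildren = all_boys_probability(10)  # 1 in 1,024
print(one_grandchild, ten_grandchildren)
```

Under this assumption a single boy happens half the time, while ten boys in a row happens about once in a thousand families of that size.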
B. We can decrease random error by increasing the reliability of our measures.
A measure is reliable if it measures things consistently
C. Types of Reliability:
1. Internal reliability (internal consistency) - Relevant when measure consists
of multiple items (e.g., exam)
- Is there consistency between the items?
- Inconsistency can be a sign of random error
- For example, if a test asks whether one is friendly, outgoing, talkative, and
gregarious, a consistent respondent should give approximately the same
rating on all four items, because the traits are all correlated. If you rate
high on friendliness and outgoingness but low on gregariousness, this
probably reflects a reliability problem (random error), e.g., among
respondents who do not know the definition of the word ‘gregarious’.
Assessing internal reliability:
- Item-total correlations (if random error is low, responses to any single item
should be positively correlated with the total score)
Eliminate items with low item-total correlations (and/or add more items)
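Item-total correlations can be computed by hand for a small data set. The ratings below are invented for illustration (six hypothetical respondents on a 1 to 5 scale), and `pearson` is a plain helper written for this sketch. The total here includes the item itself, which is the simplest version of the statistic.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical ratings: one list per item, one entry per respondent
items = {
    "friendly":   [5, 4, 2, 1, 3, 5],
    "outgoing":   [5, 4, 1, 2, 3, 4],
    "talkative":  [4, 5, 2, 1, 3, 5],
    "gregarious": [2, 3, 3, 2, 1, 3],
}

# Total score per respondent = sum of that respondent's ratings on all items
totals = [sum(ratings) for ratings in zip(*items.values())]

# Correlate each item with the total score
item_total = {name: pearson(ratings, totals) for name, ratings in items.items()}
for name, r in item_total.items():
    print(f"{name:<11} {r:.2f}")

weakest_item = min(item_total, key=item_total.get)
```

In this made-up data the 'gregarious' item has by far the lowest item-total correlation, so it would be the candidate for removal or rewording.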
- Split-half reliability (e.g., odd-even correlation)
High positive correlation = low random error
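An odd-even split-half correlation can be sketched the same way. The six-item responses below are invented for illustration, and `pearson` is a helper defined only for this example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data: one row per respondent, six items per row
responses = [
    [5, 5, 4, 5, 4, 5],
    [4, 4, 5, 4, 4, 3],
    [2, 1, 2, 2, 1, 2],
    [1, 2, 1, 1, 2, 1],
    [3, 3, 3, 2, 3, 3],
    [5, 4, 5, 5, 4, 4],
]

# Items 1, 3, 5 form the odd half; items 2, 4, 6 form the even half
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]

split_half_r = pearson(odd_half, even_half)
print(round(split_half_r, 2))
```

A value near 1 means the two halves rank respondents almost identically, which is what low random error looks like in this data.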
- Best to use average of all split-halves
- E.g., the KR-20 (p. 133) for m