PSYC 2P25 Lecture Notes - Lecture 2: Inter-Rater Reliability, Tiger Woods, Construct Validity

Department: Psychology
Course Code: PSYC 2P25
Professor: Michael Ashton
Lecture: 2

Lecture 2 January 10, 2016
Assessing Quality of Measurement
Reliability
o A measurement is reliable if it agrees with other measurements of the same variable.
Three kinds of reliability:
Internal-consistency reliability
When scores on a measurement are calculated as a sum (or mean) of various parts (items)
Scores should depend strongly on the common element of the items
Indicates the extent to which scores represent the common element of the items
How to make measurements have higher internal-consistency reliability:
Include lots of items: adding many items together gives better measurement of their
common characteristic (Tiger Woods would rather be judged on the average of many
holes of golf, since more holes give a more accurate measure of his ability)
o Any single item has its own specific element, but when we combine items,
these specific parts get cancelled out
Include ‘items’ that are correlated with each other: items that correlate strongly with
each other are measuring a common characteristic
If items are uncorrelated with each other, they don’t have a common characteristic:
might be measuring several different characteristics instead
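The idea above can be made numerical. A standard index of internal-consistency reliability (not named in the notes, but widely used) is Cronbach's alpha; a minimal sketch in plain Python, with made-up item scores, is:

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """Cronbach's alpha. `items` is a list of score lists, one per item,
    all covering the same respondents in the same order."""
    k = len(items)
    n = len(items[0])
    sum_item_vars = sum(variance(item) for item in items)
    totals = [sum(item[p] for item in items) for p in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Made-up scores: 3 items answered by 5 people. The items correlate
# strongly with each other, so alpha comes out high.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 4, 5],
         [1, 3, 3, 4, 4]]
print(round(cronbach_alpha(items), 2))
```

Replacing one of these item lists with uncorrelated scores drives alpha down, matching the point above: uncorrelated items have no strong common element.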
Interrater Reliability
When a characteristic is measured by obtaining ratings made by several raters
Scores on the total (or average) rating should depend strongly on the raters' common
judgement
Indicates the extent to which overall scores represent the common element of the
scores given by the various raters
How to get high interrater reliability:
o Have many raters (so that any one rater's idiosyncrasies get cancelled out)
o Have only raters whose ratings tend to agree (so that there is a strong
common element to their ratings)
o (note the similarities with internal-consistency reliability: "raters" are like
"items")
o Use the average of the ratings, not any single rating
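The parallel with internal-consistency reliability can be illustrated with one simple (assumed, not from the lecture) approach: correlate each pair of raters, average those correlations, and apply the Spearman-Brown formula to get the reliability of the mean rating across all k raters:

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

def mean_rating_reliability(ratings):
    """Reliability of the average rating across k raters, from the
    average inter-rater correlation via the Spearman-Brown formula."""
    k = len(ratings)
    pairs = list(combinations(range(k), 2))
    r_bar = sum(pearson(ratings[i], ratings[j]) for i, j in pairs) / len(pairs)
    return k * r_bar / (1 + (k - 1) * r_bar)

# Made-up ratings of 5 targets by 3 raters who largely agree.
ratings = [[1, 2, 3, 4, 5],
           [2, 3, 3, 5, 5],
           [1, 2, 4, 4, 5]]
print(round(mean_rating_reliability(ratings), 2))
```

Note that the Spearman-Brown result is always higher than the average pairwise correlation, which is exactly the "have many raters" point: averaging over raters cancels out their idiosyncrasies.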
Stability (test-retest Reliability)
o Applies if the variable is supposed to be a lasting characteristic (not a temporary state)
o Measurements taken on two occasions (e.g. a few weeks apart) should be highly
correlated
o Test-retest reliability is usually calculated simply as the correlation between scores on
two occasions
o The two occasions should not be years apart, because developmental changes would
lower the correlation for reasons other than unreliability
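As the notes say, test-retest reliability is just the correlation between the two sets of scores. A minimal sketch with invented scores from two occasions a few weeks apart:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

# Made-up scores: the same 5 people measured twice, weeks apart.
# People keep roughly the same rank order, so reliability is high.
time1 = [10, 12, 15, 9, 14]
time2 = [11, 12, 14, 10, 15]
print(round(pearson(time1, time2), 2))
```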