Chapter 4 Identifying Reliable and Valid Predictors of Performance
Look at the website
Canada HRC – applies to anything that the federal government controls
On the exam – Ontario Human Rights Code – provincial jurisdiction;
covers discrimination in employment***, in services, and in accommodation
After studying this chapter, you should be able to:
• Define reliability and validity in the context of selecting human resources
• Discuss various approaches to establishing the reliability of a measure
• Discuss the steps in validating a selection tool
• Decide the number of predictors to be used in making staffing decisions
• Discuss the importance of validating a test for different employee groups
1. Reliability and Validity
• Reliability refers to the consistency or stability of a measure.
• Validity requires that the predictor scores measure what they are supposed to
measure and are significantly related to a relevant criterion.
– Can you use that score to predict some sort of future conduct?
– Cannot have validity without reliability; validity needs reliability plus more
• Reliability is reduced when measurement error is high.
• SCORE OBTAINED = TRUE SCORE + ERROR
• Error can be positive or negative.
– A measurement is never 100% accurate
– True score is what you would get if all the conditions were perfect, but usually
that’s not the case
Error Scores
• Fewest errors, either positive or negative = most reliable test
• Measurement error can be systematic or random.
• Systematic errors occur in a predictable, consistent fashion – e.g., if the room
is too cold, scores may be lower for most people; the ranking is still going to be
the same, but nobody does as well as they could have
• Random errors are unpredictable and can be totally random – e.g., quirky
knowledge
• Systematic errors can emerge from three important sources:
– Measuring instrument: e.g., a poor test that uses culturally loaded
words or is not worded as well as it could have been
– Measuring situation: e.g., noise outside the testing room that distracts all
test takers, or a power outage halfway through the exam that throws people off
and lowers scores
– Individual factors: e.g., a person’s test anxiety that negatively affects his
or her performance in all tests
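The difference between the two kinds of error can be sketched in a few lines of Python. This is only an illustration: the candidate names, scores, the 5-point "cold room" penalty, and the size of the random noise are all made-up assumptions. It covers the case where a systematic error hits every test taker equally, so the ranking is preserved; random error gives no such guarantee.

```python
import random

def ranking(scores):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical true scores for four candidates (illustrative values only).
true_scores = {"Avery": 82, "Blake": 78, "Casey": 75, "Devon": 71}

# Systematic error: the cold room costs every test taker the same 5 points.
cold_room = {name: score - 5 for name, score in true_scores.items()}

# Random error: each test taker gets an unpredictable bump up or down.
rng = random.Random(1)
noisy = {name: score + rng.gauss(0, 6) for name, score in true_scores.items()}

print("true ranking:     ", ranking(true_scores))
print("cold-room ranking:", ranking(cold_room))   # same order as the true ranking
print("noisy ranking:    ", ranking(noisy))       # order may or may not change
```

Every cold-room score is lower than the true score, yet the order of candidates is unchanged, which is the point made in the notes above.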
• Reliability coefficient (coefficient of determination) refers to the squared
correlation between true scores and observed scores. A coefficient of 0.80 means
that only 20% of the variance is attributable to error.
• A reliability coefficient is the estimated proportion of total variance due to
systematic sources of variance.
– The closer the number is to 1, the better the reliability
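A small simulation can make SCORE OBTAINED = TRUE SCORE + ERROR and the 0.80 example concrete. This is a sketch under assumed values: true scores are drawn with a standard deviation of 2 and random errors with a standard deviation of 1, so true-score variance is 4 out of a total of 5 (reliability of about 0.80, with about 20% of the variance due to error). The function names and parameters are illustrative, not from the chapter.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

def simulate(n=20000, true_sd=2.0, error_sd=1.0, seed=0):
    """Simulate observed = true + random error; return two reliability estimates."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    # Proportion of observed-score variance that comes from true scores.
    variance_ratio = statistics.pvariance(true_scores) / statistics.pvariance(observed)
    # Squared correlation between true and observed scores.
    squared_corr = pearson(true_scores, observed) ** 2
    return variance_ratio, squared_corr

variance_ratio, squared_corr = simulate()
print(f"true variance / observed variance: {variance_ratio:.3f}")
print(f"squared corr(true, observed):      {squared_corr:.3f}")
```

Both printed values should land close to 0.80. In practice true scores are never observed directly, which is why reliability has to be estimated with the methods in the next section.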
Validity
• Does a test measure what it purports to measure? (e.g., Does a test on staffing
measure the students’ true knowledge of relevant staffing concepts?)
• The fact that a test is reliable is no guarantee that it is valid; however, an
unreliable test cannot be valid.
– Two-part test*
– Does it measure what it is supposed to measure? (e.g., does the test that you
have designed accurately show that applicants can lift?)
– If you can show that they can: do they need to be able to lift, i.e., is that
highly tied to how they will perform on the job?
2. Establishing Reliability
• Test-retest method involves the administration of the same test at two different
times and correlating the two resulting sets of scores.
• Appropriate when effect of memory on retest is not very significant
• Measures the stability component (coefficient of stability)
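As a rough sketch of the test-retest calculation, the coefficient of stability is just the correlation between the two administrations. The scores below are invented for six hypothetical test takers, and `pearson` is a helper written for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

# Hypothetical scores for the same six people on the same test, weeks apart.
time_1 = [70, 82, 65, 90, 75, 58]
time_2 = [72, 80, 68, 88, 77, 60]

stability = pearson(time_1, time_2)
print(f"coefficient of stability: {stability:.2f}")
```

A value near 1 means people kept roughly the same relative standing across the two sittings. The same calculation would apply to two equivalent forms, with the second list holding scores on the alternate version.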
• Equivalent form approach uses two equivalent but different versions of a test to
estimate reliability by correlating the two resulting sets of scores.
• Are the parallel forms truly equivalent?
• Measures the equivalence component (coefficient of equivalence)
• Internal consistency measure computes the extent to which all parts of a
measure (or all items or questions in a test) assess similar qualities.
• Is the test or measure one-dimensional?
• Measures the internal consistency component (or equivalence of two subsets of
items or questions)
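One widely used index of internal consistency is Cronbach's alpha. The notes do not name a specific index, so treat the choice of alpha here as an assumption. The sketch below uses invented responses from five test takers on a four-item scale; alpha compares the sum of the individual item variances with the variance of the total scores.

```python
import statistics

def cronbach_alpha(responses):
    """responses: one row per test taker, each row a list of item scores."""
    k = len(responses[0])                    # number of items
    items = list(zip(*responses))            # one tuple of scores per item
    item_variances = sum(statistics.variance(item) for item in items)
    total_variance = statistics.variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical 1-5 ratings: 5 test takers (rows) x 4 items (columns).
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

When the items move together (people who score high on one item score high on the others), the total-score variance is large relative to the item variances and alpha approaches 1.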
– The higher the number, the more consistent the results, which supports using
the test as a predictor
• Interrater reliability coefficient is useful to assess the level of agreement
among a set of judges who assess subjective constructs.
– Example: Which job applicant performed best during the job interview?
• Percentage of rater agreement, Cohen’s Kappa, and Kendall’s coefficient of
concordance are three popular indices of rater agreement.
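Two of these indices are simple enough to sketch for the two-rater case. The ratings below are invented: two hypothetical interviewers each label ten applicants "hire" or "reject". Percentage agreement counts matching labels; Cohen's Kappa then corrects that figure for the agreement expected by chance alone.

```python
from collections import Counter

def percent_agreement(a, b):
    """Share of cases where the two raters gave the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's Kappa for two raters assigning categorical labels."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each rater's marginal proportions per label.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical decisions from two interviewers on the same ten applicants.
rater_a = ["hire", "hire", "reject", "hire", "reject",
           "reject", "hire", "reject", "hire", "reject"]
rater_b = ["hire", "hire", "reject", "reject", "reject",
           "reject", "hire", "hire", "hire", "reject"]

print(f"percentage agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's Kappa:        {cohens_kappa(rater_a, rater_b):.2f}")
```

Here the raters agree on 8 of 10 applicants (0.80), but because each rater says "hire" half the time, 0.50 agreement is expected by chance, so Kappa is lower at 0.60.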
3. Establishing Validity
1. Empirical (criterion-related) validation approaches attempt to relate test scores
with a job-related criterion, usually performance.
2. Rational approaches focus on the content and design of the test and ask whether
the test actually measures what it purports to measure.
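For the empirical approach, the basic calculation is a correlation between predictor scores and the criterion. The sketch below uses invented data for eight hypothetical employees: a selection test score and a later supervisor performance rating. The variable names and values are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

# Hypothetical data: selection test scores and later performance ratings (1-5).
test_scores = [55, 62, 70, 48, 80, 66, 74, 59]
performance = [3.1, 3.4, 3.9, 2.8, 4.2, 3.2, 4.0, 3.5]

validity = pearson(test_scores, performance)
print(f"validity coefficient: {validity:.2f}")
```

A real validation study would need a much larger sample and a check of statistical significance before the test is used to make staffing decisions; this example only shows what "relating test scores with a criterion" means computationally.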