Reliability

Discrepancies between true ability and the measurement of that ability constitute errors of measurement

•Error means that there will always be some inaccuracy in our measurements

•Tests that are relatively free of measurement error are said to be reliable

History and Theory of Reliability

Spearman’s Early Studies

Charles Spearman is responsible for the advanced development of reliability assessment

•De Moivre introduced the basic notion of sampling error

•Pearson developed the product-moment correlation

•Spearman worked out most of the basics of contemporary reliability theory

•Spearman’s article attracted the attention of Thorndike

•Item response theory (IRT): uses computer technology to advance psychological measurement significantly

•IRT is built on many of the ideas that Spearman introduced

Basics of Test Score Theory

The observed score for each person always differs from the person’s true ability

•The difference is the measurement error

•A major assumption of classical test theory is that errors of measurement are random

Basic sampling theory tells us that the distribution of random errors is bell-shaped

•The center of the distribution represents the true score

www.notesolution.com

•The dispersion about the mean of the distribution reflects the distribution of sampling errors

•The true score for an individual will not change with repeated applications of the same test

•The standard deviation of this distribution of errors is the standard error of measurement

•The standard error of measurement tells us, on average, how much a score varies from the true score

•The standard deviation of the observed scores and the reliability of the test are used to estimate the standard error of measurement
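The last point can be made concrete with the usual classical test theory formula, SEM = SD × sqrt(1 − reliability). The notes do not state this formula, so the sketch below is an illustration under that assumption; the function name and the example values (SD of 15, reliability of 0.91) are hypothetical.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

# Hypothetical test: observed-score SD of 15 and reliability of 0.91.
sem = standard_error_of_measurement(15.0, 0.91)
print(round(sem, 2))  # 4.5
```

A perfectly reliable test (reliability of 1.0) gives an SEM of 0, and the SEM grows toward the observed SD as reliability falls toward 0.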

The Domain Sampling Model

Domain sampling model: considers the problem created by using a limited number of items to represent a larger and more complicated construct

•The error is due to the sample of items

•As the sample gets larger, it represents the domain more and more accurately

•The greater the number of items, the higher the reliability

•Each item on the test should represent the studied ability

•Reliability can be estimated from the correlation of the observed test score with the true score, but true scores are not available

•Our only alternative is to estimate what the true score is

•Different random samples of items might give different estimates of the true score due to sampling error

•To estimate reliability, we create many randomly parallel tests by drawing repeated random samples of items from the same domain

•If we create many tests from sampling, we should get a normal distribution of unbiased estimates of the true score

•Then we would find the correlation between each test and each of the other tests, and the correlations would then be averaged
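The procedure in this list can be sketched as a small simulation. Everything specific here is an assumption for illustration: the 500-item domain, the 200 simulated examinees, the rule that each examinee answers any item correctly with a fixed personal probability, and the helper names `pearson` and `average_parallel_test_correlation`.

```python
import itertools
import random

def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def average_parallel_test_correlation(responses, n_items, n_tests, rng):
    """Draw n_tests random item samples, score each, and average all pairwise correlations."""
    domain_size = len(responses[0])
    totals = []
    for _ in range(n_tests):
        items = rng.sample(range(domain_size), n_items)  # one randomly parallel test
        totals.append([sum(person[i] for i in items) for person in responses])
    pairs = list(itertools.combinations(totals, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
# Hypothetical examinees: each has a personal probability of answering any item correctly.
abilities = [rng.uniform(0.3, 0.9) for _ in range(200)]
# responses[p][i] is 1 if person p answers domain item i correctly, else 0.
responses = [[1 if rng.random() < p else 0 for _ in range(500)] for p in abilities]

short = average_parallel_test_correlation(responses, 10, 10, rng)
long_ = average_parallel_test_correlation(responses, 40, 10, rng)
print(f"10-item tests: {short:.2f}, 40-item tests: {long_:.2f}")
```

Under these assumptions the 40-item tests correlate more strongly with one another than the 10-item tests do, which matches the point that more items give higher reliability.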

Item Response Theory


A growing movement is turning away from classical test theory because:

•Classical test theory requires exactly the same items to be administered to each person

•For a trait such as intelligence, only a small number of those items concentrate on an individual’s particular level of ability

IRT uses the computer to focus on the range of item difficulty that helps assess an individual’s ability level

•If the person gets a few easy items correct, the computer might move to the more difficult items

•Then, this level of ability is intensely sampled

•A more reliable estimate of ability is obtained using a shorter test with fewer items

•The method requires a bank of items that have been evaluated for level of difficulty

•Complex computer software is required
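The adaptive idea described above can be illustrated with a deliberately simplified up-down (staircase) rule: move to a harder item after a correct answer and to an easier one after an error. This is not a real IRT ability estimator, which would fit a statistical model to the responses. The function `adaptive_test`, the ten difficulty levels, and the examinee who passes every item up to difficulty 7 are all hypothetical, and each difficulty level stands in for a pool of items at that level.

```python
def adaptive_test(item_bank, answers_correctly, n_items=8):
    """Administer n_items, stepping harder after a correct answer and easier after an error."""
    difficulties = sorted(item_bank)
    index = len(difficulties) // 2  # start near the middle of the bank
    administered = []
    for _ in range(n_items):
        difficulty = difficulties[index]
        correct = answers_correctly(difficulty)
        administered.append((difficulty, correct))
        step = 1 if correct else -1
        # Stay inside the bank at both ends.
        index = min(max(index + step, 0), len(difficulties) - 1)
    return administered

# Hypothetical bank of difficulty levels 1..10 and an examinee who passes levels up to 7.
bank = range(1, 11)
result = adaptive_test(bank, lambda difficulty: difficulty <= 7)
print([difficulty for difficulty, _ in result])  # [6, 7, 8, 7, 8, 7, 8, 7]
```

After two items the test settles on difficulties 7 and 8, so most of the items are spent near the examinee's ability level rather than on items that are far too easy or too hard.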

Models of Reliability

Reliability coefficient: the ratio of the variance of the true scores to the variance of the observed scores

•Describes theoretical values in a population rather than those obtained from a sample

•Can be read as the percentage of the observed variance that is attributable to variation in the true score

•If we subtract this ratio from 1.0, we have the percentage of variation attributable to random error
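A minimal sketch of this ratio, using simulated data because true scores are never observed in practice. The population values (true-score SD of 10, error SD of 5) and the function name `reliability_coefficient` are assumptions chosen for illustration; with them, the theoretical ratio is 100 / (100 + 25) = 0.80.

```python
import random
import statistics

def reliability_coefficient(true_scores, observed_scores):
    """Ratio of true-score variance to observed-score variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed_scores)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical population: true scores with SD 10, random error with SD 5.
true_scores = [rng.gauss(100, 10) for _ in range(20000)]
observed_scores = [t + rng.gauss(0, 5) for t in true_scores]

r = reliability_coefficient(true_scores, observed_scores)
print(f"reliability: {r:.2f}, share due to random error: {1 - r:.2f}")
```

The simulated ratio should land close to 0.80, leaving roughly 0.20 of the observed variance attributable to random error.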

Sources of Error

An observed score differs from a true score for reasons such as:

•Situational factors such as loud noises

•The room may be too hot or too cold

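•The items on the test might not be representative of the domain

Suppose you could spell 96% of the words in the English language correctly, but the 20-item spelling test you took included 5 items (25%) that you could not spell. Your observed score of 75% would then underestimate your true ability because of the particular items sampled.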
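The spelling example can be explored with a short simulation of item sampling error. The 10,000-word domain, the 1,000 repeated tests, and the variable names are assumptions for illustration; only the 96% figure and the 20-item test length come from the example.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical domain of 10,000 words; 1 marks a word the speller knows (96% of them).
domain = [1] * 9600 + [0] * 400

scores = []
for _ in range(1000):
    test_items = rng.sample(domain, 20)  # one random 20-item spelling test
    scores.append(sum(test_items) / 20)

mean_score = sum(scores) / len(scores)
print(f"mean: {mean_score:.3f}, lowest: {min(scores):.2f}, highest: {max(scores):.2f}")
```

Averaged over many tests the score sits near the true 96%, but any single 20-item test can fall well below it simply because of which words happened to be drawn.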

