Sept 24 Psychological assessment PSCY37
Reliability and Validity:
Reliability: examines which sources are responsible for measurement error.
Refers to the degree to which test scores are consistent, i.e., how little they differ because of
measurement error.
Classical Test Theory: the observed score is assumed to equal some true score + error.
Reliability: some tests get closer to measuring true scores than others; the more reliable the
test, the closer we are to the true score (minimizing the error).
Assumption: errors of measurement are random,
as opposed to systematic (e.g., everybody is affected by a noisy environment).
Rubber yardstick analogy: it would stretch and expand randomly. If it were off by 2 inches all the
time, the error would be systematic. Since the errors are random, over many measurements they pile
up into a normal curve. Hence, random measurement error is assumed to follow a normal curve.
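A minimal sketch of the rubber yardstick idea, assuming random errors are drawn from a normal distribution (as the notes assume) and using made-up numbers: a hypothetical true length of 36 inches, a rubber yardstick with a random error SD of 1 inch, and a second yardstick that is always 2 inches off. The function name `measure` and all parameter values are illustrative only.

```python
import random
import statistics


def measure(true_length, n, random_sd=0.0, bias=0.0, seed=0):
    """Return n observed measurements: true length + constant bias + random error."""
    rng = random.Random(seed)
    return [true_length + bias + rng.gauss(0.0, random_sd) for _ in range(n)]


TRUE_LENGTH = 36.0  # hypothetical true length, in inches

# Rubber yardstick: stretches at random, so the error differs on every measurement.
rubber = measure(TRUE_LENGTH, n=10_000, random_sd=1.0)
# Yardstick that is off by 2 inches every time: a systematic error.
biased = measure(TRUE_LENGTH, n=10_000, bias=2.0)

rubber_errors = [x - TRUE_LENGTH for x in rubber]
biased_errors = [x - TRUE_LENGTH for x in biased]

# Random errors cancel out on average; a systematic error does not.
mean_random_error = statistics.mean(rubber_errors)
mean_systematic_error = statistics.mean(biased_errors)

# For a bell-shaped curve, roughly two thirds of errors fall within 1 SD of the mean.
error_sd = statistics.pstdev(rubber_errors)
share_within_1sd = sum(
    abs(e - mean_random_error) <= error_sd for e in rubber_errors
) / len(rubber_errors)

print("mean random error:", round(mean_random_error, 3))
print("mean systematic error:", round(mean_systematic_error, 3))
print("share of random errors within 1 SD:", round(share_within_1sd, 3))
```

The random errors average out close to zero over many measurements, while the 2-inch error stays in every measurement no matter how many are taken.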
Sampling theory suggests that the distribution of random errors is bell-shaped. The idea is that
if you took many measurements, the many instances of error would end up looking like a bell curve.
Degree of spread: reflects the amount of sampling error.
IMP: The standard deviation of the errors is the standard error of measurement.
When one curve is much skinnier than another, the amount of error is smaller; a fatter curve means more error.
Standard Dev (how much a score varies from the mean) Std Error (error in measurement - mot
The central question: what proportion of the variation in observed test scores can be attributed
to true score variation vs. error variation?
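Pie chart 1: 10% of the variation is accounted for by error variation (reliable)
Pie chart 2: 35% is accounted for by error variation (less reliable)
True variance: real differences among individuals' scores
Error variance: variation due to other sources of error (e.g., the environment)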
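The central question can be sketched as a ratio: reliability = true variance / observed variance. The simulation below is only an illustration under assumed numbers. It builds observed scores as true score + random error, with error variance set to about 10% and 35% of the total to mirror the two pie charts. The mean of 100, the sample size, and the function names are hypothetical.

```python
import random
import statistics


def simulate_scores(n_people, true_sd, error_sd, seed=0):
    """Simulate true scores and observed scores (observed = true + random error)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, true_sd) for _ in range(n_people)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, observed


def reliability(true_scores, observed):
    """Proportion of observed-score variance that is true-score variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)


# Pie chart 1: true variance 90, error variance 10 (about 10% error).
true_1, obs_1 = simulate_scores(20_000, true_sd=90 ** 0.5, error_sd=10 ** 0.5, seed=1)
# Pie chart 2: true variance 65, error variance 35 (about 35% error).
true_2, obs_2 = simulate_scores(20_000, true_sd=65 ** 0.5, error_sd=35 ** 0.5, seed=2)

rel_1 = reliability(true_1, obs_1)
rel_2 = reliability(true_2, obs_2)

print("test 1 reliability (expected near 0.90):", round(rel_1, 2))
print("test 2 reliability (expected near 0.65):", round(rel_2, 2))
```

In real testing the true scores are never observed directly, so this ratio can only be estimated, which is what the models of reliability in these notes are for.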
The Domain Sampling Model:
Neuroticism: a personality trait characterized by negative states, anxiety, instability, etc.
Problems that the test creators face:
The BFI captures neuroticism with 8 items
The BFI-10 captures it with 2 items
How reliably can these tests capture such complex traits?
Models of Reliability:
3 main approaches:
Internal consistency (most popular but has its limitations)
Test-retest: administer the same test to the same group at 2 different times.
Examines temporal stability: the expectation is that scores stay stable at the retest.
Not the right approach to take if you expect the trait to fluctuate or change.
Basic calculation: determine the correlation coefficient between the two sets of scores.
What are some factors to consider?
Attrition: in settings where people leave frequently (e.g., someone leaves a job), they cannot be retested
Time interval between tests (e.g., is the test still reliable after a month?)
Carryover effects: to what extent does taking the test a second time affect the scores?
Characteristics of the participants themselves
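The basic test-retest calculation can be sketched as a Pearson correlation between the scores from the two administrations. The data below are simulated, not real: each person has a stable true score, and each administration adds its own random error. The helper `pearson_r` and all numbers are assumptions made for illustration.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


rng = random.Random(42)

# Hypothetical group of 500 people with stable true scores.
true_scores = [rng.gauss(50.0, 10.0) for _ in range(500)]

# Same test, same people, two different times; only the random error differs.
time_1 = [t + rng.gauss(0.0, 4.0) for t in true_scores]
time_2 = [t + rng.gauss(0.0, 4.0) for t in true_scores]

test_retest_r = pearson_r(time_1, time_2)
print("test-retest reliability:", round(test_retest_r, 2))
```

A coefficient close to 1 suggests the scores are stable over time. If the trait itself changed between the two administrations, the coefficient would drop even for a well-built test, which is why this approach does not fit traits that are expected to fluctuate.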
Alternate/parallel forms: the idea is that you have 2 equivalent versions of the test. You have
conducted prior research that allows you to create 2 different versions of the same thing.
Time is no longer a source of variability, since both forms are administered at the same time.
However, item sampling differences may occur as a source of error: differences in the items in
each version (the way they're worded may affect how hard they are).
Split-half reliability
One version of the test, finished in 1 sitting; the same test is split into two halves.
Even-numbered items go to one score and the odd-numbered items to the other.
The risk of splitting into first and second halves: if the second half is harder than the first,
the scores would differ and lower the reliability.
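The odd/even split described above can be sketched as follows: sum each person's odd-numbered and even-numbered items separately, then correlate the two half scores. The responses are simulated under an assumed model (each item = the person's trait level + random noise). The 20-item test length, the sample of 300, and the function names are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def split_half_scores(responses):
    """Split each person's item scores into odd-item and even-item totals.

    `responses` is a list of per-person lists, with item 1 stored at index 0.
    """
    odd_totals = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return odd_totals, even_totals


rng = random.Random(7)
N_ITEMS = 20  # hypothetical test length

# Simulated responses from 300 people, all collected in one sitting.
responses = []
for _ in range(300):
    trait = rng.gauss(0.0, 1.0)
    responses.append([trait + rng.gauss(0.0, 1.0) for _ in range(N_ITEMS)])

odd_totals, even_totals = split_half_scores(responses)
split_half_r = pearson_r(odd_totals, even_totals)
print("odd/even split-half correlation:", round(split_half_r, 2))
```

Alternating items between the halves spreads any change in difficulty across the test evenly over both halves, which a first-half/second-half split does not do.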
The risk of Examining the cor