CHAPTER FOUR – Personality Assessment
Personality Assessment: measurement of the individual characteristics of a person
What Makes a Good Personality Test?
The developers of a personality test must demonstrate that the test is valid and reliable,
and specify the conditions, populations, and cultures the test applies to. They must also
provide theoretical background and evidence confirming (or disconfirming) that the test
is related to certain outcomes. When possible, they should also make sure that the results
are meaningful and not just due to biased responding on the part of test-takers.
Reliability: an estimate of how consistent a test is, that is, the extent to which test
scores are consistent and reproducible across repeated measurements. Results should be
consistent across time, items, and raters.
Temporal consistency reliability: when an assessment gives consistent results across
time, often tested by test-retest reliability
Test-retest reliability: when a test gives a consistent result from one point in time
to a later point in time.
o Need to be careful that participants aren’t just remembering their answers
from the previous test. The two sessions must be far enough apart that
memorization cannot occur, but close enough that the trait itself will not change.
o This is only applicable to tests of constructs that should be stable over
time. It would work for something like IQ, but not for day-to-day
emotional levels.
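Test-retest reliability is usually reported as the correlation between scores from the two sessions. The sketch below assumes a Pearson correlation is used and runs on made-up scores; `pearson_r`, `time_1`, and `time_2` are illustrative names, not part of any particular test manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people, tested twice some weeks apart.
time_1 = [10, 14, 18, 22, 26]
time_2 = [11, 13, 19, 21, 27]

# A value close to 1 means people kept roughly the same rank order over time.
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

A high correlation here only supports reliability if the construct is expected to be stable; for a measure of daily mood, a low value would not necessarily indicate a bad test.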
Internal consistency reliability: when an assessment gives consistent results across
items, demonstrated by:
Parallel-forms reliability: two versions of the test that are comparable and can be
checked to see if the scores on both versions are similar
Split-half reliability: splitting a test in half and seeing if test-takers’ scores on one
half correlate with their scores on the other half.
Cronbach’s alpha (α): conceptually, the correlation between the scores on two halves of
a test, averaged over all possible ways of splitting the test in half.
This estimates the generalizability of the score from one set of items to another.
o Researchers try to make sure that their measures have an alpha of 0.70 to
0.80.
o This should be even higher when designing tests to compare or judge
individuals (e.g., IQ tests): at least 0.90, or ideally 0.95.
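The two item-based checks above can be sketched in a few lines. This example assumes an odd/even item split for the split-half correlation and uses the common variance-based computing formula for alpha, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), rather than literally averaging every possible split. The data and all function names are hypothetical.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_r(responses):
    """Correlate each person's total on odd-numbered items with the even-numbered total."""
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return pearson_r(odd_totals, even_totals)

def cronbach_alpha(responses):
    """Variance-based alpha; responses is one row of item scores per test-taker."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five test-takers answering four items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

half_r = split_half_r(responses)
alpha = cronbach_alpha(responses)
print(round(half_r, 2), round(alpha, 2))
```

With only five respondents the numbers are purely illustrative; real reliability estimates need far larger samples.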
Interrater reliability: when an assessment gives consistent results across raters. Two
separate judges rate the personality or behaviour of the same person, and their ratings
should agree.
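One way to put a number on how well two judges agree is simple percent agreement, optionally corrected for chance with Cohen's kappa. Kappa is not named in these notes; it is brought in here as a commonly used statistic, and the ratings below are invented for illustration.

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Share of cases on which the two judges gave the same rating."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement: product of each judge's base rate, summed over categories.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both judges used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same eight people by two judges.
judge_a = ["introvert", "extrovert", "extrovert", "introvert",
           "extrovert", "introvert", "extrovert", "extrovert"]
judge_b = ["introvert", "extrovert", "introvert", "introvert",
           "extrovert", "introvert", "extrovert", "extrovert"]

agreement = percent_agreement(judge_a, judge_b)
kappa = cohens_kappa(judge_a, judge_b)
print(agreement, kappa)
```

Kappa is lower than raw agreement because two judges picking between only two categories would agree fairly often even if they were guessing.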
Validity: the extent to which a test measures what it is supposed to measure.
Construct validity: every test aims to measure an underlying concept called a construct,
derived from theory. This means that every test needs construct validity, successfully
measuring the theoretical concept that it was supposed to measure.
Face validity: test appears to measure the construct of interest. (A test high in face
validity that measured depression would ask questions about sadness, depressive
episodes etc.) This is not the strongest kind of validity, but it is useful under two conditions:
Important for personnel testing or other situations where the cooperation and
motivation of the test-taker can affect the results of the test. Participants then
view the content of a test as fair and relevant.
When researchers are developing a new measure of a concept.
Criterion validity: determines how good a test is by comparing the results of the test to
an external standard, such as another personality test or some behavioural outcome.
Example: a test of introversion vs. extroversion should be able to distinguish
between the two
Convergent validity: the test gives results similar to other tests of the same construct or
to tests of related constructs.
Discriminant validity: the test measures a different aspect of personality than tests of
unrelated constructs, so it should not correlate strongly with them.
With convergent and discriminant validity, what you are doing is making sure that the
test converges with similar constructs and discriminates between dissimilar
constructs.
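In practice this check often comes down to comparing two correlations: one with an established measure of the same construct (expected to be high) and one with a measure of an unrelated construct (expected to be near zero). The sketch below uses invented scores, and the variable names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same six people on three measures.
new_test = [12, 15, 9, 20, 17, 11]            # the test being validated
same_construct = [13, 14, 10, 19, 18, 10]     # established test of the same construct
unrelated_construct = [4, 7, 6, 6, 3, 4]      # test of a theoretically unrelated construct

convergent_r = pearson_r(new_test, same_construct)
discriminant_r = pearson_r(new_test, unrelated_construct)

# Evidence for construct validity: the first value is high, the second is near zero.
print(round(convergent_r, 2), round(discriminant_r, 2))
```

How high is "high enough" and how low is "low enough" depends on the constructs involved; there is no single cutoff implied by these notes.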
A test that gives back only general and superficial results that are ambiguous enough to
apply to anyone lacks predictive validity.
Barnum effect: people believe that a test is accurate because the test really does
have a little bit of everything for everyone.
Test Generalizability
Generalizability: establishes the boundaries or limitations of a test.
A test cannot be used for a purpose other than the one it was intended for, nor
administered to a group of people it was not validated on.
Is the NEO-PI-R a Good Test?
Cronbach’s alphas ranged from 0.56 to 0.81, which is adequate for a test with
only eight items
Test-retest reliability correlations were quite high
Each facet loaded on the appropriate factor
Correlated with the Eysenck personality test
Can be used in many populations. Can be used in clinical settings, drug rehab
programs, and individual psychotherapy.
Has been translated into other languages and appears to be valid in other
cultures.
Types and Formats of Personality Tests
Two kinds of tests:
Self-report (objective): respondents answer questions about themselves
Performance-based (projective): use an unstructured format in which participants
respond to a stimulus in as much or as little detail as they like.