Chapter 4: Personality Assessment
Personality assessment is the measurement of the individual characteristics
of a person. Although researchers use many methods to study personality,
the most commonly used are personality tests.
What Makes a Good Personality Test?
According to standards set by professional organizations in education and
psychology, including the American Psychological Association, the
developers of a personality test must demonstrate that the test is valid and
reliable, and specify the conditions, populations, and cultures the test applies
to.
The biggest difference between a personality test you might find on the
Internet and one you find in journals or purchase from a recognized
publisher is that legitimate personality tests have reliability, validity, and
generalizability, backed by publicly available research evidence.
Test Reliability: Generalizability Across Time, Items, and Raters
Reliability is a prerequisite for validity. We cannot know the correct time
with an unreliable watch. A measure must first be consistent in order to be a
valid representation of an underlying theoretical construct. Reliability is an
estimate of how consistent a test is: A good test gives consistent results over
time, items, or raters.
Reliability describes the extent to which test scores are consistent and
reproducible with repeated measurements.
Temporal consistency reliability: When an assessment gives consistent
results across time, often demonstrated by test-retest reliability.
Test-retest reliability: A measure of temporal consistency; when a test gives
a consistent result from one point in time to a later point in time.
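Test-retest reliability is commonly summarized as the correlation between scores from the two testing occasions. The sketch below assumes a Pearson correlation and uses made-up scores for five hypothetical test-takers; the names pearson_r, time_1, and time_2 are illustrative only and do not come from any particular test manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five people tested on two occasions.
time_1 = [12, 18, 25, 30, 22]
time_2 = [14, 17, 27, 29, 21]

# A value close to 1 suggests the test ranks people consistently over time.
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

A correlation near 1 means people who scored high the first time also tended to score high the second time, which is what temporal consistency requires.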
Internal consistency reliability: When an assessment gives consistent results
across items, demonstrated by parallel-forms reliability, split-half reliability, or
Cronbach’s alpha reliability.
Parallel-forms reliability: A measure of internal consistency reliability; when
two or more versions of a test give consistent results.
Split-half reliability: A measure of internal consistency reliability; when each
half of a test gives consistent results.
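One way to estimate split-half reliability is to total the odd-numbered and even-numbered items separately and correlate the two half-scores. The sketch below makes two assumptions that go beyond the definition above: it uses an odd/even split (other splits are possible), and it also applies the Spearman-Brown correction, a standard adjustment for the fact that each half is only half as long as the full test. The data and function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate).

    item_scores is a list of respondents, each a list of item scores.
    """
    # Items at positions 1, 3, 5, ... versus positions 2, 4, 6, ...
    first_half = [sum(resp[0::2]) for resp in item_scores]
    second_half = [sum(resp[1::2]) for resp in item_scores]
    r_half = pearson_r(first_half, second_half)
    # Spearman-Brown: estimate reliability of the full-length test.
    corrected = 2 * r_half / (1 + r_half)
    return r_half, corrected

# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

r_half, corrected = split_half_reliability(responses)
print(round(r_half, 2), round(corrected, 2))
```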
Cronbach’s alpha (α): A measure of internal consistency reliability; the
average correlation among all possible combinations of test items taking
them half at a time.
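In practice, Cronbach's alpha is usually computed from item variances rather than by averaging every possible split. The sketch below assumes the standard variance-based formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises toward 1 as the items vary together, that is, as respondents who score high on one item tend to score high on the others.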
Interrater reliability: A measure of rater consistency; when there is
agreement among raters.
Test Validity:
Validity is the extent to which a test measures what it is supposed to measure.
Every test aims to measure an underlying concept called a construct, which
derives from a theory. Therefore, ultimately, every test must have construct
validity and successfully measure the theoretical concept it was designed to
measure.
Construct validity: when an assessment successfully measures the theoretical
concept it was designed to measure.
A test has face validity when it appears to measure the construct of interest.
For example, you might reasonably figure out that a test that asks about
suicide ideation, mood, feelings of sadness, and changes in appetite is
measuring feelings of depression. This is an example of a test with high face
validity. However, with neuropsychological tests or tests asking about how
one interacts with other people, it would be harder to see exactly what
concept the test is measuring. These are examples of tests with low face
validity.
Face validity is not the most convincing type of validity. However, it is useful
under two conditions. First, face validity is important for personnel testing,
or other situations where the cooperation and motivation of the test-taker
can affect the results of the test.
A second useful condition for face validity is when researchers are
developing a new measure of a concept. Often, they will think of items that
appear to measure what they want the test to measure, then they will
administer their test to respondents and see which items are actually related
to the trait or concept the researcher wants to measure.
Criterion validity determines how good a test is, by comparing the results of
the test to an external standard like another personality test or some
Criterion validity: Establishes how good an assessment is by comparing the
results to an external standard such as another personality test or some
In addition to criterion validity, we might check to see if our test is similar to
other tests of the same construct or to tests of related constructs. This
establishes convergent validity. At the same time, we want to be sure that our
test is different from tests of constructs that we theorize to be unrelated to
the one we are interested in. We might look for discriminant validity to be
sure that our test taps a different concept entirely. To establish construct
validity we must demonstrate both what a test measures and what it doesn’t
measure.
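Convergent and discriminant validity are often examined by correlating scores on the new test with scores on other tests. The sketch below assumes Pearson correlations and uses invented scores for five hypothetical respondents: we would hope for a high correlation with an established test of the same construct and a correlation near zero with a test of a construct theorized to be unrelated. All names and numbers are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores for the same five respondents on three tests.
new_test = [10, 14, 18, 22, 26]
same_construct_test = [11, 15, 17, 23, 25]  # established test, same construct
unrelated_test = [5, 9, 3, 8, 5]            # construct theorized to be unrelated

# Evidence of convergent validity: a strong correlation.
convergent_r = pearson_r(new_test, same_construct_test)
# Evidence of discriminant validity: a correlation close to zero.
discriminant_r = pearson_r(new_test, unrelated_test)

print(round(convergent_r, 2), round(discriminant_r, 2))
```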
Convergent validity: Establishes how good an assessment is by comparing
the results to other tests of the same construct or to tests of related
constructs in order to establish what the test measures.
Discriminant validity: Establishes how good an assessment is by comparing
the results to tests of theo