Chapter 1 Psychometrics in Neuropsychological Assessment
The Normal Curve
The normal curve is the basis of many commonly used statistical and psychometric models and is the assumed
distribution for many psychological variables.
Definition and Characteristics
Unimodal, symmetrical, asymptotic at the tails.
The ordinate (height of the curve at any point along the x-axis) is the proportion of
persons within the sample who obtained a given score.
Normal curve can also be referred to as a probability distribution.
Relevance for Assessment
As a frequency distribution, the area under any given segment of the normal curve
indicates the frequency of observations or cases within that interval.
• This provides psychologists with an estimate of the normality/abnormality of any
given test score or range of scores
o Normality – score falls in the center of the bell shape, where most of the
scores are located
o Abnormality – score falls at the ends of the bell shape, where there are few scores
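As a rough illustration of the area interpretation above, the sketch below uses Python's standard-library `NormalDist` to compute the proportion of cases in a central band and in one tail of a standard normal curve. The cutoffs of 1 and 2 SDs are illustrative choices, not values taken from the text.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def proportion_between(z_low: float, z_high: float) -> float:
    """Area under the standard normal curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

# Center of the bell: scores within 1 SD of the mean.
central = proportion_between(-1.0, 1.0)
# Lower end of the bell: scores more than 2 SD below the mean.
lower_tail = std_normal.cdf(-2.0)

print(f"within 1 SD of the mean: {central:.4f}")
print(f"more than 2 SD below the mean: {lower_tail:.4f}")
```

Roughly two thirds of cases fall in the central band, while only about 2% fall in the lower tail chosen here.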
Z Scores and Percentiles
Percentile: the percentage of scores that fall at or below a given test score
Converting scores to percentiles – raw scores are first ‘standardized’, usually to Z scores:
z = (x – X)/SD
x= measurement value (test score)
X= the mean of the test score distribution
SD= the standard deviation of the test score distribution
Resulting distribution of Z scores has a mean of 0 and a SD of 1.
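A minimal sketch of the conversion described above, assuming the scores are normally distributed. The function names and the example values (a score of 85 on a test with mean 100 and SD 15) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize a raw score: z = (x - X) / SD."""
    if sd <= 0:
        raise ValueError("SD must be positive")
    return (x - mean) / sd

def percentile_from_z(z: float) -> float:
    """Percentage of scores at or below z, assuming a normal distribution."""
    return 100.0 * NormalDist().cdf(z)

# Hypothetical example: raw score 85, test mean 100, SD 15.
z = z_score(85, 100, 15)
pct = percentile_from_z(z)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

The percentile is only as trustworthy as the normality assumption behind it.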
Interpretation of Percentiles
The relationship between raw or Z scores and percentiles is not linear.
• A constant difference between raw or Z scores will be associated with a variable
difference in percentile scores, as a function of the distance of the two scores
from the mean.
• This is due to the fact that there are proportionally more observations (scores)
near the mean than there are farther from the mean
o Otherwise, the distribution would be rectangular or non-normal
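The non-linear relationship can be checked numerically. The sketch below, which assumes a normal distribution, compares the percentile change produced by the same half-SD difference near the mean and far from it. The specific z values are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

def percentile_gap(z_low: float, z_high: float) -> float:
    """Difference in percentile points between two z scores."""
    return 100.0 * (std_normal.cdf(z_high) - std_normal.cdf(z_low))

# The same 0.5 SD difference, at two distances from the mean.
gap_near_mean = percentile_gap(0.0, 0.5)
gap_far_from_mean = percentile_gap(2.0, 2.5)

print(f"z 0.0 to 0.5: {gap_near_mean:.2f} percentile points")
print(f"z 2.0 to 2.5: {gap_far_from_mean:.2f} percentile points")
```

Near the mean the half-SD step spans about 19 percentile points; in the tail it spans fewer than 2.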
Linear Transformations of Z Scores: T Scores and Other Standard Scores
Linear transformation can be used to produce other standardized scores.
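A sketch of such transformations, each of the form new score = new mean + new SD × z. The means and SDs used here (T scores 50/10, standard scores 100/15, scaled scores 10/3) are widely used conventions rather than values stated in these notes, so treat them as assumptions.

```python
def linear_transform(z: float, new_mean: float, new_sd: float) -> float:
    """Re-express a z score on a scale with the given mean and SD."""
    return new_mean + new_sd * z

def t_score(z: float) -> float:
    # Assumed convention: mean 50, SD 10.
    return linear_transform(z, 50.0, 10.0)

def standard_score(z: float) -> float:
    # Assumed convention: mean 100, SD 15.
    return linear_transform(z, 100.0, 15.0)

def scaled_score(z: float) -> float:
    # Assumed convention: mean 10, SD 3.
    return linear_transform(z, 10.0, 3.0)

z = -1.0
print(t_score(z), standard_score(z), scaled_score(z))
```

Because the transformation is linear, the shape of the distribution and the relative distances between scores are unchanged.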
T scores, Z scores, standard scores, and percentile equivalents are derived from
samples. Although they are often treated as population values, any limitations of generalizability
due to reference sample composition or testing circumstances must be taken into
consideration when standardized scores are interpreted.
The Meaning of Standardized Test Scores: Score Interpretation
Scores should be compared only when the distributions of the tests being compared are
approximately normal in the population. If standardized scores are
to be compared, they should be derived from similar samples or (more ideally) from the
same sample.
Also, when comparing scores, the reliability of the two measures and their
intercorrelation must be considered before determining whether a significant difference exists.
• Relatively large disparities between standard scores may not actually reflect
reliable differences and therefore may not be clinically meaningful.
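One way to make this concrete uses two standard classical test theory formulas, which are not necessarily the procedure the source has in mind. The first gives the reliability of a difference score from the two reliabilities and the intercorrelation. The second gives the standard error of the difference between two scores on the same scale. Both assume equal SDs for the two measures, and the numeric inputs are hypothetical.

```python
import math

def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference between two equally variable scores."""
    if r_xy >= 1.0:
        raise ValueError("intercorrelation must be below 1")
    return ((r_xx + r_yy) / 2.0 - r_xy) / (1.0 - r_xy)

def se_difference(sd: float, r_xx: float, r_yy: float) -> float:
    """Standard error of the difference for two scores sharing the same SD."""
    return sd * math.sqrt(2.0 - r_xx - r_yy)

# Hypothetical tests: reliabilities .90 and .80, intercorrelation .60, SD 15.
rel = difference_reliability(0.90, 0.80, 0.60)
se = se_difference(15.0, 0.90, 0.80)
critical = 1.96 * se  # difference needed at roughly the 95% level

print(f"reliability of difference: {rel:.3f}")
print(f"SE of difference: {se:.2f}, critical difference: {critical:.1f}")
```

In this example a gap of more than one SD (about 16 points) is needed before the difference can be treated as reliable, even though both tests are individually fairly reliable.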
When test scores are not normally distributed, standardized scores may not accurately
reflect actual population rank.
Comparability across tests does not imply equality in meaning and relative importance
Interpreting Extreme Scores
In clinical practice, one may encounter standard scores that are either extremely low or
high. The meaning/comparability of the scores depends on the characteristics of the
normative sample from which they derive.
Whenever extreme scores are being interpreted, examiners should verify that an
examinee’s score falls within the range of raw scores in the normative sample.
• If the normative sample size is substantially smaller than the estimated
prevalence size and the examinee’s score falls outside the sample range, then
considerable caution may be indicated in interpreting the percentile associated
with the standardized score.
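The check above can be sketched as follows. The code estimates how many people in a normative sample of a given size would be expected to score at or below an extreme z score under normality, and verifies whether a raw score lies inside the observed normative range. The sample size, score range, and function names are hypothetical.

```python
from statistics import NormalDist

def expected_cases_at_or_below(z: float, sample_size: int) -> float:
    """Expected number of sample members at or below z, assuming normality."""
    return sample_size * NormalDist().cdf(z)

def within_normative_range(raw: float, sample_min: float, sample_max: float) -> bool:
    """True if the raw score was actually observed territory in the norms."""
    return sample_min <= raw <= sample_max

# Hypothetical normative sample: n = 100, raw scores ranging from 18 to 60.
expected = expected_cases_at_or_below(-3.0, 100)
in_range = within_normative_range(12, 18, 60)

print(f"expected cases at z <= -3 in n = 100: {expected:.3f}")
print(f"raw score 12 inside normative range: {in_range}")
```

With fewer than one expected case at that level and a raw score outside the sample range, the associated percentile rests on extrapolation rather than observation.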
The interpretation of extreme scores depends on the properties of the normative samples
The Normal Curve and Test Construction
A test with a normal distribution in the general population may show extreme skew or
other divergence from normality when administered to a population that differs
considerably from the average individual.
Whether a test produces a normal distribution is also an important aspect of evaluating
tests for bias across different populations.
Depending on the characteristics of the construct being measured and the purpose for
which a test is being designed, a normal distribution of scores may not be obtainable or
desirable. For example:
• The population distribution of the construct being measured may not be normally distributed
• One may want to identify and/or discriminate between persons at only one
end of a continuum of abilities
o The characteristics of only one side of the sample score distribution are
critical while the characteristics on the other side of the distribution are not
o The measure may even be deliberately designed to have floor or ceiling effects
It is not unusual for test score distributions to be markedly non-normal, even with large samples
The degree to which a given sample distribution approximates the underlying population
distribution increases as the number of observations (N) increases and decreases
as N decreases.
• Larger sample will produce a more normal distribution only if the underlying
population from which the sample is obtained is normal.
o A large N does not correct for non-normality
Small samples may yield non-normal distributions due to random sampling effects, even
though the population from which the sample is drawn has a normal distribution.
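A small simulation can illustrate the point that a large N does not correct for non-normality. The sketch draws a large sample from a skewed (exponential) population and checks whether the mean and median converge, as they would for a normal population. The exponential population, the seed, and the sample size are arbitrary illustrative assumptions.

```python
import random
from statistics import mean, median

rng = random.Random(0)  # fixed seed so the run is repeatable

def mean_median_gap(n: int) -> float:
    """Mean minus median for n draws from a skewed (exponential) population."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    return mean(sample) - median(sample)

# Even with 20,000 observations the gap does not shrink toward zero,
# because the sample mirrors the skewed population it came from.
gap_large = mean_median_gap(20000)
print(f"mean - median with N = 20000: {gap_large:.3f}")
```

A larger sample reproduces the population shape more faithfully; it does not make that shape normal.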
Factors that may lead to non-normal test score distributions:
• Existence of discrete subpopulations within the general population with differing
• Ceiling or floor effects
• Treatment effects that change the location of means, medians, and modes, and
affect variability and distribution shape
Skew: formal measure of asymmetry in a frequency distribution that can be calculated
using a specific formula
• Third moment of a distribution (the mean is the first moment, the variance is the second)
Normal distribution is perfectly symmetrical about the mean and has a skew of zero.
Non-normal but symmetric distribution will have a skew value that is near zero
• Negative skew: left tail of the distribution is heavier and often more elongated
than the right tail
• Positive skew: right tail of the distribution is heavier and often more elongated
than the left tail
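The notes do not give the formula. One common definition is the moment coefficient of skewness, the third central moment divided by the second central moment raised to the power 3/2. The sketch below assumes that definition. The data sets are made up for illustration.

```python
def skewness(scores: list[float]) -> float:
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (population form)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("scores have no variability")
    return m3 / m2 ** 1.5

symmetric = skewness([1, 2, 3, 4, 5])
positive = skewness([1, 1, 1, 2, 10])      # long right tail
negative = skewness([-10, -2, -1, -1, -1])  # long left tail

print(f"{symmetric:.3f} {positive:.3f} {negative:.3f}")
```

The sign of the statistic follows the direction of the longer tail, matching the definitions above.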
When distributions are skewed, the mean and median are not identical because the
mean will not be at the midpoint in rank. Z scores will not accurately translate into
sample percentile rank values. This error in mapping increases as skew increases…
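To see the size of this mapping error, the sketch below compares the actual sample percentile rank of a score with the percentile implied by its Z score under a normality assumption. The reaction-time-like values are invented and positively skewed on purpose.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical positively skewed data (e.g., reaction times in ms).
rts = [300, 310, 320, 330, 340, 350, 360, 400, 600, 900]

def sample_percentile_rank(x: float, sample: list[float]) -> float:
    """Percentage of sample scores at or below x."""
    return 100.0 * sum(1 for s in sample if s <= x) / len(sample)

def normal_based_percentile(x: float, sample: list[float]) -> float:
    """Percentile implied by the z score if the sample were normal."""
    z = (x - mean(sample)) / pstdev(sample)
    return 100.0 * NormalDist().cdf(z)

actual = sample_percentile_rank(400, rts)
assumed = normal_based_percentile(400, rts)

print(f"actual percentile rank: {actual:.1f}")
print(f"z-based percentile: {assumed:.1f}")
```

Here a score that sits at the 80th percentile of the sample is placed below the 50th percentile by its Z score, because the two extreme values pull the mean well above the median.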
Significant skew often indicates the presence of a truncated distribution.
• May occur when the range of scores is restricted on one side but not the other
Commonly seen with reaction time measures and on error scores.
Def.: floor and ceiling effects may be defined as the presence of truncated tails in the context of limitations in
range of item difficulty.
• High floor: large portion of the examinees obtain raw scores at or near the lowest
possible score.
o May indicate that the test lacks a sufficient number and range of easier items
• Low ceiling: high number of examinees obtain raw scores at or near the highest possible score
These may significantly limit the usefulness of a measure.
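A simple screening sketch, assuming that the proportion of examinees at the lowest and highest possible scores is a reasonable first indicator. The notes do not specify a cutoff for "large", so none is imposed here, and the example data are hypothetical.

```python
def proportion_at_extremes(
    scores: list[int], min_possible: int, max_possible: int
) -> tuple[float, float]:
    """Return (proportion at the floor, proportion at the ceiling)."""
    if not scores:
        raise ValueError("no scores supplied")
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_possible) / n
    at_ceiling = sum(1 for s in scores if s == max_possible) / n
    return at_floor, at_ceiling

# Hypothetical raw scores on a test scored from 0 to 20.
raw_scores = [0, 0, 0, 1, 0, 2, 0, 3, 0, 5]
floor_prop, ceiling_prop = proportion_at_extremes(raw_scores, 0, 20)

print(f"at floor: {floor_prop:.0%}, at ceiling: {ceiling_prop:.0%}")
```

Extending the check to scores "at or near" the extremes would require choosing a band width, which is a judgment call for the test user.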
Multimodality and Other Types of Non-Normality
Multimodality: presence of more than one ‘peak’ in a frequency distribution
Uniform or near-uniform distribution: no or minimal peak and relatively equal frequency
These distributions may cause linearly transformed scores to be totally inaccurate with
respect to actual sample/population percentile rank and should not be interpreted in that
Normalizing Test Scores
When problematic score distributions occur, test developers employ ‘normalizing’
transformations in an attempt to correct departures from normality.
• These do help but they don’t solve everything. They actually cause some more
Test scores should only be normalized when:
1. They come from a large and representative sample
2. Any deviation from normality arises from defects in the test rather than
characteristics of the sample
Test makers should describe in detail the nature of any significant sample non-normality
and the procedures used to correct it for derivation of standardized scores. Reasons for
correction should also be justified.
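The notes do not say which normalizing transformation is used. One common family is rank-based: each raw score is converted to a percentile rank and then to the Z score that has that rank in a normal distribution. The sketch below assumes that approach and a mid-rank convention for percentile ranks (half of the tied cases counted as below), which keeps every rank strictly between 0 and 1. It is an illustration, not any particular test developer's procedure.

```python
from statistics import NormalDist

def normalize_scores(scores: list[float]) -> list[float]:
    """Rank-based normalization: raw score -> mid-rank proportion -> z."""
    n = len(scores)
    std_normal = NormalDist()
    normalized = []
    for x in scores:
        below = sum(1 for s in scores if s < x)
        equal = sum(1 for s in scores if s == x)
        p = (below + 0.5 * equal) / n  # mid-rank proportion, never 0 or 1
        normalized.append(std_normal.inv_cdf(p))
    return normalized

# Hypothetical skewed raw scores with one extreme value.
z_values = normalize_scores([1, 2, 3, 4, 100])
print([round(z, 3) for z in z_values])
```

The extreme raw score of 100 ends up no farther from the center than the lowest score, which shows both the appeal of the method and the information it discards.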
Norms can fall short in terms of range or cell size. In these cases, data are often
extrapolated or interpolated using the existing score distribution.
• Done using techniques like multiple regression
• This is often done with age extrapolations so that they go beyond the actual ages of
the individuals in the sample.
Measurement Precision: Reliability and Standard Error
Psychological tests are not perfect and precise. Test scores are estimates of abilities or
functions, with some degree of measurement error.
• Each test differs in the precision of the scores that it produces
A precise test can produce imprecise results if it’s administered:
• In a nonstandard environment
• In a nonoptimal environment
• To an uncooperative examinee
Definition of Reliability
Def.: the consistency of measurement of a given test
• Internal consistency reliability: consistency within a test
• Test-retest reliability: consistency over time
• Alternate form reliability: consistency across alternate forms of the test
• Interrater reliability: consistency across different raters
Reliability indicates the degree to which a test is free from measurement error.
• Error actually consists of the multiple sources of variability that affect test scores
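As one concrete example of a reliability estimate, the sketch below computes Cronbach's alpha, a widely used index of internal consistency. The notes do not name a specific coefficient, so the choice of alpha is an assumption, and the item responses are invented.

```python
from statistics import variance

def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; `responses` holds one list of item scores per examinee."""
    if len(responses) < 2:
        raise ValueError("need at least two examinees")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across examinees.
    item_vars = [variance([person[i] for person in responses]) for i in range(k)]
    # Sample variance of the total scores.
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)

# Hypothetical data: 4 examinees, 3 items.
data = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 5, 4]]
alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.3f}")
```

Test-retest, alternate form, and interrater reliability are typically estimated differently, usually with correlations or agreement indices across occasions, forms, or raters.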
Factors Affecting Reliability