Chapter 3 – Norms and Reliability
- This chapter concerns two basic concepts: norms and reliability.
- Scores on psychological tests are interpreted by reference to norms that are based on the
distribution of scores obtained by a representative sample of examinees.
- The initial outcome, or raw score, is useless by itself; for a test to be meaningful,
examiners must be able to convert the raw score into some form of derived score based on
comparison to a standardization or norm group.
o The vast majority of tests are interpreted by comparing individual results to the performance
of a norm group; however, criterion-referenced tests are an exception (discussed below).
- A norm group consists of a sample of examinees who are representative of the population for
whom the test is intended.
- Standardization often involves collecting a nationwide sample; the essential objective of
test standardization is to determine the distribution of raw scores in the norm group so that the
test developer can publish derived scores known as norms.
o Norms come in many varieties, for example, percentile ranks, age equivalents, grade
equivalents, or standard scores
o Norms indicate an examinee’s standing on the test relative to the performance of
other persons of the same age, grade, sex, and so on.
o Norms may become outmoded in a few years, so periodic renorming of tests should be
the rule rather than the exception
- The most basic level of information provided by a psychological test is the raw score.
o In personality testing, the raw score is often the number of questions answered in the
keyed direction for a specific scale.
- A raw score in isolation is meaningless
- A raw score becomes meaningful mainly in relation to norms, an independently established
frame of reference derived from a standardization sample
- Norms are empirically established by administering a test to a large and representative sample
- The vast majority of psychological tests are interpreted by consulting norms; these instruments
are called norm-referenced tests
- Criterion-referenced tests help determine whether a person can accomplish an objectively
defined criterion, such as adding pairs of two-digit numbers with 97 percent accuracy; norms are
not essential for such tests
- There are different kinds of norms, but they all share one characteristic:
o Each incorporates a statistical summary of a large body of scores
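To make the norm-referenced versus criterion-referenced contrast concrete, here is a minimal sketch that expresses the same raw score as a z score (one kind of standard score) against two norm groups, and separately checks a result against the 97 percent accuracy criterion. The function names `z_score` and `meets_criterion`, the norm groups, and all scores are hypothetical; real norms come from the test developer's published tables.

```python
import statistics

def z_score(raw, norm_scores):
    """Express a raw score relative to a norm group:
    (raw - norm mean) / norm standard deviation."""
    m = statistics.mean(norm_scores)
    s = statistics.stdev(norm_scores)  # sample SD (N - 1 denominator)
    return (raw - m) / s

def meets_criterion(correct, attempted, criterion=0.97):
    """Criterion-referenced decision: no norm group is consulted, only
    whether the proportion correct reaches a fixed standard."""
    return correct / attempted >= criterion

# Hypothetical norm groups: the same raw score of 30 means different things.
group_a = [20, 25, 30]
group_b = [28, 30, 32, 34, 36]
print(z_score(30, group_a))            # 1.0 (above this group's mean)
print(round(z_score(30, group_b), 2))  # -0.63 (below this group's mean)

# Hypothetical criterion-referenced result: 98 of 100 sums correct.
print(meets_criterion(98, 100))        # True
```

The raw score of 30 carries no meaning until a frame of reference is chosen, which is the point the notes make about norms.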
Essential Statistical Concepts
- When confronted with a collection of quantitative data, the natural human tendency is to
summarize, condense, and organize it into meaningful patterns.
- Frequency distribution - a simple and useful way of summarizing data is to tabulate a frequency
distribution
o This is prepared by specifying a small number of class intervals, usually of equal size, and
then tallying how many scores fall within each interval
o The sum of the frequencies for all intervals equals N, the total number of scores in the sample
- A histogram provides a graphic representation of the same information contained in the
frequency distribution; the height of each column indicates the number of scores occurring within that interval.
o Horizontal axis - portrays the scores grouped into class intervals
o Vertical axis – depicts the number of scores falling within each class interval
- A frequency polygon is similar to a histogram, except that the frequency of each class interval is
represented by a single point rather than a column; the points are then joined by straight lines
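The tallying procedure, the histogram, and the frequency polygon can be sketched as follows. This assumes integer scores and equal-width class intervals; the scores and the interval settings (`low=10`, `width=5`) are hypothetical, and the text histogram stands in for a drawn one.

```python
def frequency_distribution(scores, low, width, n_intervals):
    """Tally integer scores into equal-sized class intervals.

    Interval k covers low + k*width through low + (k+1)*width - 1.
    Returns a list of (lower limit, upper limit, frequency) tuples.
    """
    freqs = [0] * n_intervals
    for x in scores:
        k = (x - low) // width  # index of the interval containing x
        if not 0 <= k < n_intervals:
            raise ValueError(f"score {x} is outside the class intervals")
        freqs[k] += 1
    return [(low + k * width, low + (k + 1) * width - 1, freqs[k])
            for k in range(n_intervals)]

def text_histogram(table):
    """One row per class interval; the bar length is the frequency."""
    for lower, upper, f in table:
        print(f"{lower:>3}-{upper:<3} | {'#' * f}")

def polygon_points(table):
    """Frequency polygon: one point per interval, placed at its midpoint."""
    return [((lower + upper) / 2, f) for lower, upper, f in table]

scores = [12, 15, 17, 21, 22, 22, 24, 27, 28, 31, 33, 38]  # hypothetical
table = frequency_distribution(scores, low=10, width=5, n_intervals=6)
assert sum(f for _, _, f in table) == len(scores)  # frequencies sum to N
text_histogram(table)
print(polygon_points(table))
```

Joining the polygon points with straight lines gives the frequency polygon; the histogram rows show the same frequencies as columns.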
Measures of Central Tendency
- The mean (M), or arithmetic average, is one measure of central tendency
o We compute the mean by adding up all the scores and dividing by N, the number of scores
- Another useful index of central tendency is the median, the middlemost score when all
the scores have been ranked
- The mode is simply the most frequently occurring score.
o If two scores tie for highest frequency of occurrence, the distribution is said to be bimodal
- If a distribution of scores is skewed (that is, asymmetrical), the median is a better index of central
tendency than the mean, because the mean is pulled toward the extreme scores in the longer tail.
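A minimal sketch of the three measures follows. For an even number of scores it assumes the common convention of averaging the two middle scores, since the notes define the median only as the "middlemost" score. The mode uses Python's `statistics.multimode`, which returns every tied mode. All scores are hypothetical.

```python
import statistics

def mean(scores):
    # Add up all the scores and divide by N, the number of scores.
    return sum(scores) / len(scores)

def median(scores):
    # Rank the scores and take the middlemost one; with an even N,
    # average the two middle scores (a common convention).
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def modes(scores):
    # All most frequently occurring scores; two of them means bimodal.
    return statistics.multimode(scores)

skewed = [10, 11, 12, 12, 13, 14, 60]  # hypothetical, one extreme score
print(round(mean(skewed), 2))    # 18.86, pulled up by the score of 60
print(median(skewed))            # 12
print(modes(skewed))             # [12]
print(modes([1, 2, 2, 3, 3, 4])) # [2, 3], a bimodal distribution
```

The single extreme score drags the mean well above most of the group, while the median stays at a typical value, which is why the median is preferred for skewed distributions.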
Measures of Variability
- Two or more distributions of test scores may have the same mean, yet differ greatly in the
extent of dispersion of the scores about the mean
- The most commonly used statistical index of variability in a group of scores is the standard
deviation
- The standard deviation reflects the degree of dispersion in a group of scores
o If the scores are tightly packed around a central value, the standard deviation is small
o In the extreme case in which all scores are identical, the standard deviation is exactly zero
o As the scores become more spread out, the standard deviation becomes larger
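- The standard deviation, or s, is simply the square root of the variance, which is designated $s^2$. The
formula for the variance, where X is each score, M is the mean, and N is the number of scores, is:
o $s^2 = \frac{\sum (X - M)^2}{N - 1}$, so that $s = \sqrt{s^2}$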
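The variance formula and its square root translate directly into code. The sketch below uses three hypothetical distributions that share a mean of 50 but differ in dispersion, and cross-checks the result against Python's `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics

def variance(scores):
    """s^2: sum of squared deviations from the mean M, divided by N - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed")
    m = sum(scores) / n
    return sum((x - m) ** 2 for x in scores) / (n - 1)

def std_dev(scores):
    """s: the square root of the variance."""
    return math.sqrt(variance(scores))

# Hypothetical distributions with the same mean (50) but different spread.
identical = [50, 50, 50, 50]
tight = [49, 50, 50, 51]
spread = [20, 40, 60, 80]
for name, xs in [("identical", identical), ("tight", tight), ("spread", spread)]:
    print(name, round(std_dev(xs), 2))

# Cross-check against the standard library, which also divides by N - 1.
assert math.isclose(std_dev(spread), statistics.stdev(spread))
```

This prints 0.0, 0.82, and 25.82: identical scores give exactly zero, and the standard deviation grows as the scores spread out.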
- For many tests, as the number of examinees grows larger, the distribution of scores tends to resemble more and more closely a symmetrical,
mathematically defined, bell-shaped curve called the normal distribution
- One reason that psychologists prefer normal distributions is that the normal curve has useful
mathematical features that form the basis for several kinds of statistical investigation
- For example, to test whether the mean scores of two groups differ, an inferential statistic such as the t test for a difference between means would be appropriate
o Inferential statistics are based on the assumption that the underlying population of
scores is normally distributed, or nearly so
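As an illustration of such an inferential statistic, here is a minimal sketch of Student's t for two independent groups in its pooled-variance form. That form assumes equal population variances in addition to the normality assumption noted above; other variants, such as Welch's t, drop the equal-variance assumption. The sketch stops at t and its degrees of freedom, because a p-value would require the t distribution, which Python's standard library does not provide. The scores are hypothetical.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Student's t for two independent groups, assuming equal population
    variances (pooled-variance form). Returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # N - 1
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of M1 - M2
    return (m1 - m2) / se, n1 + n2 - 2

t, df = pooled_t([10, 12, 14], [8, 9, 10])  # hypothetical scores
print(round(t, 3), df)
```

The resulting t would then be compared against a t table with the given degrees of freedom.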
- Another basis for preferring the normal distribution is its mathematical precision. Since the
normal distribution is precisely defined in mathematical terms, it is possible to compute the area
under any region of the curve with great accuracy
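Because the normal curve has a closed-form cumulative distribution, the area between any two points can be computed directly. The sketch below uses the standard identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ for a standard normal curve (mean 0, SD 1) and checks it against Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_cdf(z):
    """Proportion of the standard normal curve lying below z:
    Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z1, z2):
    """Proportion of the area under the curve between z1 and z2."""
    return normal_cdf(z2) - normal_cdf(z1)

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {area_between(-k, k):.4f}")
# within +/-1 SD: 0.6827, +/-2 SD: 0.9545, +/-3 SD: 0.9973

# The standard library's NormalDist gives the same values.
assert math.isclose(area_between(-1, 1), NormalDist().cdf(1) - NormalDist().cdf(-1))
```

About 68 percent of a normal distribution lies within one standard deviation of the mean, about 95 percent within two, and nearly all within three.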
- A third basis for preferring a normal distribution of test scores is that the normal curve often
arises spontaneously in nature
- There is no law of nature regarding the form that frequency distributions must take
- An approximately normal distribution is also found with numerous mental tests, even for tests
constructed entirely without reference to the normal curve
- Skewness refers to the symmetry or asymmetry of a frequency distribution
- If test scores are piled up at the low end of the scale, the distribution is said to be positively skewed
- If the test scores are piled up at the high end of the scale, it is negatively skewed
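Skewness can be quantified. The sketch below uses the moment coefficient of skewness, $g_1 = m_3 / m_2^{3/2}$, one common index among several (some software applies a small-sample correction). It is positive when scores pile up at the low end and negative when they pile up at the high end. The data are hypothetical, and the second set is a mirror image of the first.

```python
def skewness(scores):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5, where mk is the
    mean of the k-th power of deviations from the mean. Positive when
    scores pile up at the low end (long tail toward high scores), negative
    when they pile up at the high end."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are identical")
    return m3 / m2 ** 1.5

low_pile = [1, 1, 1, 2, 2, 3, 9]         # hypothetical: most scores low
high_pile = [10 - x for x in low_pile]    # mirror image: most scores high
print(round(skewness(low_pile), 2), round(skewness(high_pile), 2))
```

The two printed values have equal size and opposite signs: positive skew for the low pile and negative skew for its mirror image.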
- In psychological testing, skewed distributions usually signify that the test developer has included
too few easy items or too few hard items.
- If scores are massed at the high end (negative skew), the test probably contains too few hard
items to make effective discriminations at this end of the scale
- The most straightforward solution is to add items or modify existing ones so that the test has more
easy items (to reduce positive skew) or more hard items (to reduce negative skew)
- If it is too late to revise the instrument, the test developer can use a statistical transformation to
help produce a more normal distribution of scores
o The preferred strategy is to revise the test so that skewness is minimal or nonexistent
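When revision is not possible, a transformation that compresses the long upper tail can reduce positive skew. The square root and the logarithm are two widely used examples. They are chosen here for illustration and are not necessarily what a given test developer would use. The sketch below applies both to hypothetical, positively skewed scores and reports the moment coefficient of skewness before and after. The log transform assumes all scores are positive.

```python
import math

def skewness(scores):
    """Moment coefficient of skewness, g1 = m3 / m2 ** 1.5."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n
    m3 = sum((x - m) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

raw = [1, 1, 1, 2, 2, 3, 9]                # hypothetical, positively skewed
sqrt_scores = [math.sqrt(x) for x in raw]   # milder compression of the tail
log_scores = [math.log(x) for x in raw]     # stronger compression (needs x > 0)

for label, xs in [("raw", raw), ("sqrt", sqrt_scores), ("log", log_scores)]:
    print(label, round(skewness(xs), 2))
```

For these data the skewness shrinks with each transformation. Transformed scores no longer sit on the original raw-score scale, which is one reason revising the test remains the preferred strategy.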
Percentiles and Percentile Ranks
- A percentile expresses the percentage of persons in the standardization sample who scored
below a specific raw score
- Higher percentiles indicate higher scores.
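Following the definition above, which counts only scores strictly below the raw score, a percentile rank can be computed from a sorted standardization sample. Some test manuals use a variant that also counts half of the tied scores; this sketch does not. The function name and the norm sample are hypothetical.

```python
from bisect import bisect_left

def percentile_rank(raw, norm_scores):
    """Percentage of the standardization sample scoring strictly below `raw`."""
    ranked = sorted(norm_scores)
    below = bisect_left(ranked, raw)  # number of scores less than raw
    return 100 * below / len(ranked)

norm_sample = [10, 12, 12, 15, 18, 20, 20, 20, 25, 30]  # hypothetical, N = 10
for raw in (10, 20, 26, 31):
    print(raw, percentile_rank(raw, norm_sample))
# 10 0.0 / 20 50.0 / 26 90.0 / 31 100.0
```

Higher raw scores never receive lower percentile ranks, which matches the point that higher percentiles indicate higher scores.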