
Chapter 3 – Norms and Reliability

- This chapter concerns two basic concepts:

o Norms

o Reliability

- Scores on psychological tests are interpreted by reference to norms that are based on the distribution of scores obtained by a representative sample of examinees.

- The initial outcome, or raw score, is meaningless by itself. For a test to be meaningful, examiners must be able to convert the raw score into some form of derived score based on comparison to a standardization or norm group.

o The vast majority of tests are interpreted by comparing individual results to the performance of a norm group; criterion-referenced tests, discussed later, are an exception.

- A norm group consists of a sample of examinees who are representative of the population for whom the test is intended.

- When a test is standardized on a nationwide sample, the essential objective of standardization is to determine the distribution of raw scores in the norm group so that the test developer can publish derived scores known as norms.

o Norms come in many varieties, for example, percentile ranks, age equivalents, grade equivalents, or standard scores.

o Norms indicate an examinee's standing on the test relative to the performance of other persons of the same age, grade, sex, and so on.

o Norms may become outmoded in a few years, so periodic renorming of tests should be the rule rather than the exception.
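
To make the idea of a derived score concrete, here is a minimal Python sketch that converts a raw score into a z score (one common type of standard score) by comparing it with a norm group. The norm-group scores are invented for illustration, and `z_score` is a hypothetical helper, not part of any published test's scoring procedure.

```python
from statistics import mean, stdev

def z_score(raw, norm_scores):
    """Hypothetical helper: express a raw score relative to a norm group."""
    m = mean(norm_scores)
    s = stdev(norm_scores)  # sample standard deviation (N - 1 denominator)
    return (raw - m) / s

# Invented norm-group raw scores
norm_group = [12, 15, 18, 20, 22, 25, 28]
print(round(z_score(25, norm_group), 2))  # 0.9
```

A z of about 0.9 says the examinee scored roughly nine-tenths of a standard deviation above the norm-group mean; the same raw score of 25 would carry a different meaning against a different norm group.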

Raw Scores

- The most basic level of information provided by a psychological test is the raw score.

o In personality testing, the raw score is often the number of questions answered in the keyed direction for a specific scale.

- A raw score in isolation is meaningless.

- A raw score becomes meaningful mainly in relation to norms, an independently established frame of reference derived from a standardization sample.

- Norms are empirically established by administering a test to a large and representative sample of persons.

- The vast majority of psychological tests are interpreted by consulting norms; these instruments are called norm-referenced tests.

- Criterion-referenced tests help determine whether a person can accomplish an objectively defined criterion, such as adding pairs of two-digit numbers with 97 percent accuracy; for these tests, norms are not essential.

- There are different kinds of norms, but they share one characteristic:

o Each incorporates a statistical summary of a large body of scores.
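
The contrast between the two kinds of scoring can be sketched in Python. The first function counts items answered in the keyed direction, giving a personality-scale raw score that still needs norms before it means anything; the second applies a fixed criterion (the 97 percent accuracy standard from the example above) and needs no norms at all. The answer key, item numbers, and responses are hypothetical.

```python
# Hypothetical answer key for one scale: item number -> keyed response
SCALE_KEY = {1: "T", 4: "F", 7: "T", 9: "T", 12: "F"}

def keyed_raw_score(responses, key):
    """Count the scale items answered in the keyed direction."""
    return sum(1 for item, keyed in key.items() if responses.get(item) == keyed)

def meets_criterion(correct, attempted, threshold=0.97):
    """Criterion-referenced check: did accuracy reach a fixed standard?"""
    return attempted > 0 and correct / attempted >= threshold

# Hypothetical examinee responses (item 15 is not on this scale)
responses = {1: "T", 4: "F", 7: "F", 9: "T", 12: "T", 15: "T"}
print(keyed_raw_score(responses, SCALE_KEY))  # 3
print(meets_criterion(49, 50))  # True (98 percent accuracy)
print(meets_criterion(48, 50))  # False (96 percent accuracy)
```

The raw score of 3 is uninterpretable until it is compared with a norm group, whereas the criterion check yields a pass/fail decision on its own.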

Essential Statistical Concepts

- When confronted with a collection of quantitative data, the natural human tendency is to summarize, condense, and organize it into meaningful patterns.

Frequency Distributions

- A simple and useful way of summarizing data is to tabulate a frequency distribution.

o A frequency distribution is prepared by specifying a small number of class intervals, usually of equal size, and then tallying how many scores fall within each interval.

The frequencies for all intervals sum to N, the total number of scores in the sample.

- A histogram provides a graphic representation of the same information contained in the frequency distribution; the height of each column indicates the number of scores occurring within that interval.

o Horizontal axis: portrays the scores grouped into class intervals.

o Vertical axis: depicts the number of scores falling within each class interval.

- A frequency polygon is similar to a histogram, except that the frequency of each class interval is represented by a single point rather than a column; the points are then joined by straight lines.
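
A frequency distribution with equal-width class intervals can be tallied in a few lines of Python. This sketch assumes integer scores that all fall inside the chosen range; the row of `#` marks printed for each interval is a crude sideways histogram, with bar length standing in for column height. The scores and interval settings are made up.

```python
from collections import Counter

def frequency_distribution(scores, low, width, n_intervals):
    """Tally integer scores into equal-width class intervals starting at `low`."""
    counts = Counter((s - low) // width for s in scores)  # interval index per score
    table = []
    for i in range(n_intervals):
        start = low + i * width
        table.append((start, start + width - 1, counts.get(i, 0)))
    return table

scores = [42, 47, 51, 53, 55, 58, 61, 62, 64, 66, 69, 73, 75, 81]
table = frequency_distribution(scores, low=40, width=10, n_intervals=5)
for start, end, f in table:
    print(f"{start}-{end}: {'#' * f}  ({f})")

# The interval frequencies add up to N, the total number of scores
assert sum(f for _, _, f in table) == len(scores)
```

Plotting one point per interval at the height of its frequency, and joining the points, would give the corresponding frequency polygon.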

Measures of Central Tendency

- The mean (M), or arithmetic average, is one measure of central tendency.

o We compute the mean by adding up all the scores and dividing by N, the number of scores.

o Another useful index of central tendency is the median, the middlemost score when all the scores have been ranked.

o The mode is simply the most frequently occurring score.

If two scores tie for the highest frequency of occurrence, the distribution is said to be bimodal.

- If a distribution of scores is skewed (that is, asymmetrical), the median is a better index of central tendency than the mean, because the median is much less affected by a few extreme scores.
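
Python's standard `statistics` module computes all three indexes. In this made-up, positively skewed set, one extreme score drags the mean well above the bulk of the scores while the median stays near the middle, which illustrates why the median is preferred for skewed data. `multimode` (Python 3.8 or later) returns every tied mode, so a bimodal distribution shows up as two values.

```python
from statistics import mean, median, multimode

# Made-up scores with one extreme high value (a positively skewed set)
scores = [10, 11, 12, 12, 13, 14, 15, 48]

print(mean(scores))                # 16.875: pulled upward by the score of 48
print(median(scores))              # 12.5: average of the two middle scores
print(multimode(scores))           # [12]
print(multimode([3, 3, 5, 7, 7]))  # [3, 7]: two tied modes, so bimodal
```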

Measures of Variability

- Two or more distributions of test scores may have the same mean, yet differ greatly in the extent of dispersion of the scores about the mean.

- The most commonly used statistical index of variability in a group of scores is the standard deviation.

- The standard deviation reflects the degree of dispersion in a group of scores.

o If the scores are tightly packed around a central value, the standard deviation is small.

o In the extreme case in which all scores are identical, the standard deviation is exactly zero.

o As the scores become more spread out, the standard deviation becomes larger.

- The standard deviation, s, is simply the square root of the variance, which is designated $s^2$. The formula for the variance is:

o $s^2 = \frac{\sum (X - M)^2}{N - 1}$, where X is an individual score, M is the mean, and N is the number of scores
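
The variance formula translates directly into Python. This sketch uses the N - 1 denominator given in the notes, so it should agree with the standard library's `statistics.variance` and `statistics.stdev`; the three score sets are invented to show a small, a large, and a zero standard deviation.

```python
import math

def sample_variance(scores):
    """s^2 = sum((X - M)^2) / (N - 1)."""
    n = len(scores)
    m = sum(scores) / n  # the mean, M
    return sum((x - m) ** 2 for x in scores) / (n - 1)

def sample_sd(scores):
    """The standard deviation s is the square root of the variance."""
    return math.sqrt(sample_variance(scores))

tight = [49, 50, 50, 51]   # tightly packed around 50
spread = [20, 40, 60, 80]  # same mean, far more dispersed
same = [50, 50, 50, 50]    # all scores identical

print(round(sample_sd(tight), 2))   # 0.82
print(round(sample_sd(spread), 2))  # 25.82
print(sample_sd(same))              # 0.0
```

All three sets have a mean of 50, which shows how distributions with the same mean can differ greatly in dispersion.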

Normal Distribution

- As the number of scores increases, the distribution of scores would more and more closely resemble a symmetrical, mathematically defined, bell-shaped curve called the normal distribution.

- One reason that psychologists prefer normal distributions is that the normal curve has useful mathematical features that form the basis for several kinds of statistical investigation.

- For example, if the scores are normally distributed, an inferential statistic such as the t test for a difference between means would be appropriate.

o Inferential statistics are based on the assumption that the underlying population of scores is normally distributed, or nearly so.

- Another basis for preferring the normal distribution is its mathematical precision. Since the normal distribution is precisely defined in mathematical terms, it is possible to compute the area underneath regions of the curve with great accuracy.

- A third basis for preferring a normal distribution of test scores is that the normal curve often arises spontaneously in nature.

- There is no law of nature regarding the form that frequency distributions must take.

- An approximately normal distribution is also found with numerous mental tests, even for tests constructed entirely without reference to the normal curve.
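
Because the normal curve is defined mathematically, areas under any region of it can be computed precisely, for example with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The mean of 100 and standard deviation of 15 are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical test scaled to a mean of 100 and a standard deviation of 15
curve = NormalDist(mu=100, sigma=15)

# Area between 85 and 115, i.e. within one standard deviation of the mean
within_one_sd = curve.cdf(115) - curve.cdf(85)
print(round(within_one_sd, 4))   # 0.6827

# Area below 130, i.e. below the point two standard deviations above the mean
print(round(curve.cdf(130), 4))  # 0.9772
```

About 68 percent of the area lies within one standard deviation of the mean, and about 97.7 percent lies below a point two standard deviations above it.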

Skewness

- Skewness refers to the symmetry or asymmetry of a frequency distribution.

- If test scores are piled up at the low end of the scale, the distribution is said to be positively skewed.

- If the test scores are piled up at the high end of the scale, the distribution is negatively skewed.

- In psychological testing, skewed distributions usually signify that the test developer has included too few easy items or too few hard items.

- If scores are massed at the high end (negative skew), the test probably contains too few hard items to make effective discriminations at this end of the scale.

- The most straightforward solution is to add items or modify existing ones so that the test has more easy items (to reduce positive skew) or more hard items (to reduce negative skew).

- If it is too late to revise the instrument, the test developer can use a statistical transformation to help produce a more normal distribution of scores.

o The preferred strategy is to revise the test so that skewness is minimal or nonexistent.
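
Skewness can be quantified with the moment coefficient of skewness, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third moments about the mean. This is one of several common definitions and is an assumption here, not necessarily the index the textbook uses. A positive value corresponds to scores piled up at the low end, a negative value to scores piled up at the high end. The score sets are invented, and the function assumes the scores are not all identical (otherwise m2 is zero).

```python
def skewness(scores):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (population moments)."""
    n = len(scores)
    m = sum(scores) / n
    m2 = sum((x - m) ** 2 for x in scores) / n  # second moment about the mean
    m3 = sum((x - m) ** 3 for x in scores) / n  # third moment about the mean
    return m3 / m2 ** 1.5

too_hard = [1, 2, 2, 3, 3, 3, 4, 5, 9]  # piled up at the low end: too few easy items
too_easy = [1, 5, 6, 7, 7, 7, 8, 8, 9]  # piled up at the high end: too few hard items

print(skewness(too_hard) > 0)  # True: positive skew
print(skewness(too_easy) < 0)  # True: negative skew
```

The two invented sets are mirror images of each other, so their skewness values have the same size and opposite signs.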

Percentiles and Percentile Ranks

- A percentile expresses the percentage of persons in the standardization sample who scored below a specific raw score.

- Higher percentiles indicate higher scores.
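
Following the definition above, a percentile rank is the percentage of the standardization sample scoring below a given raw score. Here is a minimal Python sketch with an invented ten-person sample; `percentile_rank` is a hypothetical helper. Some test manuals also count half of the scores tied with the raw score, which this version deliberately does not.

```python
def percentile_rank(raw, norm_scores):
    """Percentage of the standardization sample scoring strictly below `raw`."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100 * below / len(norm_scores)

# Invented standardization sample of ten raw scores
standardization_sample = [8, 11, 12, 14, 15, 15, 17, 19, 21, 24]
print(percentile_rank(15, standardization_sample))  # 40.0
print(percentile_rank(22, standardization_sample))  # 90.0
```

The higher raw score of 22 earns the higher percentile, consistent with higher percentiles indicating higher scores.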