Chapter 9
Problem of Defining Intelligence
Alfred Binet is the original author of intelligence tests. He defined intelligence as “the tendency to take and maintain a definite direction; capacity to make adaptations.” Spearman defined intelligence as the ability to educe either relations or correlates. Others have given different definitions. Recent views depict intelligence as a blend of abilities, including personality and various aspects of memory.
There are 3 research traditions that are used:
Psychometric approach: examines the elemental structure of a test; looks at the properties of a test through an evaluation of its correlates and underlying dimensions. This is the oldest approach, and Binet’s work is mostly based on it.
Information-processing approach: looks at the processes that underlie how we learn and solve problems.
Cognitive approach: focuses on how humans adapt to real-world demands.
There is a correlation between socioeconomic status and scores on all standardized intelligence tests, which has led many to believe that intelligence tests are biased.
In 1904, a French minister wanted to make sure every child received an education they could understand, and so decided to separate children based on their intelligence level, as shown by their performance on an intelligence test.
Binet’s Principles of Test Construction
Binet tried to measure the capacity to (1) find and maintain a definite direction or purpose, (2) make necessary adaptations (strategy adjustments), and (3) engage in self-criticism. The 3 human faculties he focused on were mental, attentional and reasoning faculties. He followed 2 principles:
1. Age Differentiation: older children can be differentiated from younger children by the former’s greater capabilities. This was used to determine a child’s mental age regardless of their chronological age. Today, we instead use item response theory to evaluate age-equivalent capabilities.
2. General Mental Ability: the test measures the total product of the various separate and distinct elements of intelligence.
Spearman’s model of General Mental Ability
Galton and Spearman also believed in this notion. Spearman theorized that intelligence consists of one general factor (g) and a large number of specific factors. This was based on the phenomenon of positive manifold: all tests are influenced by g, so diverse tests administered to unbiased population samples will all be positively correlated. To support his notion, he developed factor analysis, which is used to determine how much variance a set of scores has in common; the common variance is g. g is the most established and ubiquitous predictor of occupational and educational performance. Because g is general, the variance due to specific factors can be reduced: if you combine a large number of diverse tests, their specific-factor variances tend to cancel out, leaving a measure of g.
gf-gc theory of intelligence: there are 2 types of intelligence: fluid (gf; abilities that allow us to reason, think and acquire new knowledge) and crystallized (gc; the knowledge and understanding that we have acquired).
Early Binet Scales
The first version (1905) was an individual intelligence test of 30 items presented in order of increasing difficulty. “Idiot” described those with the most severe form of intellectual deficiency: those who did not get past item 6 (following directions and gesture). “Imbecile” described a more moderate impairment: those who did not get past item 8 (knowledge of body parts). “Moron” was the mildest level, and the upper limit for this group was item 16 (stating the difference between 2 common objects). The problems with this version were that there wasn’t enough validity evidence to support it, the norming group consisted of only 50 children, and there were no units to quantify the scale.
In 1908, the test incorporated age differentiation: items were grouped according to age level rather than just difficulty (an age scale). It was not much better than the first version because it was still limited in the range of intellectual skills it tapped (it was heavily weighted toward language, reading and verbal skills, and had no tests of motor functioning or visual skills). Basically, this version introduced the age-scale format and the notion of mental age.
Terman’s Stanford-Binet Intelligence Scale
The third version of the Binet scale had only minor improvements, but the test was gaining recognition in America and Europe. The most widely used early version was Terman’s 1916 revision:
1916 Scale: this version retained the principles of age differentiation and general mental ability, the age-scale format and mental age. The revision increased the standardization sample (compared with 50 children in 1905 and 203 in 1908), but it still consisted of white California children, so it was still not representative. This version also introduced the intelligence quotient (IQ): the ratio of mental age to chronological age, multiplied by 100, intended to reflect the rate of mental development. It is now an outdated concept. Mental age is represented by the score on the test. The problem is that the maximum mental age you could obtain on the scale was 19.5 (by passing all items), which means that anyone older than 19.5 would have an IQ below 100 even after passing every item. At the same time, in 1916, people believed that mental growth stopped at 16, so 16 was used as the upper limit for chronological age.
1937 Scale: this scale extended the age range, allowing 2-year-olds to be tested, and increased the number of items so that the test could reach the level of 22 years and 10 months. Scoring standards and instructions were improved to reduce ambiguity. Performance items were added (to reduce the focus on language), but they made up only 25% of the test, so verbal and performance items still weren’t balanced. The standardization sample was increased (to 3184 people) and became more representative (it was drawn from 11 states, but subjects were selected based on their father’s occupation, and they were still all white). The most important change was the inclusion of an alternate equivalent form (forms L and M), which made it easier to study the test’s psychometric properties. One problem was that reliability was higher for older subjects and for those with lower scores (under 70). The major problem, though, was that IQs at different age levels were not equivalent (the standard deviation differed from one age to another).
1960 Scale: this version took the best items from both forms and combined them into one test. Items were selected based on (1) an increase in the percentage passing with increasing age and (2) a high correlation with total scores. It introduced the deviation IQ, a standard score with a mean of 100 and SD of 16 (today it is 15). This corrected for differences in variability across age levels and allowed comparisons across ages (solving the problem of unequal variability). The standardization sample was not changed in 1960; it was updated in the 1972 norming of the same version, which included 2100 children, 100 at each age level. The sample was also special because it included
The Modern Binet Scale
Model for the Fourth and Fifth Editions: these versions incorporate the gf-gc theory of intelligence. The hierarchy places g at the top, followed by 3 group factors: crystallized abilities (with 2 subcategories: verbal reasoning and nonverbal reasoning), fluid-analytic abilities, and short-term memory (memory over short intervals). Thurstone believed that intelligence can be conceptualized as comprising several factors (primary mental abilities). Research later showed that such abilities exist, but they are intercorrelated and together reflect g.
The 1986 revision: it tried to keep the strengths of the previous tests and eliminate their weaknesses. This version eliminated the age scale and instead placed all items with the same content into one of 15 separate tests (e.g. all vocabulary items were placed in the same test). It had 4 content areas under g, so the test could give specific as well as general scores, but this limited the variety of items within each test. This edition also lost the ability to tap the extremes of intelligence (the 5th edition restored it).
The 2003 revision (5th edition): this version has 5 factors under g, each with equal verbal and nonverbal components: Fluid Reasoning, Knowledge, Quantitative Reasoning, Visual/Spatial Reasoning and Working Memory. Each route has items at equal levels of increasing difficulty, and tasks of differing content are grouped together on the basis of difficulty. The 5th edition thus retains the advantage of a point scale, allowing examiners to summarize scores within any given content area, while also using a mixture of tasks to maintain the examinee’s interest. Using the routing tests to estimate ability, the examiner can skip the easy questions for an efficient examination. The start point is the estimated level of ability; the basal is the level at which a minimum criterion number of correct responses is obtained; and the ceiling is the level at which a certain number of incorrect responses is reached (indicating the items are too difficult). The subtest scaled scores (5 nonverbal and 5 verbal) have a mean of 10 and SD of 3. The age range was also extended (ages 2 to 85 can be tested). The standardization sample was enlarged to be more representative and included people with ADHD, hearing or speech problems, giftedness, and intellectual disability. Reliability is high across the 23 age ranges in the manual (0.97 or 0.98). Test-retest coefficients are high (0.7-0.9), interscorer agreement is high, and the average reliability of each test (IQ can be scored for verbal ability, nonverbal ability or the full scale) is also high. There is limited research support for the 5-factor structure reported for the 5th edition, so clinicians should use it with care.
Median Validity
Validity in these tests is supported by:
o Content validity
o Empirical item analysis
o Considerable criterion-related evidence of validity
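The two IQ definitions described in this chapter (the 1916 ratio IQ and the 1960 deviation IQ) can be compared in a few lines of code. The Python sketch below is illustrative only: the function names are hypothetical, and the age-group mean (50) and SD (5) in the example are made-up numbers. The formulas, the 19.5 maximum mental age, the cap of 16 on chronological age, and the IQ SDs of 16 (1960) and 15 (today) come from the notes above.

```python
from typing import Optional

MAX_MENTAL_AGE_1916 = 19.5  # highest mental age obtainable on the 1916 scale (per the notes)


def ratio_iq(mental_age: float, chronological_age: float,
             ca_cap: Optional[float] = 16.0) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100.

    ca_cap mimics the 1916 convention of treating anyone older than 16
    as 16 years old; pass None to apply no cap.
    """
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    ca = chronological_age if ca_cap is None else min(chronological_age, ca_cap)
    return 100.0 * mental_age / ca


def deviation_iq(raw_score: float, age_mean: float, age_sd: float,
                 iq_sd: float = 15.0) -> float:
    """Deviation IQ: the raw score re-expressed as a standard score
    (mean 100) relative to the test-taker's own age group.

    iq_sd=16 follows the 1960 Stanford-Binet; 15 is today's convention.
    """
    if age_sd <= 0:
        raise ValueError("age-group SD must be positive")
    z = (raw_score - age_mean) / age_sd
    return 100.0 + iq_sd * z


# A 30-year-old who passes every item on the 1916 scale:
print(ratio_iq(MAX_MENTAL_AGE_1916, 30, ca_cap=None))  # 65.0 (no cap)
print(ratio_iq(MAX_MENTAL_AGE_1916, 30))               # 121.875 (CA capped at 16)

# Hypothetical age group with a mean raw score of 50 and SD of 5:
print(deviation_iq(60, 50, 5))            # 130.0
print(deviation_iq(60, 50, 5, iq_sd=16))  # 132.0
```

Without the cap, even a perfect performance gives a 30-year-old a ratio IQ of 65, which is the problem the notes describe. The deviation IQ avoids it by comparing each person only with their own age group, so age levels with different variability become comparable.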
Chapter 10 Wechsler Intelligence Tests
Many non-intellective factors (attitude, experience and emotional functioning) may play a role in a person’s ability to perform a task, and Wechsler emphasized that these factors are involved in intelligence. He created 3 tests: the Wechsler Intelligence Scale for Children (WISC), the Wechsler Adult Intelligence Scale (WAIS) and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI). Wechsler believed that the Binet scale was not valid when used with adults, and he also objected to the use of a single score to represent intelligence. His basic changes were: measuring intelligence in adults, providing separate scores for the tests, and including a performance scale.
Point and Performance Scale Concept
2 of the most critical differences from Binet:
1. Wechsler’s use of a point scale rather than an age scale. Binet organized his test by age group, meaning he ordered items by difficulty (an item was placed at the age level where about 2/3 of that age group would pass it), and the individual had to pass the whole test (group of items) in order to get credit for it (e.g. if a test had 4 items and the person answered only 2 correctly, they would get no credit at all). In Wechsler’s point scale, credits (points) were assigned to each item. This also made it easy to group items with similar content together (the Binet test mixed content, but the Binet scale later adopted this grouping in its 1986 revision).
2. Wechsler’s inclusion of a nonverbal performance scale. Later editions of the test included 4 scales (instead of just verbal and nonverbal). The concept of nonverbal testing wasn’t new, but Wechsler’s idea was to also compare scores across the verbal and nonverbal tests (what we do now). For a clinician, this nonverbal component also provides insight into the patient’s behavior.
From Wechsler-Bellevue Intelligence Scale to WAIS-IV
The first version, the Wechsler-Bellevue (1939), was poorly standardized: its sample consisted of 1081 white people from New York. It was improved and became the WAIS in 1955.
Scales, Subtests and Indexes
Wechsler, like Binet, defined intelligence as a global capacity: the capacity of the individual to act purposefully. He believed that intelligence is made up of specific elements that are interrelated. So, Wechsler designed his tests to look at several abilities whose sum would represent the intelligence of the individual (research supports this; Binet didn’t think this way). Each subtest was made to measure a basic underlying skill, and each belongs to 1 of 4 indexes (each of which measures a broader ability): verbal comprehension, perceptual reasoning, working memory and processing speed. The Full-Scale IQ (FSIQ) is based on the sum of the 4 indexes.
Subtest examples:
o The Vocabulary Subtest: measures the ability to define words; it is the best single measure and the most stable. It is part of the verbal comprehension index (this kind of test is important in almost every IQ test). It is the ability least affected by brain deterioration, e.g. after accidents (in schizophrenia, too, this ability is affected last). Because of this, it can be used to estimate a person’s baseline (premorbid intelligence: the intellectual ability someone had before an accident or the onset of illness).
o The Similarities Subtest: consists of pairs of items (of increasing difficulty), and the individual must identify the similarity in each pair. It is used to look at thought processes: e.g. schizophrenic people often give idiosyncratic concepts (with meaning only to them); asked “what is the similarity between bread and water?”, the response might be “they’re both used for torture.”
o The Information Subtest: usually easy for university students; it asks general-knowledge questions (e.g. how many people are in Congress) and looks at an individual’s range of knowledge, but it can also be influenced by non-intellective factors (curiosity, culture, interests…).
o The Arithmetic Subtest: contains 15 simple problems of increasing difficulty. Because the problems are simple, they are used to examine concentration, motivation and memory; only those deprived of education or with a disability are likely to struggle.
o The Digit Span Subtest: you repeat digits, presented at a rate of 1 per second, forward and backward. It looks at short-term auditory memory, but non-intellective factors can also influence performance; it is typically impaired by anxiety.
o The Letter-Number Sequencing Subtest: it is not required to obtain an index score, but can be used for additional information about a person’s intellectual functioning. The subject is given a list of letters and numbers and is asked to reorder them, numbers first in ascending order, then letters in alphabetical order (e.g. Z, 3, B, 1, 2, A becomes 1, 2, 3, A, B, Z).
o The Digit Symbol-Coding Subtest: the subject copies symbols and is timed for 120 seconds to see how many can be copied. This measures the ability to learn a new task, persistence and speed of performance. Age may influence this.
o The Symbol Search Subtest: the subject is shown 2 geometric figures (targets) and is then required to find them among a group of 5 new figures. It is a relatively new subtest.
o The Block Design Subtest: uses 9 variously colored blocks and a booklet. The subject is required to arrange the blocks to match the designs in the booklet. It can reveal cognitive impairment and assess abstract thinking and concept formation.
o The Matrix Reasoning Subtest: looks at fluid intelligence. The subject is presented with figures and is required to figure out the relation among them.
o The Comprehension Subtest: has 3 types of questions: one asks what an individual would do in a specific situation, another asks the subject to provide a logical explanation for a rule or phenomenon, and the third asks the subject to define a proverb. It can reveal emotional difficulties. For example, to the question “what would you do if you found an injured person on the street?”, a psychopath might say that he or she didn’t do it, a germophobe might say they would avoid getting blood on themselves, and a schizophrenic person might say they would run.
Each subtest produces a raw score, each with a different maximum total. For comparison purposes, the raw scores are converted to scaled scores with a mean of 10 and SD of 3. The WAIS subtest scaled scores are derived using inferential norming (a statistical method), which helped define reference-group norms for the 13 age groups and makes it possible to compare people across subtests. The subtest scaled scores are then summed to form the index scores, which are standardized to a mean of 100 and SD of 15.
Index Scores:
o The verbal comprehension index reflects crystallized intelligence. It is a purer measure than the Verbal IQ because it excludes the arithmetic and digit span subtests (which also involve attention span).
o The perceptual reasoning index measures fluid intelligence and is influenced by attention and visual-motor integration.
o The working memory index looks at the information that we currently hold in mind (not stored knowledge).
o The processing speed index measures how quickly your mind works.
o FSIQ: obtained by summing the age-corrected scaled scores of the 4 index composites.
Interpretive Features of the Wechsler Tests
The WAIS is better than early Binet tests because it provides a comparison of verbal intelligence with nonverbal intelligence. The Verbal IQ (VIQ) and Performance IQ (PIQ) were Wechsler’s original scales. Even with only 2 scales, it is still better than Binet. These 2 help in interpreting each other, but individually they can’t be used to diagnose intellectual disability (IQ below 70). One study looked at 5000 gifted children who were Hispanic, African American, Caucasian, or Filipino: Hispanic children showed no discrepancy between VIQ and PIQ, African American and Caucasian children had higher VIQs, and Filipino children had higher PIQs. This shows that VIQ-PIQ discrepancies shouldn’t be generalized across groups either.
Pattern analysis: evaluating large differences between subtest scaled scores. Wechsler believed that emotionality affects scores: e.g. a schizophrenic person would display poor concentration, which would show up as a low score on the arithmetic subtest. He therefore believed his tests could be used for diagnosis. Research has shown that the results of pattern analysis are inconclusive and contradictory.
Case Study 1: Drop in Grades: a student who used to have a B average suddenly dropped to a stable D. His WAIS scores are above average on many verbal subtests but below average on the performance subtests. He has a high vocabulary score, which suggests his IQ should also be above average, but according to the test it was not. This could indicate a brain injury or tumor, but environmental factors can also lead to such impairment. So, signs of mental illness (schizophrenia, drug abuse…) should be ruled out through interviews first, before looking for a tumor.
Case Study 2: A Slow Learner: a student identified as a slow learner obtains below-average scores on VIQ but scores 1 SD above average on PIQ, which means that she has potential. Her VIQ might underestimate her actual ability because of motivation issues.
Psychometric Properties
Standardization: the WAIS-IV standardization group consisted of 2200 adults divided into 13 age groups (ages 16 to 90), and the sample was stratified by gender, race, education and geographic location according to 2005 census data.
Reliability: the reliability coefficients are high, meaning the test is internally consistent and stable over time. The test-retest coefficients are slightly lower than those reported in the manual. The SEM (standard error of measurement) is small (2.16 for FSIQ), so the 95% confidence interval around an obtained score is narrow: if someone obtains an FSIQ of 110, we can be 95% confident that their true score lies within about 2 SEM (4.32 points) of it, i.e. between 105.68 and 114.32.
Validity: the test is considered among the most valid.
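The SEM reasoning in the reliability paragraph can be checked with a small Python sketch. The function names are hypothetical. `confidence_interval` uses the notes’ rule of thumb of plus or minus 2 SEM by default (1.96 is the exact two-sided 95% multiplier for a normal distribution). `sem_from_reliability` adds the standard classical-test-theory formula SEM = SD * sqrt(1 - r), which is not stated in the notes and is included only to show why a highly reliable test has a small SEM.

```python
import math
from typing import Tuple


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float,
                        z: float = 2.0) -> Tuple[float, float]:
    """Return (low, high) = observed -/+ z * SEM.

    z=2.0 matches the "2 x SEM" rule of thumb in the notes;
    z=1.96 is the exact two-sided 95% multiplier.
    """
    half_width = z * sem
    return observed - half_width, observed + half_width


# FSIQ example from the notes: obtained score 110, SEM 2.16.
low, high = confidence_interval(110, 2.16)
print(round(low, 2), round(high, 2))   # 105.68 114.32

low, high = confidence_interval(110, 2.16, z=1.96)
print(round(low, 2), round(high, 2))   # 105.77 114.23
```

The band shrinks as the SEM shrinks, and the SEM shrinks as reliability approaches 1, which is why a small SEM such as 2.16 signals a precise test.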
The WAIS validity evidence, however, is based largely on correlations with previous versions of the test.
Evaluation of Wechsler Adult Scales
Pattern analysis is tricky because the reliability of the individual subtests is lower than that of the composite scores, so use it with caution.
Downward Extensions of WAIS-III: WISC-IV and WPPSI
The WISC was first published in 1949 and revised in 1974, 1991 and 2003 (WISC-IV). The WISC-IV measures intelligence in children aged 6 to 16 years 11 months; the sample included 100 boys and 100 girls in each age group. Many ideas from the WAIS also apply to the WPPSI (first published in 1967, then revised in 1989 and 2003).
WISC-IV vs. WAIS-III:
The WISC-IV has 15 subtests: 10 retained from the previous edition and 5 completely new. The verbal comprehension index includes the comprehension, similarities and vocabulary subtests. The processing speed index includes the coding and symbol search subtests. The working memory index includes the digit span, letter-number sequencing and supplemental arithmetic subtests.
The modern WISC updated its theoretical underpinnings. To aid clinical work, the test is linked to an achievement test (the Wechsler Individual Achievement Test).
The WISC-IV uses empirical data to identify item bias; before, they just used ‘experts’ to judge bias.
The standardization of the WISC is like that of the WAIS, and so are the scaled scores (calculated from raw scores), with a mean of 10 and SD of 3. The reliability is also similar to that of the WAIS. There is a lot of validity support in the WISC manual. There were no significant differences in scores across cultures, but scores were affected by affluence and education. A factor analysis found 5 factors of intelligence: verbal comprehension, constructional praxis, visual reasoning, freedom from distractibility, and processing speed. This is further supported by a study showing that the test’s factor structure is invariant (it predicted performance equally well in healthy and clinical subjects); only the coding and comprehension subtests varied slightly.
WPPSI-III vs. WAIS-III:
The WPPSI originally covered children from 4 to 6 years old. Only 2 unique subtests are included: animal pegs (an optional, timed test in which the child places a colored cylinder in the appropriate hole in front of the picture of an animal) and sentences (an optional test of immediate recall in which the child is asked to repeat sentences presented orally by the examiner). It has the 5 factors of the WISC but retained VIQ and PIQ. The WPPSI-III also lowered the minimum test age (to 2 years and 6 months), with a special test for 2- to 3-year-olds. The test was made compatible with other tests, because children often need to be tested more than once to determine their level of educational need. 7 new subtests were added. It also includes updated, stratified norms and added 2 new composites: the PSQ (Processing Speed Quotient) and the General Language Composite (GLC). It is age-specific and has accommodations for children (e.g. missed items can be repeated…). Bias is evaluated empirically. It also has fewer language demands than its predecessor, and many studies support its validity. Its convergent validity was well established through comparisons with other Wechsler tests.
Chapter 13
CPI: California Psychological Inventory; a list of statements with T/F responses. It is used as a career assessment tool. It has 462 items.
Personality characteristics: non-intellective aspects of human behavior, distinct from mental abilities (important for clinicians).
Personality: the stable and distinctive patterns of behavior that characterize an individual and his or her reactions to the environment. Personality traits are enduring dispositions; personality types are general descriptions of people; personality states are emotional reactions that vary from one situation to another.
Self-concept: a person’s self-definition; the organized and relatively consistent set of assumptions that a person has about himself or herself.
Binet had previously hypothesized that personality traits could be predicted from a person’s pattern of intellectual functioning (with scientific support), but structured personality tests were not developed until World War I.
The history: in WW1, the military needed to select the best men. The only method they knew was psychiatric evaluation, but there were too many recruits to evaluate. Instead, they created self-report questionnaires (lists of statements to which you respond T/F). This is an example of a structured (objective) test.
Strategies of Structured Personality-Test Construction
At the broadest level, strategies for creating such tests are either deductive or empirical. Deductive strategies comprise the logical-content and the theoretical approaches; empirical strategies comprise the criterion-group and factor-analytic approaches (some tests combine the two).
Deductive Strategies: use reason and logic to determine the meaning of a test response. There are 2 strategies within:
o Logical-Content Strategy: uses reason and deduction. The test designer logically deduces the type of content that should measure the characteristic to be assessed. The problem is that it assumes the item truthfully describes the person.
o Theoretical Strategy: begins with a theory about the nature of the particular characteristic, and items are then deduced from it. The items must be consistent with the theory (e.g. if the theory says there are 6 personality types, the test must cover all 6). This strategy tries to create a homogeneous scale, and so may use statistical procedures (e.g. item analysis).
Empirical Strategies: rely on data collection and statistics to determine the meaning of a test response or the nature of personality. They retain some deductive elements, but mostly rely on experimental research. There are also 2 strategies here:
o Criterion-Group Strategy: begins with a criterion group (a group of individuals who share a characteristic, e.g. schizophrenia or leadership). Test constructors give the test to that group and to a control group (representative of the entire population), and then locate the items whose responses differ between the two. Item content is of little consequence (face validity is irrelevant); what matters is whether the item discriminates. The next step is to cross-validate the scale by checking how well it distinguishes an independent criterion sample. A subject’s score is converted to percentiles. The final step is to validate the scale further.
o Factor-Analytic Strategy: uses factor analysis to derive the basic dimensions of personality. It reduces data to a small number of descriptive units, using correlations to indicate which items belong to the same construct.
Logical-Content Strategy Tests:
The Woodworth Personal Data Sheet was the first personality test ever developed, during WW1. It was made to identify military recruits who would break down in combat. The final version has 116 items that require a Yes/No response; it was basically a paper-and-pencil psychiatric interview. The test produced a single score (convenient for screening large numbers of recruits). If there were too many symptoms, an interview was then warranted. It is a logical-content test, but it had 2 more features: (1) items endorsed by 25% or more of the normal population were excluded (thus reducing the number of false positives) and (2) only symptoms that occurred at least twice as often in neurotics as in normals were included.
Early Multidimensional Logical-Content Scales: 2 of the best-known early tests were the Bell Adjustment Inventory (BAI) and the Bernreuter Personality Inventory (BPI). The BAI looked at home, social and emotional functioning. The BPI looked at 6 personality traits (in people as young as 13). These were developed in the 1930s, and differed from the Woodworth in that they produced several scores rather than a single one.
Mooney Problem Checklist: few modern tests rely exclusively on the logical-content strategy, but the Mooney Problem Checklist does. It is like the Woodworth test in that checking too many items indicates that the person has problems.
Criticisms: these tests are efficient, but they assume that subjects are always completely honest. This is a problem because people sometimes lie, sometimes don’t understand the question, and sometimes can’t evaluate themselves objectively.
Criterion-Group Strategy Tests:
Minnesota Multiphasic Personality Inventory (MMPI): a T/F self-report questionnaire. It has validity scales, which provide information about the person’s approach to testing (e.g. faking bad or faking good); clinical scales, which identify psychological disorders; and content scales (groups of items that are empirically related to a specific content area). Raw scores on the content scales are standardized to T scores with a mean of 50 and SD of 10. The purpose of the MMPI was like that of the Woodworth: to distinguish normal from abnormal individuals (it is now used for psychiatric disorders). It requires at least a 6th-grade reading ability (the MMPI-2 requires 8th grade), so administrators must make sure the test-taker’s reading level and IQ are within range.
o Clinical Scales: 1000 items were collected from case histories, psychological reports, books…, and 504 items were chosen because they seemed to be independent of one another. The scales were then determined empirically by presenting items to criterion and control groups. The criterion group for the original MMPI came from the University of Minnesota Hospital, a total of 50 patients (there were 800, but only those with agreed-upon diagnoses were used). The control group consisted of visitors to the hospital (this was a criticism, because it was not a very representative sample). The criterion groups were:
Hypochondriacs: preoccupied with the body and fears of illness; express somatic symptoms.
Depressed patients: no interest, no appetite…
Hysterics: physical problems with no physical cause (classified as immature individuals prone to over-dramatization).
Psychopathic deviates: delinquent, criminal or antisocial; manipulative and rebellious with no anxiety or remorse.
Psychasthenics: a disorder of excessive doubts, unreasonable fears, obsessive thoughts and low energy.
Schizophrenics: a psychotic disorder with dramatic symptoms (hallucinations) and disordered (illogical) thinking; out of contact with reality.
Hypomanics: hyperactivity and irritability; poor impulse control and judgment.
2 more scales were added: the Mf (masculinity-femininity; items that differentiate men and women) and the Si scale (social introversion; measures introversion versus extraversion).
o The validity scales:
L scale: Lie scale; detects individuals who attempt to present themselves in an overly favorable way.
K scale: similar to the L scale, but empirically constructed, by comparing non-disturbed individuals with disturbed individuals who scored as though they were non-disturbed.
F scale: infrequency scale; detects individuals who attempt to fake bad or respond randomly. High F scores mean the validity of the profile is questionable. One example item: “odd odors come to me at times”. These are items endorsed by fewer than 10% of the control group.
The final one, the Cannot Say (?) score, isn’t really a scale: if a person doesn’t answer 10% or more of the items, their profile is considered invalid.
o The test was set to a mean of 50 and SD of 10. On the MMPI, a T score of 70 (2 SDs above the mean) was considered significant; on the MMPI-2, scores above 65 are considered significant. It was first believed that an individual with, for example, schizophrenia would show an elevation only on that scale, but elevations usually occur on more than one scale. So, clinicians came up with pattern analysis. This proved to be of little use (it either made things confusing or was inaccurate).
o Meehl’s Extension: Meehl suggested analyzing the 2 highest scales (2-point code). He also suggested using numbers instead of scale names, so hypochondriasis is now known as scale 1, depression as scale 2, and so on, while the validity scales kept their names. The total number of items is 566 for the MMPI (including repeated items) and 567 for the MMPI-2. In the coding system, T scores of 90+ were designated with *, 80-89 with ", 70-79 with ', 60-69 with -, and 20-29 with #. For example, 13*2"7'456890- means that scales 1 and 3 were 90 or above, scale 2 was between 80 and 89, scale 7 was between 70 and 79, and the rest were between 60 and 69. Because scales 1 and 3 are the two highest, this profile is a 13 code.
o Re-standardization: the MMPI was revised to update and expand the norms; revise awkward, out-of-date, sexist, or problematic items; and broaden the item pool. The developers also wanted to develop a separate form for adolescents. The 16 repeated items were dropped, and 460 items from the original were kept. Interpretation stayed the same because everything was re-standardized (8% scored above 65 and 4% above 70). The final control sample was more educated and wealthier than the general population because participation was completely voluntary. More validity scales were added, like FB (Back F; useful because it also checks for infrequent responding at the end of the test, instead of only at the front as in the MMPI), VRIN (Variable Response Inconsistency; evaluates random responding using pairs of items with the same content, one worded in reverse) and TRIN (True Response Inconsistency Scale; measures acquiescence, the tendency to agree regardless of content). The MMPI-2 has 15 content scales, like HEA (health concerns), TPA (Type A personality, which is hard-driving and irritable), FAM (family problems) and WRK (work interference).
o Psychometric Properties: the factor structures of the new and old versions are similar. Median split-half reliability for both is in the mid .70s. Test-retest reliability ranges from 0.5 to 0.9, which is not as high as for the WAIS or Binet tests, but the higher-order factor structures of both are extremely reliable (as high as 0.90). One problem is item overlap: items are repeated on more than one scale. Scale 8 has the most items, but only 16 of them are unique. This overlap causes high intercorrelations among the scales, and the MMPI-2 did not fix it because it focused on other goals (i.e. maintaining the original scales). Because of the high intercorrelations among the scales and the results of factor-analytic studies, the validity of pattern analysis has always been questioned. Response style: a bias to mark an item in a certain way regardless of content (e.g.
… counseling settings and evaluates normally adjusted individuals. There are 20 scales, divided into 4 classes: Class 1 for poise, self-assurance and effectiveness; Class 2 for socialization, maturity and responsibility; Class 3 for achievement potential and intellectual efficiency; and Class 4 for interest modes (a high score means you adapt well with others). There are 13 scales designed for special purposes (e.g. creativity, tough-mindedness, managerial potential…). 1/3 of the 434 items are like those in the MMPI. Its reliability results and coefficients are also similar to the MMPI’s. The sampling was based on friends’ ratings (so not very reliable). The advantage is that, unlike the MMPI, it can be used with normal individuals (with the MMPI, we don’t really know what non-elevated scores mean). The CPI has good potential for the future.
Factor-Analytic Strategies
Factor analysis is used to reduce the redundancy of test items. For example, one technique is the principal-components method, which finds the minimum number of common factors that can account for most of the variance. The drawback is that it requires a computer (a lot of
acquiesce).This causes the problem if imbalance in arithmetic):
the way items are keyed. This is solved with VRIN Guilford’s Pioneer Efforts: They determined
and TRIN. Also, research emphasizes that interrelationship of a wide variety of tests and then
interpreters should take into account subjects’ factor analyzed them to find the main dimensions
demographics. Age, gender, intelligence, education underlying all personality tests. This started in 1940s.
and SES relate to the scales. Despite this, it was This created a single scale (Guilford-Zimmerman
shown that whites and African Americans can both Temperament Survey) and has 10 dimensions,
use it. Validity comes from the amount of research measured by 30 items each. These dimensions are:
conducted citing MMPI/MMPI-2 (14%, as opposed general activity, restraint, ascendance (leadership),
to 1% for Woodsworth and 2% for Mooney). There sociability, emotional stability, objectivity, friendliness,
is evidence that these tests can predict individuals thoughtfulness, personal relations and masculinity. The
who may become alcoholics (high on F, 4, and 9). survey is in the form of Yes/No responses. It has 3
Other things that can be detected: type of crime, verification keys. Nobody uses this test anymore.
female criminal risk factors, soldier emotional Cattell’s test: Cattell took all the adjectives that could
abuse, psychosis… describe humans, reduced them to 4504 real ones, and
o Current Status: The most serious drawback was the then again to 171. From that, he derived the 36 surface
inadequate control group and was eliminated in traits, but subsequent factor analysis resulted in a
MMPI-2. New clinical scales were introduced with further reduction to a final of 16 traits. These 16
modern norms. became the source traits and he developed the 16PF. 9
California Psychological Inventory (CPI): there are 36 sets of norms became available and there are 6 forms
scales in the 3 (most recent edition) and 3 of them available: 2 alternate forms for each language
were contrasted to produce (1) introversion- proficiency level. Unlike CPI and MMPI, there are no
extraversion, (2) conventional vs. not, and (3) self- overlaps in the items. The short-term test-retest
realization and sense of integration. It’s used in more reliability is high, but not the long-term. Also, the forms don’t correlate well with each other. Also, percentiles is questionable. Nobody is working on this
despite factor analyzing the 16, there was a lot of inter- test anymore.
correlation going on, so the test was further reduced Personality Research Form (PRF-3) and Jackson
into 4 traits, known as the second order. The test was Personality Inventory-Revised (JPI-R). Constructors of
made available in other countries. Adding 12 more these tests (unlike Edwards) developed specific
psychological disorder scales to the original 16 creates definitions of each need, making each item as
the Clinical Analysis Questionnaire (CAQ). MMPI seems independent as possible (which increases scores’
more useful the 16PF. homogeneity). Bi-serial correlational analysis located
A problem with this approach is that there is a items that correlated highest with proposed scale
subjective nature in naming factors. Each score on any (this process of steps is the latest trend in personality
given test can be broken down into common variance testing in psychology).To assess validity, a scale
(amount of variance a particular variable holds in similar to MMPI’s F scale was created (high scores
common with other variables), unique variance (factors means invalid results) and MMPI’s K scale. 2 sets of
uniquely measured by variable) and error variance.
parallel forms and 1 form based on the best items
Factor analysis identifies common but ignores unique from other forms were developed for PRF; JPI-R has
variance. one form of 300 T/F items for 15 scales. These 15 are
Theoretical Strategy organized in terms of 5 higher-order factors:
Predictions are made about the nature of the scale and analytical, extroverted, emotional, opportunistic and
if the predictions hold up, scale is supported:
dependable. PRF is intended primarily for research,
Edwards Personal Preference Schedule (EPPS): Best- but JPI is intended for normal individuals. Items for
known early example; it’s not actually a test because both tests are balanced for T/F keying. Although all of
there are no right or wrong answers; It’s used in this, they’re still not as powerful as MMPI (in use and
counseling and research. Murray proposed human in status).
needs are achievement, deference (conform), and Self-Concept: What you believe about yourself will
exhibition (need for attention). Edwards selected 15 affect your behavior strongly. Goughs Adjective
needs from that list and created items for them. Checklist contains 300 adjectives in alphabetical order
Edwards was concerned with social desirability (bias and the Piers-Harris Children’s concept scale, 2
to say good things about yourself). He solved this editi