CHAPTER 7: TESTS OF INTELLECTUAL ABILITIES
Case study: Vera, a 14-year-old Caucasian female, suffered a TBI at the age of 4. Referred by her
school counsellor for a neuropsychological evaluation to obtain specific info regarding intellectual
abilities, the possibility of ADHD, any form of learning disability, or residual cognitive deficits due to the
TBI. Info from the evaluation would be compared to prior assessments to determine appropriate
placement in her academic setting. Vera stated that memory difficulties are the problems which cause
her the most difficulty. She indicated that she does well in her classes, but that they are too easy for
her. She also stated that she would like to be with the other kids, as she doesn’t want to be labeled
different. Behavioural observations from the interview with Vera gave the impression of a typical
teenager – she didn’t exhibit any behavioural abnormalities or unusual physical mannerisms.
• Individual Educational Plan (IEP): a plan to address each of the stated needs of the student
with specific, concrete, goal-oriented programs. The IEP must be evaluated on a regular basis
to determine if goals are being met.
• Public Law 94-142 / the Education for All Handicapped Children Act of 1975: a law that
states that all children in the US are entitled to a free and public education in the least
restrictive environment; it includes provisions for the assessment of all children with handicaps.
• Mainstreaming: the practice of bringing students out of the isolation of special schools and into
the mainstream of student life. Students in special education classes are integrated into the
general classroom.
• When attending a clinical assessment, each individual must complete a written interview form,
which covers various neurological symptoms and the ability to perform activities of daily
living: activities that any individual does on a daily basis, like personal hygiene, cooking and
meal planning, going to work or school, and leisure activities.
• Vera was assessed using the Wechsler Intelligence Scale for Children-IV (WISC-IV), the
most current form of the Wechsler scales designed for use with children; it yields a Full Scale
IQ and composite scores (Verbal Comprehension, Perceptual Reasoning, Working
Memory, and Processing Speed). This was to garner an overall measure of intelligence and any
evidence of learning difficulties.
o She was administered age-appropriate neuropsychological tests to determine any
residual deficits due to her TBI.
o She was also administered the Test of Variables of Attention (TOVA), a computerized
test designed to detect symptoms of ADHD.
• Testing can’t be the only factor in making decisions about life choices; in Vera’s case, testing
was used, but the decision was also based on the best scenario for the client.
PSYCHOMETRIC THEORY
• The function of psychological tests has been to measure differences b/w individuals and b/w
the abilities of the same individual under different circumstances
o In clinical neuropsych, the use of tests has been to assess the strengths and
weaknesses of a client after an accident or illness, develop a diagnosis, and/or
formulate a treatment plan for rehab
• The following principles apply regardless of the particular use of the test:
Test Construction
• In order to obtain a sample of behaviour, the test must contain items, questions, or tasks which
pertain to the behaviour in question
o Item choice is therefore related to the purpose of the assessment device and implies
that the purpose must be clearly articulated
• The test construction process begins with a domain, aka the type of behaviour that the
researchers desire to sample; ex. a test of intelligence would have as its domain all of the
behaviours included within the researcher’s definition of intelligence
o Step 2: generation of test items to measure the concept.
o Step 3: give the test to multiple individuals
o Step 4: a factor analysis is conducted; test items are added and/or deleted as this
process progresses, based on the usefulness of each item
Factor analysis: a statistical procedure in which all of the scores are correlated
with one another to determine those variables or factors which account for the
most variance in the data. Basically, it determines the degree to which each item
is related to the construct which the test purports to measure (see the sketch
after these steps)
o Step 5: standardization: uniformity of procedure in administering and scoring the test –
this is to rid the test of any inadvertent error which may be introduced by the examiner
or the conditions in which the test was administered
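A minimal sketch of the Step 4 item analysis, assuming a hypothetical 0/1 response matrix and using
scikit-learn's FactorAnalysis as a stand-in for the full procedure; all data and names are illustrative,
not from the text.

```python
# Minimal sketch: item analysis via factor analysis (hypothetical data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 10)).astype(float)  # 200 examinees x 10 items

fa = FactorAnalysis(n_components=1)  # assume one underlying construct
fa.fit(responses)
loadings = fa.components_[0]         # each item's loading on the factor

# Items with near-zero loadings relate weakly to the construct and would be
# candidates for deletion as test construction progresses.
for item, loading in enumerate(loadings):
    print(f"item {item}: loading = {loading:+.2f}")
```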
• The establishment of norms occurs in conjunction with standardization. Norms are conversions
of the raw scores of the sample group into percentiles in order to construct a normal
distribution that allows the ranking of future test takers
o Norms must include rationale for the choice of the sample
• An individual’s scores on a test have no intrinsic value but need to be evaluated and compared
to other individuals’ scores. Raw scores can be converted to various scaled scores for ease of
interpretation
o Ex. the standard score: any score expressed in units of standard deviations of the
distribution of scores in the population, with the mean set at 0. In a neuropsych
evaluation, it can be derived from the WISC-IV
o Raw scores are mathematically converted to scaled scores with a mean or average
score of 100 and a standard deviation of 15 (see the conversion sketch below)
• The normal curve is the normal distribution, here with a mean of 100 and an SD of 15
o Most human traits, from height and weight to aptitude and personality traits, cluster in the
middle, with few individuals farther from the mean
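A minimal sketch of the raw-score conversion just described, assuming hypothetical raw scores and
the mean-100/SD-15 convention; scipy's normal distribution supplies the percentile ranks.

```python
# Minimal sketch: raw scores -> standard (z) scores -> scaled scores -> percentiles.
import numpy as np
from scipy.stats import norm

raw = np.array([12.0, 15.0, 18.0, 22.0, 25.0, 27.0, 30.0, 33.0])  # hypothetical
z = (raw - raw.mean()) / raw.std(ddof=1)  # standard scores: mean 0, SD 1

scaled = 100 + 15 * z           # rescaled to mean 100, SD 15
percentile = norm.cdf(z) * 100  # percentile rank under the normal curve

for r, s, p in zip(raw, scaled, percentile):
    print(f"raw {r:4.0f} -> scaled {s:6.1f} -> percentile {p:5.1f}")
```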
• The APA, in conjunction with the American Educational Research Association and the National
Council on Measurement in Education, has formulated a clear guide entitled Standards for
Educational and Psychological Testing, covering standards that must be followed when
developing tests, as well as fairness in testing and testing applications
Reliability
• The reliability of a test means its consistency – test reliability is the consistency of scores
obtained by the same individual when retested with the identical test or an equivalent form of
the test
o If the person received a diff score when taking the same test at a diff time, the test
would be useless
• Measures of test reliability make it possible to estimate what proportion of the total variance or
differences between test scores is error variance (see the sketch after these definitions)
o Variance: in statistics, for a population of samples, the mean of the squared
differences between the respective samples and their mean
o Error variance: any condition that is irrelevant to the purpose of the test is error
variance; uniform testing conditions try to prevent error variance
Not due to errors in taking tests, but b/c of characteristics inherent within the
test
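A minimal sketch of the variance definition above, plus the idea of reading reliability as the proportion
of total variance that is not error variance; the error-variance figure is an assumed value, for
illustration only.

```python
# Minimal sketch: variance, and reliability as 1 - (error variance / total variance).
import numpy as np

scores = np.array([98.0, 102.0, 110.0, 95.0, 105.0])  # hypothetical test scores
variance = ((scores - scores.mean()) ** 2).mean()     # mean of squared deviations

error_variance = 10.0  # assumed estimate (e.g., from retesting), for illustration
reliability = 1 - error_variance / variance

print(f"total variance = {variance:.1f}, reliability = {reliability:.2f}")
```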
• Reliability of a test is represented as a correlation coefficient – a number that can range from
-1 to +1
o It measures the degree to which 2 variables are linearly related. Most correlations vary
b/w +1 and 0, with a reputable test having a reliability coefficient in the .80s
• Several types of reliability (a combined code sketch follows this list):
o Test-retest reliability: obtained by repeating the identical test on 2 occasions. The first
set of scores is correlated with the second set of scores. The reliability coefficient is the
correlation b/w the scores obtained by the same person on the two administrations
The error variance is caused by fluctuations of performance from Time 1 to Time
2
o Alternative-form reliability: obtained by having the same person take one form of a
test at Time 1 and an equivalent form of the test at Time 2. So the same people are
tested with one form of a test on one occasion and another, equivalent form on a
second occasion. The correlation b/w the scores represents the reliability coefficient
Parallel forms of the test must be evaluated to make sure they truly represent the
same domain
o Split-half reliability: derived by dividing the test into two equal portions and correlating
the two portions with one another.
There are many ways to split a test like odd numbered items vs. even numbered
items or the first half of the test vs. the second half of the test.
o The last method of determining reliability involves a single administration of a test and is
based on the consistency of responses to all items in the test
Kuder & Richardson – a procedure for finding interitem consistency. The difference
between the Kuder-Richardson coefficient and the split-half coefficient serves as
a rough index of the heterogeneity of the test
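A combined minimal sketch of three of the estimates above (test-retest, split-half with the
Spearman-Brown correction, and Kuder-Richardson's KR-20), all computed on hypothetical data;
none of these numbers come from the text.

```python
# Minimal sketch: test-retest, split-half (Spearman-Brown), and KR-20.
import numpy as np

rng = np.random.default_rng(1)

# Test-retest: correlate the same people's scores at Time 1 and Time 2.
time1 = rng.normal(100, 15, size=50)
time2 = time1 + rng.normal(0, 5, size=50)  # some fluctuation between occasions
r_retest = np.corrcoef(time1, time2)[0, 1]

# Hypothetical 0/1 item responses driven by a shared latent trait.
ability = rng.normal(size=(50, 1))
items = (ability + rng.normal(0, 1, (50, 20)) > 0).astype(int)  # 50 x 20 items

# Split-half: correlate odd-numbered items with even-numbered items, then
# correct for the halved test length with the Spearman-Brown formula.
r_half = np.corrcoef(items[:, 0::2].sum(axis=1), items[:, 1::2].sum(axis=1))[0, 1]
r_split = 2 * r_half / (1 + r_half)

# Kuder-Richardson (KR-20): interitem consistency for dichotomous items.
k = items.shape[1]
p = items.mean(axis=0)                     # proportion passing each item
total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

print(f"test-retest r = {r_retest:.2f}")
print(f"split-half (Spearman-Brown) = {r_split:.2f}")
print(f"KR-20 = {kr20:.2f}")
```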
Validity
• Validity of a test concerns what the test measures and how well it measures it. All
procedures that determine validity fundamentally address the question of how performance
on a test relates to some behavioural construct
• More critical in test construction than reliability
• Techniques to study validity include content validity, criterion validity, construct validity, and
the use of meta-analysis
• Standard approach is the correlation b/w the test score and the criterion measure
• Content validity: the systematic analysis of the actual test items to determine the adequacy of
the coverage of the behaviour being measured
o The extent to which the items of a test procedure are in fact a representative sample of
what is to be measured
o May suffer from subjective bias in the researcher’s choice of items
• Content validity is NOT the same as face validity; face validity is NOT based on a correlation
b/w the test and any other measure.
o Face-valid tests are clearer in terms of the constructs to be measured, which may lead a
test taker to not be as truthful in his/her answers if he/she so desires
o Ex. of a face-valid test: the Beck Depression Inventory-II. The questions on the BDI-II clearly
assess depression, and if the patient doesn’t desire to deal with this topic, the test may
not be an accurate representation of his current mental state
• Criterion validity: the effectiveness of the test in predicting behavioural criteria on which
psychologists agree, aka in predicting extra-test performance or behaviour (see the sketch below)
o Intelligence tests are often validated against a measure of academic achievement such
as GPA
Ex: Vera’s (case study) intelligence test scores were evaluated in comparison to
her performance in special ed. classes
o In cases of CNS disease/difficulty, test scores may be evaluated against job performance
or used to predict performance in a job or school setting
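A minimal sketch of a validity coefficient, correlating hypothetical intelligence test scores with GPA
as the criterion measure; both arrays are made-up illustration data.

```python
# Minimal sketch: criterion validity as a test-criterion correlation.
import numpy as np

iq = np.array([90.0, 95.0, 100.0, 105.0, 110.0, 115.0, 120.0])  # hypothetical scores
gpa = np.array([2.1, 2.6, 2.8, 3.0, 3.3, 3.5, 3.9])             # hypothetical criterion

validity = np.corrcoef(iq, gpa)[0, 1]
print(f"validity coefficient r = {validity:.2f}")
```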
• Construct validity: the extent to which the test is able to measure a theoretical construct or
trait. Ex. processing speed, attention, or concept formation
o Each construct is developed to explain and organize observable response consistencies
• Internal consistency: a measure based on the correlations b/w different items on the same test
(or the same subscale on a larger test); it measures whether several items that propose to
measure the same general construct produce similar scores
o Campbell (1960): in order to show construct validity, a test must demonstrate that it
correlates highly with other variables with which it theoretically should correlate:
convergent validation
o Discriminant validation: the test should NOT correlate with variables from which it
should differ
o Multitrait-multimethod matrix (Campbell and Fiske): a systematic experimental
design for the direct evaluation of convergent and discriminant validation (see the
sketch below). The procedure assesses two or more traits by two or more methods
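A minimal sketch of the multitrait-multimethod logic, with two hypothetical traits each measured
by two hypothetical methods; the trait and method names are made up for illustration.

```python
# Minimal sketch: convergent vs. discriminant correlations in an MTMM layout.
import numpy as np

rng = np.random.default_rng(2)
n = 100
attention = rng.normal(size=n)  # hypothetical trait 1
speed = rng.normal(size=n)      # hypothetical trait 2

data = np.column_stack([
    attention + rng.normal(0, 0.5, n),  # attention, method A (e.g., rating scale)
    attention + rng.normal(0, 0.5, n),  # attention, method B (e.g., performance test)
    speed + rng.normal(0, 0.5, n),      # speed, method A
    speed + rng.normal(0, 0.5, n),      # speed, method B
])
R = np.corrcoef(data, rowvar=False)

# Convergent validation: same trait, different method -> should be high.
# Discriminant validation: different traits -> should be low.
print(f"convergent  (attention A vs B): {R[0, 1]:.2f}")
print(f"discriminant (attention A vs speed B): {R[0, 3]:.2f}")
```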
• Meta-analysis: used since the 90s but being used more often currently as a substitute for the
traditional literature search by test users and developers, b/c of the large database of
available research on current tests
o It’s the evaluation of multiple studies using factor analytic methodology
o Meta-analysis can reveal some substantial positive findings
o Effect sizes are taken into account – the degree of statistical significance in the data or
the size of the correlation (see the sketch below)
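A minimal sketch of one common effect size, Cohen's d, of the kind a meta-analysis aggregates
across studies; the two groups are hypothetical.

```python
# Minimal sketch: Cohen's d for two hypothetical groups.
import numpy as np

rng = np.random.default_rng(3)
patients = rng.normal(95, 15, size=40)   # hypothetical clinical group
controls = rng.normal(105, 15, size=40)  # hypothetical control group

# Pooled standard deviation of the two groups
s_pooled = np.sqrt((patients.var(ddof=1) + controls.var(ddof=1)) / 2)
d = (controls.mean() - patients.mean()) / s_pooled

print(f"Cohen's d = {d:.2f}")  # a meta-analysis would aggregate d across studies
```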
ETHICS IN TESTING AND ASSESSMENT
• 3 documents that specify proper professional use of testing material and proper
construction/development of testing materials:
o The APA’s Ethical Principles of Psychologists and Code of Conduct
Includes specific principles related to competence, privacy and confidentiality, and
assessment
These statements elaborate on 2 basic issues: technical standards for the
development of tests and standards for the professional use of tests
o APA + American Educational Research Association + National Council on Measurement
in Education: Standards for Educational and Psychological Testing (1999)
Appropriate technical and professional standards to be followed in construction,
evaluation, interpretation, and application of psychological tests
o International Test Commission’s International Guidelines for Test Use (2000)
Overlaps with prior 2 to a great extent.
Main parts include the need for high standards when developing and using tests
and the need for professional behaviour when administering, interpreting and
reporting results of testing material
HISTORY OF INTELLECTUAL ASSESSMENT
• Earliest accounts of the use of assessments are from the Chinese Empire – they used a form
of assessment for civil service examinations
• Greeks used testing within their educational system – testing physical and intellectual skills
• World events that had major impact on testing included the change in treatment of individuals
with mental disorders
o The mental hygiene movement led to individuals with mental disorders being separated
from criminals and housed in separate facilities
o Kraepelin – development of a diagnostic classification system
o School systems have always faced the issue of addressing the ability levels of all
students
• Jean Esquirol was the first to make a distinction b/w individuals based on testing information. In a two-
volume text, he spent more than 100 pages pointing out t