Psychological Assessment - Lecture 5; Feb 5, 2013

Validity and Test Development

Validity: A Definition
• A test is valid to the extent that inferences made from it are appropriate, meaningful, and useful - as researchers, we try to gather evidence that the test measures what it is really supposed to measure
• Validity is a unitary concept, determined by the extent to which a test measures what it purports to measure

Validity as a Developmental Process
• Begins with test construction and continues indefinitely - starting from the design of the test
• Test validity hinges upon the accumulation of research findings
• For example, the transition from the third to the fourth edition of the Wechsler Adult Intelligence Scale (WAIS)

Content Validity
• Determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample - what should we be testing?

Content Validity Coefficient
• Content validity = D / (A + B + C + D)
• Divide the number of items that both expert judges agree are highly relevant (cell D) by the total number of items rated (all four cells, A + B + C + D)
• For Figure 4.3: Content validity = 87 / (4 + 4 + 5 + 87) = 87 / 100 = 0.87

Face Validity
• A test has face validity if it looks valid to test users, examiners, and especially the examinees - sometimes you don't want a test to have face validity: if participants can tell what is being measured, they can lie, and the test may not capture what you want

Criterion-Related Validity
• Criterion-related validity is demonstrated when a test is shown to be effective in estimating an examinee’s performance on some outcome measure - we know the test has this validity if its scores are associated with real-world outcomes
• The variable of primary interest is the outcome measure, called a criterion
• Characteristics of a good criterion: (a) reliable and (b) free of contamination from the test itself - the criterion must not be influenced by the test scores it is being used to validate

Types of Criterion-Related Validity
• Concurrent validity
  • Test scores and criterion information are obtained simultaneously; ex: a patient actually calling 911 vs. getting the patient to call 911 on a fake phone
• Predictive validity
  • Test scores are used to estimate outcomes to be measured at a later date - the criterion is not measured at the same time; ex: testing someone when they first complain and then surveying family members 10 years later

What about the Graduate Record Examination (GRE)?
• Predictive validity is actually very weak in terms of success in graduate studies in psychology
• Validity coefficients range from .30 to .45 between the GRE and both first-year and overall graduate GPA in a study conducted by Educational Testing Service
• Squaring these coefficients shows that GRE scores account for only about 9% to 20% of the variance in graduate GPA, so the GRE is a poor predictor of success in graduate school

Decision Theory
• One purpose of psychological testing is measurement in the service of decision making - there will always be some error in decisions; they are never perfect
• False positive (e.g., persons predicted to succeed actually fail); someone is predicted to pass but actually fails
• False negative (e.g., persons predicted to fail actually succeed); someone is predicted to fail but actually succeeds - a bad decision

Examples of "False" Decisions
• Airport security: a “false positive” is when ordinary items (e.g., keys or coins) are mistaken for weapons
• Quality control: a “false positive” is when a good-quality item is rejected, and a “false negative” is when a poor-quality item is accepted
• Antivirus software: a “false positive” is when a safe file is flagged as a virus
• Medical screening: low-cost tests given to a large group can produce many false positives (i.e., saying you have a disease when you actually do not), so those who screen positive are then asked to get more accurate tests

How might a false positive diagnosis of Alzheimer's disease be made based on an older adult's scores on the Independent Living Scales?
A false positive would occur if the person obtains a low score and is judged to have lost independence when they actually have not.

Construct Validity
• A construct is a theoretical, intangible quality or trait in which individuals differ - we assume, for example, that people differ in intelligence, and we need a way to check whether a test measures what it is supposed to measure; people possess the thing we are trying to measure, and the test is how we measure it
• Examples include: leadership ability, over-controlled hostility, depression, and intelligence - we assume people possess these traits, and our tests are designed to measure them
• Can be considered the unifyin
