Chapter 4: Validity and Test Development

Note: Reliability is a necessary but not a sufficient precursor of validity.
Test developers have a responsibility to demonstrate that new instruments fulfill the purposes for which they are designed.

Validity: A Definition

Validity: a test is valid to the extent that inferences made from it are appropriate, meaningful, and useful.
A test score is per se meaningless until the examiner draws inferences from it based on the test manual or other research findings.
Unfortunately, it is very seldom possible to summarize the validity of a test in terms of a single tidy statistic.
  o Determining whether inferences are appropriate, meaningful, and useful typically requires numerous studies of the relationships between test performance and independently observed behaviours.
Validity reflects an evolutionary, research-based judgement of how adequately a test measures the attribute it was designed to measure.
Traditionally, the different ways of accumulating validity evidence have been grouped into three categories:
  o Content validity
  o Criterion-related validity
  o Construct validity

Content Validity

Content validity is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behaviour the test was designed to sample.
  o It is nothing more than a sampling issue.
  o The items of a test can be visualized as a sample drawn from a larger population of potential items that define what the researcher really wishes to measure.
If the sample (the specific items on the test) is representative of the population (all possible items), then the test possesses content validity.
Content validity is a useful concept when a great deal is known about the variable that the researcher wishes to measure.
When evaluating content validity, response specification is also an integral part of defining the relevant universe of behaviours.
  o For example, in reference to spelling achievement, it cannot be assumed that a multiple-choice test will measure the same spelling skills as an oral test or a frequency count of misspellings in written compositions.
Content validity is more difficult to assure when the test measures an ill-defined trait.

Quantification of Content Validity

A coefficient of content validity can be derived from the following formula:
  o Content validity = D / (A + B + C + D)
The commonsense approach to content validity advocated here serves well as a flagging mechanism to help cull out existing items that are deemed inappropriate by expert raters.
A test could possess a robust coefficient of content validity and still fall short in subtle ways.

Face Validity

A test has face validity if it looks valid to test users, examiners, and especially the examinees.
Face validity is really a matter of social acceptability and not a technical form of validity in the same category as content, criterion-related, or construct validity.
In fact, a test could possess extremely strong face validity (the items might look highly relevant to what is presumably measured by the instrument) yet produce totally meaningless scores with no predictive utility whatever.

Criterion-Related Validity

Criterion-related validity is demonstrated when a test is shown to be effective in estimating an examinee's performance on some outcome measure.
The variable of primary interest is the outcome measure, called a criterion.
Two different approaches to validity evidence are subsumed under the heading of criterion-related validity:
  o In concurrent validity, the criterion measures are obtained at approximately the same time as the test scores. For example, the current psychiatric diagnosis of patients would be an appropriate criterion measure to provide evidence for a paper-and-pencil psychodiagnostic test.
  o In predictive validity, the criterion measures are obtained in the future, usually months or years after the test scores are obtained, as with college grades predicted from an entrance exam.

Characteristics of a Good Criterion

The criterion must itself be reliable if it is to be a useful index of what the test measures.
An unreliable criterion will be inherently unpredictable, regardless of the merits of the test.
To the extent that the reliability of either the test or the criterion (or both) is low, the validity coefficient is also diminished.
  o The validity coefficient is the resulting correlation coefficient between test scores and criterion scores.
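
The link between reliability and the validity coefficient can be made concrete with a small sketch. It computes a validity coefficient as the Pearson correlation between test scores and criterion scores, and also the ceiling that classical test theory places on that correlation, the square root of the product of the two reliabilities. The function names, the score lists, and the reliability values are hypothetical and exist only for illustration; none of them come from the chapter.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / math.sqrt(var_x * var_y)


def max_validity(test_reliability, criterion_reliability):
    """Upper bound on the validity coefficient under classical test theory:
    r_xy <= sqrt(r_xx * r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)


# Made-up illustrative data: entrance exam scores (test) and later
# college grade point averages (criterion) for eight examinees.
entrance_exam = [52, 61, 58, 70, 66, 75, 80, 64]
college_gpa = [2.4, 2.9, 2.6, 3.1, 3.3, 3.4, 3.8, 2.8]

validity = pearson_r(entrance_exam, college_gpa)
print("validity coefficient:", round(validity, 2))

# Assumed reliabilities: a reliable test paired with a weaker criterion.
print("ceiling with reliable criterion:  ", round(max_validity(0.90, 0.90), 2))
print("ceiling with unreliable criterion:", round(max_validity(0.90, 0.40), 2))
```

Lowering either reliability lowers the ceiling, which is why an unreliable criterion limits the validity coefficient regardless of the merits of the test.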

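The content validity coefficient from the Quantification of Content Validity section, D / (A + B + C + D), can also be sketched in code. The notes do not define A, B, C, and D, so this sketch assumes the usual two-rater setup: two expert judges rate every item for relevance on a 4-point scale, ratings of 3 or 4 count as strong relevance, D is the number of items both judges rate as strongly relevant, A is the number both rate as weakly relevant, and B and C are the items on which the judges disagree. The function name, the cutoff, and the ratings are hypothetical.

```python
def content_validity(judge1, judge2, strong_cutoff=3):
    """Proportion of items that both judges rate as strongly relevant.

    Assumes each list holds one relevance rating per item and that a
    rating at or above strong_cutoff counts as strong relevance.
    """
    if len(judge1) != len(judge2) or not judge1:
        raise ValueError("need two non-empty rating lists of equal length")
    a = b = c = d = 0
    for r1, r2 in zip(judge1, judge2):
        strong1 = r1 >= strong_cutoff
        strong2 = r2 >= strong_cutoff
        if strong1 and strong2:
            d += 1  # both judges: strong relevance
        elif strong1:
            b += 1  # only judge 1: strong relevance (labelling is an assumption)
        elif strong2:
            c += 1  # only judge 2: strong relevance (labelling is an assumption)
        else:
            a += 1  # both judges: weak relevance
    return d / (a + b + c + d)


# Made-up ratings for a ten-item test.
judge1 = [4, 3, 2, 4, 1, 3, 4, 2, 3, 4]
judge2 = [4, 4, 1, 3, 2, 2, 4, 3, 3, 4]
print(content_validity(judge1, judge2))  # prints 0.6
```

Items that fall outside cell D are the ones worth flagging for review, which matches the use of the coefficient as a culling mechanism rather than as proof of validity.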