Chapter 19

Test Bias
Chapter 19 – Test and Measurement

Why Is Test Bias Controversial?
• A major problem is that some ethnic groups score lower on average than others on some tests.
• The difference is best documented between whites and African Americans: on average, African Americans score about one standard deviation below the white average.
• The dispute is over why these differences exist, not whether they exist.
• Some say the cause is environmental factors, while others say it is biological factors.
• Many people taking these intelligence tests now do not disclose their race.
• Studies have shown that students who do not disclose their race get lower scores on the tests.
• Some whites believe that their majority status may put them at a disadvantage, and some African Americans believe that they will not do as well if they report their race.

Test Fairness and the Law
• Title VII of the Civil Rights Act of 1964 created the Equal Employment Opportunity Commission (EEOC), which in 1978 released guidelines for the use of psychological tests in education and industry.
• The guidelines made clear that the government will view any screening procedure, including the use of psychological tests, as having an adverse impact if it systematically rejects substantially higher proportions of minority than nonminority applicants.
  ◦ Adverse impact: the effect of any test used for selection purposes if it systematically rejects substantially higher proportions of minority than majority job applicants.
• In that case, the employer must prove the validity of the test.
• The Office of Federal Contract Compliance has the direct power to cancel government contracts held by employers who do not comply with these guidelines.

The Traditional Defense of Testing
• Differential validity: the extent to which a test has different meanings for different groups of people.
  For example, a test may be a valid predictor of college success for whites but not for African Americans.
• This issue is very controversial and emotional.
• A difference in how ethnic groups perform on a test does not by itself mean that the test is biased.
• The question is whether the test has different meanings for different groups.
• Differences between Hispanics and whites are smaller than those between African Americans and whites.

Content-Related Evidence for Validity
• The Stanford-Binet scale was criticized on the grounds that some children from disadvantaged backgrounds answered the questions differently, but still correctly.
• Problems also arise with language: a child who does not know the language has no chance of doing well on a standardized IQ test.
• Many people feel that a test is fair if it contains questions that they can answer.
• Test developers are indifferent to the opportunities that people have had to learn the information on the tests.
• Evidence suggests that linguistic bias does not cause the difference.
  ◦ In one study, students were given a test either in African American dialect or in standard English. The dialect version raised scores by less than one point. African American students understood both versions; this was not true of white students, who needed the standard version.
• One approach: experts removed potentially biased questions. This produced no improvement in test scores.
• Another approach is to find the classes of items that members of a particular group are likely to miss.
• If these items can be identified, they can be removed from future tests.

Differential Item Functioning (DIF) Analysis
• Developed by the Educational Testing Service.
• DIF analysis tries to identify items that are biased against a particular group.
  ◦ First, it equates groups on the basis of overall scores.
  ◦ Within the equated groups, it evaluates differences in performance on particular items (for example, between men and women).
  ◦ If performance on an item differs significantly, the item is thrown out.
• Example: 27 items were taken off the SAT, but this made only a small difference because they were the easiest items.
• Studies have failed to demonstrate serious bias in item content.
• Studies have also used DIF to help understand experiences of discrimination in the community.
• How do biased test items affect the differential validity of a test?
  ◦ Example: suppose 25% of the items on a test were so biased that minority test takers were expected to perform at just chance level on them.
  ◦ There would be only slight, and perhaps undetectable, differences in validity coefficients for minority and majority group members.
  ◦ It was suggested that a failure to find differences in validity coefficients is consistent with the belief that the tests are equally valid for members of different ethnic and racial groups.
• Summary: studies have not supported the idea that items have different meanings for different groups.

Criterion-Related Sources of Bias
• Isodensity curve: an ellipse in a scatterplot (a two-dimensional scatter diagram) that encircles a specific proportion of the cases constituting a particular group.
• If a test (say, the SAT) shows a high correlation between test scores and first-year grades, you can assume that it has high validity.
• [Figure: regression line graphs]
• The best way to represent SAT scores for different groups is with two regression lines, not one.
• A single regression line over-predicts scores for one group and under-predicts scores for another.

Other Approaches to Testing Minority Groups

Ignorance versus Stupidity
• Ignorance: one side of the trial said that the children had not been taught what they needed to know for the test.
• Stupidity: the other side said that the test measures underlying intelligence and that smarter people do better.
• If the cause is ignorance, it is less of a concern because it can be changed.
• Stupidity would mean that minorities have a deficit that cannot be changed, and this would be more damaging.

The Chitling Test
• The thinking used to be that animals are stupid or smart according to how much they resemble humans. In fact, they are all equally intelligent in their own environments, because that intelligence is what they need to survive.
• The same was thought to apply to minorities: they grow up in different environments and are intelligent in the skills that they need there.
• Developed to show that there is a body of information
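
As a rough illustration of the DIF steps listed earlier in these notes (equate groups on overall score, compare performance on single items within the matched groups, discard items that differ), the sketch below computes a simplified matched difference in proportion correct. The data, the group labels "reference" and "focal", the function names, and the 0.1 cutoff are all hypothetical and chosen only for illustration. Operational DIF analyses rely on formal statistical tests rather than a fixed cutoff.

```python
from collections import defaultdict

# Hypothetical data: (group label, answers to three items; 1 = correct).
RESPONSES = [
    ("reference", [1, 1, 0]), ("reference", [1, 1, 0]),
    ("reference", [1, 0, 1]), ("reference", [0, 1, 1]),
    ("reference", [1, 0, 0]), ("reference", [0, 1, 0]),
    ("focal", [0, 1, 1]), ("focal", [0, 1, 1]),
    ("focal", [1, 0, 1]), ("focal", [0, 1, 1]),
    ("focal", [0, 1, 0]), ("focal", [0, 0, 1]),
]


def matched_difference(responses, item):
    """Mean difference in proportion correct on `item` (focal minus
    reference), comparing only test takers with the same total score."""
    # total score -> group -> [number correct on item, number of test takers]
    strata = defaultdict(lambda: {"reference": [0, 0], "focal": [0, 0]})
    for group, answers in responses:
        cell = strata[sum(answers)][group]  # step 1: equate on overall score
        cell[0] += answers[item]
        cell[1] += 1

    weighted_diff = 0.0
    weight_sum = 0
    for cells in strata.values():
        ref_correct, ref_n = cells["reference"]
        foc_correct, foc_n = cells["focal"]
        if ref_n == 0 or foc_n == 0:
            continue  # no matched comparison is possible at this score level
        # step 2: compare item performance within the matched groups
        diff = foc_correct / foc_n - ref_correct / ref_n
        weighted_diff += foc_n * diff  # weight each level by focal-group size
        weight_sum += foc_n
    return weighted_diff / weight_sum if weight_sum else 0.0


def flag_items(responses, n_items, cutoff=0.1):
    """Step 3: list items whose matched difference reaches the
    (assumed, illustrative) cutoff in either direction."""
    return [i for i in range(n_items)
            if abs(matched_difference(responses, i)) >= cutoff]


flagged = flag_items(RESPONSES, n_items=3)
print(flagged)
```

A negative value from `matched_difference` means the item was harder for the focal group than for reference-group test takers with the same total score; a positive value means the opposite.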

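The point made under criterion-related bias, that one regression line over-predicts for one group and under-predicts for another, can be sketched with made-up numbers. Everything here is an assumption for illustration: the two groups, their test scores and first-year grades, and the function names. The sketch fits an ordinary least-squares line to each group and to the pooled data, then compares the average prediction error.

```python
# Hypothetical (test score, first-year grade average) pairs for two groups.
GROUP_A = [(400, 2.0), (500, 2.5), (600, 3.0)]
GROUP_B = [(400, 2.6), (500, 3.1), (600, 3.6)]


def fit_line(points):
    """Ordinary least-squares (slope, intercept) for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_prediction_error(points, line):
    """Average of (predicted - actual); positive means over-prediction."""
    slope, intercept = line
    return sum(slope * x + intercept - y for x, y in points) / len(points)


pooled_line = fit_line(GROUP_A + GROUP_B)  # one line for everyone
line_a = fit_line(GROUP_A)                 # separate line per group
line_b = fit_line(GROUP_B)

print("pooled line, group A:", mean_prediction_error(GROUP_A, pooled_line))
print("pooled line, group B:", mean_prediction_error(GROUP_B, pooled_line))
print("own line, group A:", mean_prediction_error(GROUP_A, line_a))
print("own line, group B:", mean_prediction_error(GROUP_B, line_b))
```

In this constructed example the two groups share a slope but differ in intercept, so the pooled line sits between them: it over-predicts grades for group A and under-predicts them for group B, while each group's own line has no average error.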