Chapter 1: Introduction
Basic Concepts
What a Test Is
• A test is a measurement device or technique used to quantify behavior or aid in the understanding and prediction of behavior
• An item is a specific stimulus to which a person responds overtly; this response can be scored or evaluated
• Psychological and educational tests are made up of items
• Items are the specific questions or problems that make up a test
• A psychological test or educational test is a set of items designed to measure characteristics of human beings that pertain to behavior
• There are many types of behavior
• Overt behavior is an individual’s observable activity
• Covert behavior takes place within an individual and cannot be directly observed
• Psychological and educational tests measure past or current behavior and attempt to predict future behavior
• Scales: relate raw scores on test items to some defined theoretical or empirical distribution
• Scores on tests may be related to traits, which are enduring characteristics or tendencies to respond in a certain manner
• Test scores may also be related to the state, or the specific condition or status, of an individual
Types of Tests
• Individual tests: those that can be given to only one person at a time
• Test administrator (or examiner): the person giving the test
• Group tests: can be administered to more than one person at a time by a single examiner
• One can also categorize tests according to the type of behavior they measure
• Ability tests contain items that can be scored in terms of speed, accuracy, or both: the faster or the more accurate your responses, the better your score on a particular characteristic
• Ex.
The more algebra problems you can correctly solve in a given time, the higher you score in ability to solve such problems
• Different types of ability include achievement, aptitude, and intelligence
• Achievement: refers to previous learning
• Aptitude: refers to the potential for learning or acquiring a specific skill
• Intelligence: refers to a person’s general potential to solve problems, adapt to changing circumstances, think abstractly, and profit from experience
• The distinctions among achievement, aptitude, and intelligence are not always so cut-and-dried because all three are highly interrelated
• In view of the considerable overlap of achievement, aptitude, and intelligence tests, all three concepts are encompassed by the term human ability
• Personality tests: measure typical behavior, that is, overt and covert traits, temperaments, and dispositions
• There are several types of personality tests
• Structured personality tests: provide a self-report statement to which the person responds by choosing between two or more alternatives, such as “true” or “false”, “yes” or “no”
• Projective personality tests: provide an ambiguous test stimulus; the response requirements are unclear. Projective tests are unstructured
Psychological testing: refers to all the possible uses, applications, and underlying concepts of psychological and educational tests.
The main use of these tests, though, is to evaluate individual differences or variations among individuals
Overview of the Book
Principles of Psychological Testing
• The basic concepts and fundamental ideas that underlie all psychological and educational tests
• Reliability: refers to the accuracy, dependability, consistency, or repeatability of test results (consistent results)
• Validity: refers to the meaning and usefulness of test results (generalizable)
• Test administration: the act of giving a test
Applications of Psychological Testing
• Interview: a method of gathering information through verbal interaction, such as direct questions
Issues of Psychological Testing
• Many social and theoretical issues accompany testing
• Test bias
Historical Perspective
Early Antecedents
• Most of the major developments in testing have occurred over the last century, many of them in the US
• The Chinese had a relatively sophisticated civil service testing program more than 4000 years ago
• By the Han Dynasty (206 BCE to 220 CE), the use of test batteries (two or more tests used in combination) was quite common
• The Western world most likely learned about testing programs through the Chinese
• After the British endorsement of a civil service testing system (1855), the French and German governments followed suit
• In 1883, the US government established the American Civil Service Commission, which developed and administered competitive examinations for certain government jobs
Charles Darwin and Individual Differences
• Perhaps the most basic concept underlying psychological and educational testing pertains to individual differences
• No two people are exactly alike in ability and typical behavior
• Tests are specifically designed to measure these individual differences in ability and personality among people
• Understanding of individual differences advanced with the publication of Charles Darwin’s book, The Origin of Species, in 1859. His theory was that higher forms of life evolved partially because of differences
among individual forms of life within a species
• In his book Hereditary Genius, Sir Francis Galton applied the concepts of survival of the fittest and individual differences to human beings
• He concentrated on demonstrating that individual differences exist in human sensory and motor functioning, such as reaction time and physical strength
• James McKeen Cattell, who coined the term mental test, built on Galton’s work on individual differences in reaction time
Experimental Psychology and Psychophysical Measurement
• A second major foundation of testing can be found in experimental psychology and early attempts to unlock the mysteries of human consciousness through the scientific method
• Wilhelm Wundt is credited with founding the science of psychology
• Thus, psychological testing developed from at least two lines of inquiry: one based on the work of Darwin, Galton, and Cattell on the measurement of individual differences, and the other (more theoretically relevant and probably stronger) based on the work of the German psychophysicists Herbart, Weber, Fechner, and Wundt. Experimental psychology developed from the latter
• The Seguin Form Board Test was developed in an effort to educate and evaluate the mentally disabled
• Kraepelin devised a series of examinations for evaluating emotionally impaired people
• Alfred Binet developed the first major general intelligence test
The Evolution of Intelligence and Standardized Achievement Tests
• Binet-Simon Scale: the first intelligence test (1905)
• Standard conditions of administration made it possible to compare the results from any new subject with those of the standardization sample
• Further development of the test involved attempts to increase the size and representativeness of the standardization sample
• Representative sample: one that comprises individuals similar to those for whom the test is to be used.
• A representative sample must reflect all segments of the population
• Mental age: a measurement of a child’s performance on the test relative to other children of that particular age group (this concept was introduced in 1908)
• If a child’s test performance equals that of the average 8-year-old, then his or her mental age is 8
World War I
• Two structured group tests of human abilities were developed: the Army Alpha and the Army Beta
• The Army Alpha required reading ability, whereas the Army Beta measured the intelligence of illiterate adults
Achievement Tests
• Standardized achievement tests: broad coverage, less expensive and more efficient than written tests
• Objectivity and reliability made them superior
Rising to the Challenge
• The Wechsler-Bellevue Intelligence Scale (W-B) yielded several scores, permitting an analysis of an individual’s pattern or combination of abilities
Personality Tests: 1920-1940
• Woodworth Personal Data Sheet: an early structured personality test that assumed that a test response can be taken at face value
• The Rorschach Inkblot Test: a highly controversial projective test that provided an ambiguous stimulus (an inkblot) and asked the subject what it might be
• The Thematic Apperception Test (TAT): a projective test that provided ambiguous pictures and asked subjects to make up a story
• The Minnesota Multiphasic Personality Inventory (MMPI): a structured personality test that made no assumptions about the meaning of a test response.
Such meaning was to be determined by empirical research
• The California Psychological Inventory (CPI): a structured personality test developed according to the same principles as the MMPI
• The Sixteen Personality Factor Questionnaire (16PF): a structured personality test based on the statistical procedure of factor analysis
• Factor analysis: a method of finding the minimum number of dimensions (characteristics, attributes), called factors, to account for a large number of variables
The Period of Rapid Changes in the Status of Testing
• 1940s-1970s
The Current Environment
• Neuropsychologists use tests in hospitals and other clinical settings to assess brain injuries
• Health psychologists use tests and surveys in a variety of medical settings
• Forensic psychologists use tests in the legal system to assess mental state as it relates to an insanity defense, competency to stand trial or to be executed, or emotional damages
• Child psychologists use tests to assess childhood disorders
• Testing is indeed one of the essential elements of psychology
Chapter 2: Norms and Basic Statistics for Testing
Why We Need Statistics
• Statistical methods serve two important purposes in the quest for scientific understanding:
• Statistics are used for purposes of description
• We can use statistics to make inferences, which are logical deductions about events that cannot be observed directly
• Exploratory data analysis: the detective work of gathering and displaying clues
• Confirmatory data analysis: the clues are evaluated against rigid statistical rules
• Descriptive statistics: methods used to provide a concise description of a collection of quantitative data
• Inferential statistics: methods used to make inferences from observations of a small group of people, known as a sample, to a larger group of individuals, known as a population
Scales of Measurement
• Measurement is the application of rules for assigning numbers to objects
Properties of Scales
• Three important properties make scales of measurement different
from one another: magnitude, equal intervals, and an absolute 0
Magnitude
• The property of “moreness”
• A scale has the property of magnitude if we can say that a particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance
• Ex. If we can say that John is taller than Fred, then the scale has the property of magnitude
Equal Intervals
• A scale has the property of equal intervals if the difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units
• A psychological test rarely has the property of equal intervals
• When a scale has the property of equal intervals, the relationship between the measured units and some outcome can be described by a straight line or a linear equation of the form Y = a + bX
Absolute 0
• An absolute 0 is obtained when nothing of the property being measured exists
Types of Scales
• Nominal scale: there is no order within the data; qualitative variables are placed into categorical groups
• Ordinal scale: this scale allows you to rank individuals or objects but not to say anything about the meaning of the differences between the ranks
• Interval scale: a scale that has the properties of magnitude and equal intervals but not an absolute 0 (Ex. temperature in degrees Fahrenheit)
• Ratio scale: a scale that has all three properties (magnitude, equal intervals, and absolute 0) (Ex.
Kelvin scale)
Permissible Operations
• Each nominal observation can be placed in only one of a set of mutually exclusive categories (for example, you are a member of only one gender)
• You can use nominal data to create frequency distributions, but no mathematical manipulations are permissible
• Ordinal measurements can be manipulated using arithmetic; however, the result is often difficult to interpret
• With interval data one can apply any arithmetic operation to the differences between scores
Frequency Distributions
• A distribution of scores summarizes the scores for a group of individuals
• The frequency distribution displays scores on a variable or a measure to reflect how frequently each value was obtained
• Scores are arranged on the horizontal axis, and the frequency of each value is displayed on the vertical axis
• Bell-shaped distributions have the greatest frequency of scores toward the center of the distribution and decreasing frequencies as the values become greater or less than the value in the center of the distribution
• A positive skew: the tail goes off toward the higher or positive side of the X axis (and the opposite for a negative skew)
• Class interval: the unit on the horizontal axis
Percentile Ranks
• Percentile rank answers the question, “What percent of the scores fall below a particular score (Xi)?”
• To calculate a percentile rank, follow these steps: (1) determine how many cases fall below the score of interest, (2) determine how many cases are in the group, and (3) divide the number of cases below the score of interest by the number of cases in the group and multiply the result by 100
Percentiles
• Percentiles are the specific scores or points within a distribution
• They indicate a particular score below which a defined percentage of scores falls
• The percentile gives the point in a distribution below which a specified percentage of cases fall; the percentile rank gives the percentage of cases below the percentile
• The percentile is a raw score
Describing Distributions
Mean
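Before the mean is defined, the three percentile-rank steps listed under Percentile Ranks can be sketched in Python. This is a minimal sketch rather than a definitive implementation: the function name percentile_rank and the sample scores are hypothetical, invented only for illustration, and scores tied with the score of interest are assumed not to count as falling below it.

```python
def percentile_rank(scores, score_of_interest):
    """Return the percentage of scores that fall below score_of_interest."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Step 1: count the cases that fall below the score of interest.
    below = sum(1 for s in scores if s < score_of_interest)
    # Step 2: count the cases in the group.
    total = len(scores)
    # Step 3: divide and multiply the result by 100.
    return below / total * 100


# Hypothetical test scores, invented for illustration only.
example_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
example_rank = percentile_rank(example_scores, 80)
print(example_rank)
```

Here five of the ten hypothetical scores fall below 80, so the call reports a percentile rank of 50.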
• A variable is a score that can have different values
• The arithmetic average score in a distribution is called the mean
Standard Deviation
• The standard deviation is an approximation of the average deviation around the mean
• Variability is the difference among the scores in a set
• The sum of the deviations around the mean will always equal 0
• Variance: the average squared deviation around the mean, found by squaring all the deviations around the mean, adding them up, and dividing by N
• Standard deviation: the square root of the average squared deviation around the mean
• When we work with a sample rather than the whole population, we divide by N - 1 instead of N
Z Score
• The Z score transforms data into standardized units that are easier to interpret
• A Z score is the difference between a score and the mean, divided by the standard deviation
• A Z score is the deviation of a score from the mean in standard deviation units
• If a score is equal to the mean, then its Z score is 0
Standard Normal Distribution
• Also known as a symmetrical binomial probability distribution
• Z scores have a mean of 0 and a standard deviation of 1.0
• If a score is 1.0 standard deviation above the mean, then it is at the 84th percentile (50 + 34.13 = 84.13)
Percentile and Z Scores
• The percentile rank associated with a Z score is the percentage of scores that fall below the observed Z score
McCall’s T
• The mean is 50 rather than 0 and the standard deviation is 10 rather than 1
• A Z score can be transformed to a T score by applying the linear transformation T = 10Z + 50
• Transformations standardize but do not normalize
Quartiles and Deciles
• Divisions of the percentile scale into groups
• The quartile system divides the percentage scale into four groups, whereas the decile system divides the scale into 10 groups
• Quartiles: points that divide the frequency distribution into equal fourths
• The first quartile (Q1) is the 25th percentile, the second quartile (Q2) is the median, or 50th percentile, and the third quartile (Q3) is the 75th percentile
• Interquartile range = Q3 - Q1; it bounds the middle 50% of the data
• Deciles: similar to quartiles except that they
use points that mark 10% rather than 25% intervals
• Thus, the top decile, or D9, is the point below which 90% of the cases fall
• Stanine system: this system converts any set of scores into a transformed scale, which ranges from 1 to 9
• The term stanine comes from “standard nine”
• The scale is standardized to have a mean of 5 and a standard deviation of approximately 2
Norms
• Norms: refer to the performance by defined groups on particular tests
• The mean is a norm, and the 50th percentile is a norm
• Norms are used to give information about performance relative to what has been observed in a standardization sample
Age-Related Norms
• Certain tests have different normative groups for particular age groups
Tracking
• One of the most common uses of age-related norms is for growth charts used by pediatricians
• Ex. Is my son short or tall? The comparison is usually with people of the same age
• Pediatricians must know more than a child’s age; they must also know the child’s percentile within a given age group
• Children tend to stay at about their same percentile level, relative to other children in their age group, as they grow older
• This tendency to stay at about the same level relative to one’s peers is known as tracking
Criterion-Referenced Tests
• The purpose of establishing norms for a test is to determine how a test taker compares with others
• A norm-referenced test compares each person with a norm
• Criterion-referenced tests do not compare students with one another; they compare each student’s performance with a criterion or an expected level of performance
• A criterion-referenced test describes the specific types of skills, tasks, or knowledge that t
