Chapter 5

PSYC 3090 Chapter 5

CHAPTER 5: RELIABILITY
-consistency in measurement; not necessarily good or bad, just consistent
Reliability coefficient: index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance

Concept of Reliability
X = observed score
T = true score
E = error score
X = T + E
Variance: the standard deviation squared
True variance: variance from true differences
Error variance: variance from irrelevant, random sources
Reliability: the proportion of the total variance attributed to true variance
-the greater this proportion, the more reliable the test
Measurement error: collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured
Random error: source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
-this error fluctuates from one testing situation to another, with no pattern that would systematically raise or lower scores
-could be unanticipated events in the test environment
Systematic error: source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
-predictable + fixable

Sources of Error Variance
Test Construction
Item sampling/content sampling: terms that refer to variation among items within a test as well as to variation among items between tests
-goal: develop a test that maximizes the proportion of total variance that is true variance and minimizes error variance
Test Administration
-can influence attention or motivation
-test environment: temperature, lighting, etc.
-writing surface, bad writing utensils, etc.
-test taker variables: emotional problems, physical discomfort, etc.
-misreading a test item, accidentally bubbling in the wrong answer
-examiner-related variables: physical appearance and demeanor, unwittingly providing clues, etc.
Test Scoring and Interpretation
-bubble sheets scored by computers eliminate most scoring error, but not all tests can be scored this way
-examiners determine how things are scored
-scorers + scoring systems can be a source of error: a technical glitch, or the person scoring if subjectivity is involved
-even with scoring criteria, examiners can run into situations that fall in a gray area
Other Sources of Error
-sampling error: extent to which the sample was representative of the population
-the sample may not include enough people
-methodological error: interviewers not trained properly, ambiguous wording, etc.
-certain types of assessment lend themselves to particular varieties of systematic and nonsystematic error (e.g., assessing agreement between partners regarding physical abuse in the relationship)
-the amount of test variance that is true relative to error may never be known

Reliability Estimates
Test-Retest Reliability Estimates
Test-retest reliability: estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
-appropriate when the same score is expected each time (things that wouldn't fluctuate in a person)
Coefficient of stability: estimate of test-retest reliability
-evaluation of a test-retest reliability estimate must extend to a consideration of possible intervening factors between test administrations

Parallel-Forms and Alternate-Forms Reliability Estimates
Coefficient of equivalence: the degree of the relationship between various forms of a test, evaluated by means of an alternate-forms or parallel-forms coefficient of reliability
Parallel forms: for each form of the test, the means and variances of observed test scores are equal
-in theory, the means of scores on parallel forms correlate equally with the true score
Parallel forms reliability: estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal
Alternate forms: different versions of a test that have been
Alternate forms reliability: estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error
-obtaining estimates of alternate-forms and parallel-forms reliability is similar to obtaining a test-retest estimate in 2 ways
1. Two test administrations with the same group are required
2. Test scores may be affected by factors such as motivation, fatigue, or intervening events such as practice, learning, or therapy
-an additional source of error variance, item sampling, is inherent in the computation of an alternate- or parallel-forms reliability coefficient
-test takers may do better or worse on a specific form of the test, not as a function of true ability but simply because of the particular items on that form
-developing alternate forms can be time-consuming + expensive
-reliability can also be estimated without developing an alternate form of the test
Internal consistency estimate of reliability/estimate of inter-item consistency: estimate of the reliability of a test obtained from a measure of inter-item consistency
-one method is split-half reliability

Split-Half Reliability Estimates
Split-half reliability: obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
1. Divide the test into equivalent halves
2. Calculate a Pearson r between scores on the two halves
3. Adjust the half-test reliability using the Spearman-Brown formula
-never split the test down the middle
-randomly assigning items to one or the other half of the test is fine
-could split by odd and even item numbers (odd-even reliability); there is no single correct way

Spearman-Brown formula
Spearman-Brown formula: allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test
-reliability usually increases as test length increases
-if a test is to be shortened, this formula can estimate the effect on reliability
-can be used to find the number of items needed to attain a desired level of reliability

Other Methods of Estimating Internal Consistency
Inter-item consistency: the degree of correlation among all the items on a scale
-calculated from a single administration of a single form of a test
Homogeneity: degree to which a test contains items that measure a single trait
Heterogeneity: degree to which a test measures different factors
-the more homogeneous a test is, the more inter-item consistency it can be expected to have
-but a homogeneous test is not a good tool for measuring multifaceted psychological variables like intelligence or personality

Kuder-Richardson formulas
Kuder-Richardson formulas: series of equations designed to estimate the inter-item consistency of tests
-good for homogeneous tests with dichotomous items
-KR-21 can be used to approximate KR-20 if there is reason to assume that all test items have approximately the same degree of difficulty

Coefficient alpha
Coefficient alpha: can be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
-appropriate for non-dichotomous items
-preferred statistic for obtaining an estimate of internal consistency reliability
-typically ranges from 0 to 1

Average proportional distance
Average proportional distance (APD) method: measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

Measures of Inter-Scorer Reliability
-addresses the degree of consistency between scorers of a test
Inter-scorer reliability: degree of agreement or consistency between 2 or more scorers with regard to a particular measure
Coefficient of inter-scorer reliability: coefficient of correlation used to determine the degree of consistency among scorers in the scoring of a test

Using and Interpreting a Coefficient of Reliability
-3 approaches to estimating reliability
1. Test-retest
2. Alternate or parallel forms
3. Internal or inter-item consistency
-which one is used depends on a number of factors, such as the purpose of obtaining the reliability estimate
-if we require a high degree of reliability, we hold the test to high standards + try to obtain the best reliability estimate

Purpose of the Reliability Coefficient
-if test is for various times,

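The split-half steps, the Spearman-Brown adjustment, KR-20, and coefficient alpha from the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy score matrix are made up for this example, and variances are computed as population variances (textbooks differ on this convention).

```python
from statistics import mean, pvariance

# Hypothetical toy data: 5 test takers (rows) x 4 dichotomous items (columns).
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists.
    Assumes both lists have nonzero variance."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5

def spearman_brown(r, n=2.0):
    """Estimated reliability when test length is multiplied by n.
    n = 2 is the usual correction of a half-test correlation."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r_current, r_target):
    """Spearman-Brown solved for n: how many times longer the test
    would have to be to reach the target reliability."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

def split_half_reliability(scores):
    """Odd-even split: correlate the half scores, then adjust."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    return spearman_brown(pearson_r(odd, even), 2)

def coefficient_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """KR-20 for dichotomous (0/1) items, using p * q as item variance."""
    k = len(scores[0])
    p = [mean(row[i] for row in scores) for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

print(round(split_half_reliability(toy_scores), 3))  # 0.88
print(round(coefficient_alpha(toy_scores), 3))       # 0.8
print(round(kr20(toy_scores), 3))                    # 0.8
print(round(length_factor_needed(0.6, 0.8), 3))      # 2.667
```

With 0/1 items and population variances, alpha and KR-20 give the same value, which is why the two printed results match. The last line says that a test with reliability .60 would need to be about 2.67 times longer to reach .80, assuming the added items are comparable to the existing ones.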