Psychometrics Final Review

University of Ottawa
Psychology
PSY3307
All
Fall

Chapter 6: Reliability

Reliability
- Is estimated.
- Refers to the degree to which test scores are free from errors of measurement.
- Does NOT imply validity. Ex: If someone who weighs 120 kg steps onto a scale 10 times and it consistently reads 150 kg, the scale is reliable but not valid. If it reads 120 kg every time, it is both reliable and valid.
- Reliability is analogous to precision, while validity is analogous to accuracy.

3 measures of reliability

1. Test-retest
- A multi-administration method that is time-related. It is used to assess the consistency of a measure from one time to another.
- Involves administering the same test to the same sample on two different occasions.
- Assumes that there is no substantial change in the construct being measured between the two occasions.
- Uses Pearson's r coefficient.
- Is based on your theory of whether you expect your construct and measure to stay the same. Ex: There is a difference between mood and IQ: you would expect IQ to stay the same for months, but you would expect mood to change and vary from day to day.
- Is the variation in measurements taken by a single person or instrument on the same item under the same conditions, and includes intra-rater reliability.

2. Inter-rater
- Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.
- Best done as a side study or pilot study.
- Is the variation in measurements when taken by different persons but with the same method or instruments.
- There are 2 major ways to calculate it:
  1. Calculate the percentage of agreement between raters. The Kappa statistic builds on this by correcting the observed agreement for chance, and is used when the ratings are categorical.
  2. Calculate the correlation between the ratings of the two observers. This correlation gives an estimate of the reliability, or consistency, between the raters. It is used when the measure is continuous, or when everyone can at least be lined up in order, as in ordinal or ranking scales.

3. Internal consistency
- A single-administration method (e.g., split-half).
- Is typically a measure based on the correlations between different items on the same test (or the same subscale on a larger test).
- Measures whether several items that propose to measure the same construct produce similar scores.
- Is about how well a set of slightly different items all measure the same underlying construct (unidimensionality).
- It is
