Psychology 2080A/B Chapter Notes - Chapter 4: Internal Consistency, Standard Error, Inter-Rater Reliability

Chapter 4 September 19
·Domain sampling:
·Time sample: what happens on certain occasions
·Internal consistency: are you testing only the one thing you think you are
testing? (A psychology test may turn into a reading test if the questions are
difficult to read.)
·Reliability: the thing we are interested in is not directly observable
·Ability is not observable; we INFER it from behavior
·Behavior is not transparent; we cannot look at it and see your ability
·O = observed score: a matter of fact, we can observe it
·T = true score: the thing we are really interested in
·E = error: always present, can be positive or negative
·Fundamental principle: the more error there is in your measurements the
less reliable those measurements are
·Error of measurement is sometimes negative and sometimes positive (e.g.,
not enough sleep, too much noise)
·The more error there is in the measurement, the more distance between
the observed score and the true score
·Error in measurement takes away from reliability because it allows for
more variation in how you do on the same test
·True score would be obtained if there were NO error of measurement
·If we test someone many times, the errors will tend to cancel each other out,
so the mean of the observed scores is close to the true score
·Sarah’s true score is completely independent from the words we choose
·O = T + e_i (e_1, e_2, e_3, ...): the errors will be different on every testing occasion
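The idea that errors cancel out over many testings can be illustrated with a small simulation. This is only a sketch: the true score of 75, the error spread of 5 points, and the assumption that errors are normally distributed around zero are all made up for illustration and are not from the lecture.

```python
import random
import statistics

def simulate_observed_scores(true_score, error_sd, n_occasions, seed=0):
    """Return observed scores O = T + E for n_occasions testings.

    Assumption (not from the notes): each error E is drawn from a normal
    distribution with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_occasions)]

true_score = 75.0  # hypothetical true score T

few = simulate_observed_scores(true_score, error_sd=5.0, n_occasions=3)
many = simulate_observed_scores(true_score, error_sd=5.0, n_occasions=10_000)

# With only a few testings the mean can sit noticeably away from T;
# with many testings the positive and negative errors largely cancel.
print("mean of 3 testings:     ", round(statistics.mean(few), 2))
print("mean of 10,000 testings:", round(statistics.mean(many), 2))
```

In real testing we never get thousands of occasions, which is why reliability has to be estimated in other ways.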
Domain sampling error: error due to items on the test
·Suppose no questions were asked about a certain chapter that was supposed to
be on the test; then your score would not represent your knowledge of that
chapter. You would have spent time studying a chapter not included in the test
as opposed to spending time studying chapters that WILL be on the test
·Exam score may vary depending on what's included in or excluded from the test
·Solution: increase the number of questions on the test so that it covers
the whole knowledge base
·Parallel-forms reliability: choosing two different sets of test items
covering the same material at the same difficulty
·If the correlation between scores on 2 parallel forms is low, then we have
domain sampling error
Time sampling error: error due to testing occasions
·High correlations mean less influence of bad or good days
·Test-retest approach: useful for characteristics that don't change over time
·Carryover: the first testing session influences scores on the next session
·Practice effect: when the carryover effect involves learning
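A test-retest check comes down to correlating scores from two sessions. The sketch below computes a Pearson correlation by hand. The six pairs of scores are invented for illustration. The same calculation would apply to scores on two parallel forms.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on two testing occasions.
session_1 = [12, 15, 9, 20, 17, 11]
session_2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(session_1, session_2)
print("test-retest correlation:", round(test_retest_r, 3))
```

A value near 1 suggests good or bad days had little influence. Note that carryover and practice effects can push this correlation around, so it should be interpreted with care.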
Internal consistency error: test should only measure one trait
·Divide the test into two halves and look at the correlation between the two
·People should get the same score on the two halves, indicating that you have
a single trait in your test (high correlation)
·When you get a low correlation, then you have more than one trait
·How to test that you have one trait?
·Split-half reliability: divide the test into halves A and B, score them
separately, and check the correlation between them. However, splitting reduces
the reliability estimate because each half has fewer items than the full test
(e.g., 100 items split 50/50)
·Rc = the thing we actually know (the correlation between the two halves)
·Re = the thing we are interested in (the reliability of the full-length test)
·Spearman-Brown formula: see slide 28 of 43
·Kuder-Richardson 20 (KR-20): a measure that doesn't require splitting the test
into 2 halves. It considers all possible ways of splitting a test into 2 halves
·When the variance is big, the scores are not alike
·Covariance: if two items measure the same trait they co-vary; the same people
get them right or wrong
·Variance: when item variance makes up only a small part of your total variance,
the rest is covariance, which means internal consistency
·Item variance: if item variance is a large part of your total variance you do
not have internal consistency
·KR-20 can only be used for test items with 2 alternatives (right or wrong)
·Cronbach's α (alpha): generalizes KR-20 so you can use it with multiple-response
items
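The internal consistency measures above can be sketched in a few lines. The formulas are the standard textbook forms, not copied from the slides: Spearman-Brown for a test split in two is r_full = 2 * r_half / (1 + r_half), KR-20 is (k / (k - 1)) * (1 - sum(p * q) / total variance), and Cronbach's alpha replaces sum(p * q) with the sum of the item variances. The response matrix is invented for illustration, and treating r_half and r_full as the notes' Rc and Re is an assumption.

```python
import statistics

def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of rows (people) by columns (items)."""
    k = len(scores[0])
    item_vars = [statistics.pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.pvariance([sum(row) for row in scores])
    # Small item variance relative to total variance means the items co-vary.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """KR-20 for right/wrong items coded 1/0 (rows = people, columns = items)."""
    n, k = len(scores), len(scores[0])
    p = [sum(row[j] for row in scores) / n for j in range(k)]  # proportion correct
    sum_pq = sum(pj * (1 - pj) for pj in p)                    # item variances p*q
    total_var = statistics.pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Hypothetical right/wrong answers: 5 people, 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print("Spearman-Brown for r_half = 0.5:", round(spearman_brown(0.5), 3))
print("KR-20:", round(kr20(responses), 3))
print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

On right/wrong data the two coefficients agree, which is one way to see that alpha generalizes KR-20. Alpha can also be applied to items scored on a wider scale, where KR-20 cannot.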
Observational studies
·Using multiple observers leads to inter-observer differences
·Observer failure: e.g., violence is happening on the playground but you are
looking the wrong way
·Inter-rater reliability: coming up with a strategy, together with another
observer, to judge or rate the specific behavior you are trying to observe.
Agreement between 2 or more observers. % agreement may over-estimate inter-rater
reliability
·Kappa statistic: estimates actual inter-rater agreement after correction for
chance agreement
·Kappa statistic answers a yes/no question: are the observers' ratings
independent of each other? If they are observing the same thing, the results
should not be independent
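To see why % agreement can over-estimate inter-rater reliability, the sketch below compares it with Cohen's kappa, the usual two-rater form of the kappa statistic: kappa = (p_observed - p_chance) / (1 - p_chance). The ten playground ratings are invented for illustration, and the notes do not say which variant of kappa the course uses.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which the two raters give the same code."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if their ratings were independent, summed over categories.
    categories = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for 10 observation intervals ("agg" = aggression seen).
rater_a = ["agg", "agg", "agg", "agg", "none", "none", "none", "none", "none", "none"]
rater_b = ["agg", "agg", "agg", "none", "none", "none", "none", "none", "none", "agg"]

p_obs = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print("percent agreement:", round(p_obs, 3))
print("kappa:", round(kappa, 3))
```

Kappa comes out lower than raw agreement here because some of the matches would be expected even if the two observers were rating independently.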
Standard error of measurement
·Estimates the extent to which a test score misrepresents the true score
·Small SEM means test score is very close to true score
·Large SEM means test score may be distant from true score
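The notes do not give a formula for the SEM. The standard classical test theory expression is SEM = SD * sqrt(1 - reliability), where SD is the standard deviation of the test scores. The sketch below uses it with invented numbers (SD of 15, reliabilities of 0.91 and 0.51). The plus-or-minus 1.96 SEM band is a rough 95% range that assumes normally distributed errors.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)

def score_band(observed, sem, z=1.96):
    """Range around an observed score; z=1.96 assumes normal errors (about 95%)."""
    return observed - z * sem, observed + z * sem

score_sd = 15.0  # hypothetical standard deviation of test scores

sem_reliable = standard_error_of_measurement(score_sd, reliability=0.91)
sem_unreliable = standard_error_of_measurement(score_sd, reliability=0.51)

# Higher reliability gives a smaller SEM, so the observed score sits
# closer to the true score.
print("SEM at reliability 0.91:", round(sem_reliable, 2))
print("SEM at reliability 0.51:", round(sem_unreliable, 2))
print("band around a score of 100:", score_band(100.0, sem_reliable))
```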