PSYB01H3 Chapter Notes - Chapter 5-7: Inter-Rater Reliability, Fuel Dispenser, Construct Validity

Published on 27 Sep 2011
School: UTSC
Department: Psychology
Course: PSYB01H3
Chapter 5
The most common measurement strategy is to ask people to tell you about themselves
You can also directly observe behaviours
Example: how many mistakes someone made on a task
Physiological and neurological responses can be measured as well.
Example: heart rate, muscle tension
Reliability of measures
Reliability refers to the consistency or stability of a measure of behaviour
a reliable test would yield the same result each time
the results should not fluctuate from one reading to the next
if there is fluctuation there is an error in the measurement device
Every measurement has two components:
1. True score: the real score on the variable
2. Measurement error
Example: If you administer a highly reliable test to the same person multiple times, the
scores might range from 97 to 103; with an unreliable test the scores might range from 85 to 115.
The measurement error in the unreliable test is revealed in the greater variability
of its scores
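The true-score idea can be illustrated with a small simulation in which every observed score is the true score plus random error. This is only a sketch under assumed values: the true score of 100, the two error sizes, and the function name `administer` are hypothetical, chosen to mimic the 97-103 versus 85-115 example.

```python
import random
import statistics

# Hypothetical model: observed score = true score + random measurement error.
TRUE_SCORE = 100

def administer(error_sd, n_times, seed):
    """Simulate giving the same test to the same person n_times."""
    rng = random.Random(seed)
    return [TRUE_SCORE + rng.gauss(0, error_sd) for _ in range(n_times)]

reliable = administer(error_sd=1.5, n_times=20, seed=1)    # small error
unreliable = administer(error_sd=7.5, n_times=20, seed=1)  # large error

for label, scores in [("reliable", reliable), ("unreliable", unreliable)]:
    print(f"{label:>10}: range {min(scores):.0f}-{max(scores):.0f}, "
          f"SD {statistics.stdev(scores):.1f}")
```

Both tests use the same random seed, so the unreliable test's errors are simply the reliable test's errors scaled up fivefold; the larger spread comes entirely from measurement error, since the true score never changes.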
Using unreliable measures is a waste of time because the results will be unstable
and cannot be replicated
We can assess the stability of measures using correlation coefficients
The most common correlation coefficient when discussing reliability is the
"Pearson product-moment correlation coefficient"
Symbolized as "r"
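Values range from -1.00 to +1.00
0.00 means that the variables are not related at all
Values closer to +1.00 indicate a stronger positive relationship (+1.00 is a perfect positive relationship)
Values closer to -1.00 indicate a stronger negative relationship (-1.00 is a perfect negative relationship)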
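A from-scratch sketch of how r is computed: the covariance of the two variables divided by the product of their spreads. The function name `pearson_r` and the example lists are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))  # 1.0  (perfect positive)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
print(round(pearson_r(x, [2, 5, 1, 5, 2]), 2))   # 0.0  (no relationship)
```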
Test-retest reliability: assessed by measuring the same individuals at two points in
time
If people have similar scores on both occasions (a high correlation between the two sets
of scores), we can say that the measure reflects true scores rather than measurement error
The correlation should be at least 0.80 before we accept the measure as reliable
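Test-retest reliability is just the correlation between the two testing sessions, checked against the 0.80 rule of thumb. The scores below are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Pearson r between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical scores for the same six people, tested at two points in time.
time1 = [12, 18, 25, 30, 22, 15]
time2 = [14, 17, 27, 29, 21, 16]

RELIABILITY_CUTOFF = 0.80  # rule of thumb from the notes

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}")
if r >= RELIABILITY_CUTOFF:
    print("meets the 0.80 guideline")
else:
    print("below 0.80: too much measurement error")
```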
Internal consistency Reliability
The assessment of reliability using responses at only one point in time; because all
items measure the same variable, they should yield similar or consistent results
An indicator of internal consistency is "split-half reliability"
Split-half reliability: this is the correlation of an individual's total score on one half of
the test with the total score on the other half

The final measure will include items from both halves
The combined measure will have more items and will be more reliable
than either half by itself
A drawback of split-half reliability is that it does not take into account each individual
item's role in a measure's reliability (each question on a test is called an "item")
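A minimal split-half sketch, assuming an odd/even split of the items (other splits are possible) and made-up responses. The last step uses the Spearman-Brown formula, a standard adjustment that these notes do not name, to quantify the point that the combined measure is more reliable than either half.

```python
import math

def pearson_r(x, y):
    """Pearson r between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical item responses: one row per person, one column per item (1-5 scale).
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
    [3, 4, 3, 3, 4, 3],
]

# Total score on each half: odd-numbered items vs even-numbered items.
half_a = [sum(row[0::2]) for row in responses]  # items 1, 3, 5
half_b = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

r_half = pearson_r(half_a, half_b)

# Spearman-Brown (assumed here, not from the notes): estimated reliability
# of the full-length test from the correlation between its two halves.
r_full = 2 * r_half / (1 + r_half)

print(f"split-half r = {r_half:.2f}, estimated full-test reliability = {r_full:.2f}")
```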
Cronbach's alpha: based on individual items; another indicator of internal
consistency reliability
Correlates each item with every other item
The value of alpha is based on the average of all the inter-item correlation coefficients
and the number of items
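This sketch uses the standardized form of alpha, which depends only on the average inter-item correlation and the number of items k. The more common raw alpha is computed from item and total-score variances and gives a similar but not identical value. The responses and the helper name `pearson_r` are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson r between two equal-length sequences of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical responses: rows are people, columns are items.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]
items = list(zip(*responses))  # transpose: one tuple of scores per item
k = len(items)

# Correlate each item with every other item, then average the coefficients.
inter_item = [pearson_r(a, b) for a, b in combinations(items, 2)]
mean_r = sum(inter_item) / len(inter_item)

# Standardized alpha from the average inter-item r and the number of items.
alpha = k * mean_r / (1 + (k - 1) * mean_r)
print(f"{len(inter_item)} item pairs, mean r = {mean_r:.2f}, alpha = {alpha:.2f}")
```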
Item-total correlations: examine the correlation between each item and the total
score
Since Cronbach's alpha and item-total correlations look at individual items, items that
do not correlate with the other items can be removed to increase reliability
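The item-screening idea can be sketched by correlating each item with the total score and flagging weak items. The data, the item names, and the 0.30 cutoff are all hypothetical; item 4 was deliberately made inconsistent with the others.

```python
import math

def pearson_r(x, y):
    """Pearson r between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

ITEM_NAMES = ["item1", "item2", "item3", "item4"]
# Hypothetical responses: rows are people, columns are items.
responses = [
    [4, 5, 4, 1],
    [2, 2, 3, 4],
    [3, 3, 3, 2],
    [5, 4, 5, 3],
    [1, 2, 1, 5],
    [3, 4, 3, 3],
]
CUTOFF = 0.30  # hypothetical threshold for "does not correlate"

totals = [sum(row) for row in responses]
flagged = []
for i, name in enumerate(ITEM_NAMES):
    item_scores = [row[i] for row in responses]
    r = pearson_r(item_scores, totals)
    print(f"{name}: item-total r = {r:.2f}")
    if r < CUTOFF:
        flagged.append(name)

print("candidates for removal:", flagged)
```

Many analyses instead correlate each item with the total of the *other* items, so that an item's own score does not inflate its correlation; the version above follows the notes' simpler definition.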
Interrater reliability
A single rater might be unreliable, but using more than one rater increases reliability
The degree to which raters agree in their observations is interrater reliability
A commonly used indicator of interrater reliability is called Cohen's kappa
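Cohen's kappa compares the observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch assumes two hypothetical raters coding ten observations as aggressive ("A") or not ("N"); the function name `cohens_kappa` is illustrative.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must code the same, non-empty set of observations")
    n = len(rater_a)
    # Observed agreement: proportion of observations coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, product of the raters' base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

rater_a = ["A", "A", "N", "N", "A", "N", "N", "A", "N", "N"]
rater_b = ["A", "N", "N", "N", "A", "N", "A", "A", "N", "N"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 observations (80%), but because each codes "A" 40% of the time, 52% agreement is expected by chance alone, so kappa is only about 0.58.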
Reliability and accuracy of measures
Accuracy and reliability are different: a measure can be reliable without being accurate
Example: A gas station pump puts the same amount of gas in your car every time,
therefore the gas pump gauge is reliable. However, the issue of accuracy is still open.
The only way you can know the accuracy is to compare how much the pump gives
you to a standard measure of a litre.
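The pump example can be made concrete: reliability shows up as a small spread across repeated readings, while accuracy shows up as the size of the offset from the standard. All numbers are hypothetical (a display reading of 10.00 L each time, with the delivered fuel checked in a calibrated container).

```python
import statistics

# Hypothetical: the pump display reads 10.00 L each time; the delivered fuel
# is then measured in a calibrated "standard" container.
DISPLAYED_LITRES = 10.00
measured_litres = [9.52, 9.50, 9.51, 9.49, 9.50]

# Consistency across repeated fills -> reliability.
spread = statistics.stdev(measured_litres)
# Systematic offset from the standard -> (in)accuracy.
bias = statistics.mean(measured_litres) - DISPLAYED_LITRES

print(f"spread (SD) = {spread:.3f} L")
print(f"bias = {bias:+.2f} L")
```

The spread is tiny (the pump is reliable), yet every fill is about half a litre short (the pump is not accurate), which only becomes visible through the comparison with the standard.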
Construct Validity of measures
Construct validity: the adequacy of the operational definition of variables
To what extent does the operational definition reflect the true theoretical meaning of
the variable?
Construct validity is a question of whether the measure employed actually measures
the construct it is intended to measure
Indicators of construct validity
Face validity: the evidence for validity is that the measure appears "on the face of it" to
measure what it is supposed to measure.
Do the procedures used to measure the variable appear to be an accurate operational
definition of the theoretical variable?
Criterion-oriented validity: the relationship between scores on the measure and some
criterion
There are 4 types of criterion-related research approaches that differ in the type of
criterion that is employed
1. Predictive validity: scores on the measure predict behaviour on a criterion
measured at a time in the future
Example: LSAT scores predict how well you'll do in law school

2. Concurrent validity: scores on the measure are related to a criterion measured at
the same time
One approach is to see whether two or more groups of people differ on the measure in
expected ways
3. Convergent validity: scores on the measure are related to other measures of the
same construct
One measure of shyness should correlate with another shyness measure or
a measure of a similar construct such as social anxiety
4. Discriminant validity: scores on the measure are NOT related to other measures
that are theoretically different
Example: checking that scores on a shyness test do not correlate with scores on a
test of aggressiveness/forcefulness
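Convergent and discriminant validity both come down to correlations between measures. The sketch uses made-up scores for six people; the variable names and numbers are hypothetical, and a real validation study would need a much larger sample.

```python
import math

def pearson_r(x, y):
    """Pearson r between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical scores for the same six people on three questionnaires.
shyness = [10, 14, 20, 25, 30, 35]
social_anxiety = [12, 15, 19, 27, 28, 36]  # similar construct
aggressiveness = [22, 30, 18, 27, 20, 25]  # theoretically different construct

r_convergent = pearson_r(shyness, social_anxiety)    # should be high
r_discriminant = pearson_r(shyness, aggressiveness)  # should be near zero

print(f"shyness vs social anxiety: r = {r_convergent:.2f}")
print(f"shyness vs aggressiveness: r = {r_discriminant:.2f}")
```

With these toy numbers the first correlation is high (about 0.99) and the second is close to zero (about -0.10), the pattern that would support the construct validity of the shyness measure.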
Research on personality and individual differences
Systematic and detailed research on validity is most often carried out on measures of
personality and individual differences
NEO Personality Inventory (NEO-PI)
Measures the 5 major dimensions of personality: neuroticism, extraversion, openness
to experience, agreeableness and conscientiousness
Reactivity of measures
Reactivity is a potential problem when measuring behaviour
A measure is said to be reactive if awareness of being measured changes an
individual's behaviour
Reactive measures don't tell you how the subject behaves under natural settings
You can minimise reactivity by letting the subjects get used to the recording
equipment or to the presence of the observer
Nonreactive or unobtrusive measures involve clever ways of indirectly recording a variable
Variables and measurement scales
Each variable that is studied must be operationally defined
The specific method used to manipulate or measure the variable
There must be at least two values or levels of the variable
Levels can be conceptualized as a scale that uses one of four kinds of measurement
scales:
Nominal: no numerical or quantitative properties
Sometimes called "categorical variables"
Example: male/female
Impossible to define any quantitative values
Ordinal: rank ordering; numeric values limited
Example: 1-5 star restaurants
Intervals between items not known
Interval: numeric properties are literal; assumes equal intervals between values
Example: intelligence, temperature
No true zero (0 degrees on a Celsius thermometer is an arbitrary reference point; it
does not mean the absence of temperature)
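The scale type limits which summaries and comparisons make sense. In this sketch with hypothetical data, the mode is used for nominal categories and the median for ordinal ranks (a common rule of thumb, not stated in these notes), and a Celsius-to-Fahrenheit conversion shows why "twice as hot" is not meaningful on an interval scale with an arbitrary zero, while comparisons of differences survive the change of units.

```python
import statistics

# Nominal: categories only, so counting is meaningful -> mode.
sexes = ["female", "male", "female", "female", "male"]
print(statistics.mode(sexes))  # female

# Ordinal: order is meaningful, distances are not -> median.
star_ratings = [1, 3, 3, 4, 5]
print(statistics.median(star_ratings))  # 3

# Interval: equal intervals, but the zero point is arbitrary.
def c_to_f(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

a, b = 10.0, 20.0
print(b / a)                  # 2.0 in Celsius...
print(c_to_f(b) / c_to_f(a))  # 1.36 in Fahrenheit: the ratio is not meaningful

# Ratios of *differences* are preserved across the two scales.
print((30 - 20) / (20 - 10))                                   # 1.0
print((c_to_f(30) - c_to_f(20)) / (c_to_f(20) - c_to_f(10)))  # 1.0
```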