# PSYB01H3 Chapter Notes - Chapter 5-7: Inter-Rater Reliability, Fuel Dispenser, Construct Validity


Chapter 5

•The most common measurement strategy is to ask people to tell you about themselves

•Example: rate your overall happiness

•You can also directly observe behaviours

•Example: how many mistakes did someone make?

•Physiological and neurological responses can be measured as well

•Example: heart rate, muscle tension

Reliability of measures

•Reliability refers to the consistency or stability of a measure of behaviour

•a reliable test would yield the same result each time

•the results should not fluctuate from one reading to the next

•if there is fluctuation, there is error in the measurement device

•Every measurement has two components:

1. True score: the real score on the variable

2. Measurement error

•Example: If you administer a highly reliable test to the same person multiple times, the scores might range from 97 to 103; with an unreliable test, the scores might range from 85 to 115. The measurement error in the unreliable test is revealed by the greater variability in its scores

•Using unreliable measures is a waste of time because the results will be unstable and cannot be replicated
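To make the "true score plus measurement error" idea concrete, here is a minimal Python sketch. It assumes the error is random and normally distributed (an assumption of this sketch, not something stated above), and the true score of 100 and the two error sizes are hypothetical values picked so the ranges land roughly near the 97-103 and 85-115 example. The function name `simulate_scores` is illustrative.

```python
import random
import statistics

def simulate_scores(true_score, error_sd, n_administrations, seed=0):
    """Observed score = true score + random measurement error (assumed normal)."""
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]

# Hypothetical settings: same true score, small vs. large measurement error.
# The same seed gives both tests the same pattern of errors, scaled differently.
reliable = simulate_scores(100, error_sd=1.5, n_administrations=20, seed=1)
unreliable = simulate_scores(100, error_sd=7.5, n_administrations=20, seed=1)

for label, scores in [("reliable", reliable), ("unreliable", unreliable)]:
    print(f"{label:>10}: range {min(scores):.1f}-{max(scores):.1f}, "
          f"SD {statistics.stdev(scores):.2f}")
```

Because both runs share a seed, the unreliable test's errors are the reliable test's errors magnified five times, so the wider range and larger SD are purely the extra measurement error showing up as variability.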

•We can assess the stability of measures using correlation coefficients

○The most common correlation coefficient when discussing reliability is the "Pearson product-moment correlation coefficient"

Symbolized as "r"

Ranges from -1.00 to +1.00

•0.00 means that the variables are not related at all

•+1.00 means a perfect positive relationship

•-1.00 means a perfect negative relationship

•The closer r is to +1.00 or -1.00, the stronger the relationship
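A minimal sketch of how Pearson r is computed from two lists of scores. `pearson_r` is a hypothetical helper written out by hand so each step is visible (Python 3.10+ also provides `statistics.correlation`, which does the same job).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Cross-products of deviations, and squared deviations for each variable
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variability")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4], [1, 3, 3, 1]))  # 0.0  (not related)
```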

•Test-retest reliability: assessed by measuring the same individuals at two points in time

○If people's scores at the two times are similar (a high correlation), we can say that the measure reflects true scores rather than measurement error

•For most measures, the reliability correlation should be at least .80 before we accept the measure as reliable

Internal consistency reliability

•The assessment of reliability using responses at only one point in time; because all items measure the same variable, they should yield similar or consistent results

○An indicator of internal consistency is "split-half reliability"

•Split-half reliability: the correlation of an individual's total score on one half of the test with the total score on the other half

○The final measure will include items from both halves

The combined measure will have more items and will be more reliable than either half by itself

○A drawback is that it does not take into account each individual item's role in the measure's reliability (each question on a test is called an "item")

•Cronbach's alpha: another indicator of internal consistency reliability, based on individual items

○Correlates each item with every other item

○The value of alpha is based on the average of all the inter-item correlation coefficients and the number of items in the measure

•Item-total correlations: examine the correlation between each item and the total score

•Because Cronbach's alpha and item-total correlations look at individual items, items that do not correlate with the other items can be removed to increase reliability

Interrater reliability

•A single rater might be unreliable, but using more than one rater will increase reliability

•The degree to which raters agree in their observations is interrater reliability

○A commonly used indicator of interrater reliability is called Cohen's kappa
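Cohen's kappa compares how often two raters actually agree (p_o) with how often they would agree by chance, given how often each rater uses each category (p_e): kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters and categorical codes; the coding scheme and observations are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who coded the same observations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of observations")
    n = len(rater_a)
    # Observed agreement: proportion of observations both raters coded the same way
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of each rater's proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes from two observers for ten playground episodes
rater_a = ["aggr", "aggr", "none", "none", "aggr", "none", "none", "aggr", "none", "none"]
rater_b = ["aggr", "none", "none", "none", "aggr", "none", "none", "aggr", "none", "aggr"]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

The raters agree on 8 of 10 episodes (80%), but because each codes "none" 60% of the time, much of that agreement is expected by chance, so kappa is noticeably lower than the raw percentage.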

Reliability and accuracy of measures

•Accuracy and reliability are different properties: a measure can be reliable without being accurate

○Example: A gas station pump puts the same amount of gas in your car every time, so the pump gauge is reliable. However, the question of accuracy is still open: the only way to know the pump's accuracy is to compare how much it gives you with a standard measure of a litre
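The pump example can be put into numbers. In this sketch the readings are hypothetical amounts delivered each time the gauge showed 1.000 L. A tiny standard deviation across repeated fills shows the pump is reliable, while the gap between its average delivery and the standard litre shows it is not accurate.

```python
import statistics

STANDARD_LITRE = 1.000  # the reference measure the pump is checked against

# Hypothetical litres actually delivered on five fills when the gauge read 1.000 L
delivered = [0.952, 0.951, 0.953, 0.952, 0.951]

spread = statistics.stdev(delivered)                  # small -> consistent (reliable)
bias = statistics.fmean(delivered) - STANDARD_LITRE   # far from 0 -> inaccurate
print(f"spread (SD) = {spread:.4f} L")
print(f"bias vs standard litre = {bias:+.3f} L")
```

Looking at the spread alone would suggest a good pump; only the comparison with the standard reveals that it consistently delivers about 5% too little.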

Construct validity of measures

•Construct validity: the adequacy of the operational definition of variables

○To what extent does the operational definition reflect the true theoretical meaning of the variable?

○Construct validity is a question of whether the measure employed actually measures the construct it is intended to measure

Indicators of construct validity

•Face validity: the evidence for validity is that the measure appears "on the face of it" to measure what it is supposed to measure

○Do the procedures used to measure the variable appear to be an accurate operational definition of the theoretical variable?

•Criterion-oriented validity: the relationship between scores on the measure and some criterion

○There are four types of criterion-related research approaches; they differ in the type of criterion that is employed

1. Predictive validity: scores on the measure predict behaviour on a criterion measured at a time in the future

○Example: LSAT scores predict how well you'll do in law school

2. Concurrent validity: scores on the measure are related to a criterion measured at the same time

○A common approach is to see whether two or more groups of people differ on the measure in expected ways

3. Convergent validity: scores on the measure are related to other measures of the same construct

○Example: one measure of shyness should correlate with another shyness measure or with a measure of a similar construct such as social anxiety

4. Discriminant validity: scores on the measure are NOT related to other measures that are theoretically different

○Example: checking whether shyness scores correlate with scores on a measure of aggressiveness/forcefulness; they should not

Research on personality and individual differences

•Systematic and detailed research on validity is most often carried out on measures of personality and individual differences

•NEO personality Inventory (NEO-PI)

○Measures the five major dimensions of personality: neuroticism, extraversion, openness to experience, agreeableness and conscientiousness

Reactivity of measures

•Reactivity is a potential problem when measuring behaviour

○A measure is said to be reactive if awareness of being measured changes an individual's behaviour

•Reactive measures don't tell you how the subject behaves in natural settings

•You can minimise reactivity by letting the subjects get used to the recording equipment or to the presence of the observer

•Nonreactive or unobtrusive measures involve clever ways of indirectly recording a variable

Variables and measurement scales

•Each variable that is studied must be operationally defined

○The specific method used to manipulate or measure the variable

•There must be at least two values or levels of the variable

○The levels can be described using one of four kinds of measurement scales:

1. Nominal: no numerical or quantitative properties

○Sometimes called "categorical variables"

○Example: male/female

○Impossible to define any quantitative values

2. Ordinal: rank ordering; numeric values are limited

○Example: restaurants rated from 1 to 5 stars

○Intervals between the ranks are not known

3. Interval: numeric properties are literal; intervals between values are assumed to be equal

○Example: intelligence test scores, temperature (Celsius or Fahrenheit)

○No true zero (0 °C on a thermometer is an arbitrary reference point and does not indicate the absence of temperature)
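Why the missing true zero matters can be shown with the Celsius example. Differences between interval-scale values are meaningful, but ratios are not: 20 °C looks like "twice" 10 °C, yet converting to Fahrenheit (or to Kelvin, which does have an absolute zero) gives a different ratio. The conversion formulas are the standard ones; the two temperatures are arbitrary.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

cool, warm = 10.0, 20.0  # two temperatures in degrees Celsius

# The ratio depends on where each scale happens to put its zero point.
print(warm / cool)                                                # 2.0
print(celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool))  # 1.36 (68 / 50)
print(celsius_to_kelvin(warm) / celsius_to_kelvin(cool))          # about 1.035
```

Only the Kelvin ratio describes how much more thermal energy there is, because only Kelvin's zero means "no temperature"; on the Celsius and Fahrenheit scales, "twice as warm" is not a meaningful statement.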