Chapter 5: Measuring Concepts
Operationally Defining Variables
Every variable must have an operational definition: the method used to manipulate or measure
the variable being studied
• Operationally defining a relationship's duration: you must specify a time frame (is your interest
in days, months, or years?)
• Operationally defining relationship satisfaction: people rate their satisfaction using scales
An operational definition's quality differs based on its reliability and validity
Self-Report Tests and Measures
o Systematic and detailed research on reliability and validity is mostly done with self-report measures
o Self-reports measure psychological attributes, abilities and potential
NEO Personality Inventory (NEO-PI): a self-report measure of the 5 major personality traits
• Clinical/applied settings = MMPI-2, for clinical diagnosis
• Career-choice decisions: Vocational Interest Inventory etc.
o Use pre-existing measures rather than your own because they are reliable and backed by validity data,
which helps you decide which measures to use
• Allows you to compare findings w/ prior research using the same measure
• Knowing the concepts of reliability and validity lets you evaluate the quality of existing
measures and of the operational definitions you create yourself
Reliability of Measures
o Reliability: the consistency/stability of a measure of behavior. This is the first step to a good operational definition
• a reliable measure (e.g. of intelligence) yields the same result every time it's tested
• fluctuation in measures = error in the measurement device
o True score and measurement error = the 2 concepts key to understanding reliability
• True score: person’s real score
• Measurement error: the deviation of an obtained score from the true score. Reducing error/variability reduces uncertainty and increases reliability
− A test w/ less variability is more reliable: it contains less measurement error than a
test whose measurements vary more over time.
− It's important that measures be reliable because you can usually test people only once, so
that single measurement has to be close to accurate.
− Unreliable measures = meaningless conclusions because they yield unrepeatable results
• To make a measure reliable, observe the variable multiple times. Thus, reliability increases when
the number of items increases.
Eg) a personality scale has 10+ Qs designed to assess ONE trait
• Correlation coefficient (the number that tells us how strongly 2 variables are related)
assesses the stability of measures
• Pearson product-moment correlation coefficient (r): the most commonly used
correlation coefficient for assessing the stability of measures; it must show a high positive
correlation to conclude the measure is reliable.
− Unrelated variables: r= 0; strongest relationship = ±1
− Positive/negative = direction of linear relationship
→ Positive linear relationship = high scores on Var1 go with high scores on Var2
→ Negative linear relationship = high scores on Var1 go with low scores on Var2
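As an illustration, r can be computed directly from two lists of scores. A minimal Python sketch (the function name and the example scores are my own, not from the notes):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each score from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # r = covariance / (sd_x * sd_y); the common 1/n factors cancel
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

# High scores on one administration tend to go with high scores on the other
test1 = [10, 12, 14, 16, 18]
test2 = [11, 13, 13, 17, 19]
print(round(pearson_r(test1, test2), 3))  # → 0.962, a strong positive correlation
```

A perfectly reversed ordering of scores would give r = -1, the strongest possible negative relationship.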
Test-Retest Reliability
o You measure the same individuals at 2 points in time
o Calculate r, which quantifies the relationship b/w the 2 test administrations
o If the correlation is high and positive, the measure reflects mostly true score, not measurement error
o r > 0.80 is usually desirable
o Problem w/ this: individuals may remember how they responded to test 1
o So, alternate forms reliability: you give 2 different forms of the same test to the same
individuals at two points in time
• Useful for constant variables: intelligence is relatively constant over time, so test-retest reliability is appropriate
− Mood changes, so test-retest reliability won't be appropriate for measures of mood
Internal Consistency Reliability
Assesses how well a certain set of items relate to each other
o Since all items measure the same variable, they should yield similar/consistent results
o Test at 1 point in time.
o A common indicator of ICR is the value of Cronbach's alpha
• You calculate how well each item relates to every other item, which produces a large #
of interitem correlations
• The alpha value is based on the avg of these interitem correlations and the number of items
in the measure
• By providing info on each item, alpha lets you eliminate items that don't correlate with the rest, increasing reliability
• This lets you construct a shorter version of a measure that is quick and convenient, but still reliable
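Cronbach's alpha can be computed from raw item scores with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A minimal sketch (the function name and the example ratings are my own):

```python
def cronbach_alpha(items):
    """items: one list of scores per item, all over the same respondents.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(items)
    n = len(items[0])

    def variance(scores):
        m = sum(scores) / len(scores)
        return sum((s - m) ** 2 for s in scores) / len(scores)

    # Each respondent's total score across all k items
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three items answered by five respondents (made-up ratings on a 1-5 scale)
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 1],
]
print(round(cronbach_alpha(items), 2))  # → 0.92, high internal consistency
```

Because the items rise and fall together across respondents, alpha is high; adding more items that correlate with the rest would push it higher still.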
Interrater Reliability
The extent to which raters agree in their obs.
o So if 2 raters are judging whether behaviors are aggressive, high interrater reliability is obtained
when most of the obs result in the same judgement
• Use 2+ raters, bc multiple raters are more reliable than 1 rater.
o A commonly used indicator of interrater reliability is called Cohen’s kappa
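Cohen's kappa corrects the observed proportion of agreement between two raters for the agreement expected by chance alone: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch (the function name and the example ratings are my own):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater1)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of each category's marginal proportions
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in c1)
    return (p_o - p_e) / (1 - p_e)

# Two raters judging 10 behaviors as aggressive (1) or not (0)
r1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
r2 = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
print(round(cohens_kappa(r1, r2), 2))  # → 0.58
```

Here the raters agree on 8 of 10 observations (p_o = 0.80), but because chance alone would produce p_e = 0.52 agreement, kappa is a more modest 0.58.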
Reliability and Accuracy of Measures
o Just because a test is consistent, it doesn’t mean it is accurate
o For example, measuring intelligence with a foot-size device: the results are consistent each time,
but foot size is not an indicator of your intelligence
o This difference b/w the reliability and accuracy of measures leads us to a consideration of
the validity of measures
Validity of Measures
o Validity = the second step to a good operational definition
o Construct validity: the adequacy of the operational definition of variables
• Construct: denotes a variable that is abstract and needs an operational definition (e.g.
social anxiety etc.)
• So construct validity's concern: does the measure assess what it is meant to measure? This is the
second step in evaluating an operational definition.
Indicators of Construct Validity
o Face validity: the content of the measure appears to reflect the construct being measured
• Purely subjective and based on one's opinion, so it's insufficient for determining construct validity