Chapter 4
Two activities of scientific study:
Exploratory data collection and analysis:
is aimed at classifying behaviors, identifying potentially important variables, and
identifying relationships between those variables and the behaviors
Hypothesis testing:
Evaluating potential explanations for the observed relationships
Causal relationship: one variable directly or indirectly influences another.
 unidirectional – A influences B but not vice versa
 bidirectional – each variable influences the other
correlational relationship:
changes in one variable accompany changes in another, but no proper testing has
been done to show that they actually influence each other.
when changes in one variable tend to accompany a specific change in
another, the variables are said to covary
correlational research determines whether two variables covary and, if so,
establishes the direction, magnitude, and form of the observed relationship
 nonexperimental
 observing the values of two or more variables and determining what
relationships exist between them.
 Make no attempt to manipulate variables; observe “as is”
 Makes it possible to predict from the value of one variable the probable
value of the other variable.
 The variable used to predict – predictor variable
 The variable whose value is being predicted – criterion
variable
Two problems with this method
 third variable problem
o you want to show that variation in one of the observed variables could
only be due to the influence of the other observed variable.
o However, there could be a third variable; it is usually unobserved
and may influence both variables, causing them to vary together
even though no direct relationship exists between them
o Must examine the effects of each potential third variable to
determine whether it does account for the observed relationship.
 Directionality problem
o The direction of causality is sometimes difficult to determine.
Reasons for choosing correlational
manipulating the variables may be impossible or unethical
 can provide a rich source of hypotheses that can later be tested
experimentally
 you want to see how naturally occurring variables relate in the real world
Experimental research
 incorporates a high degree of control over the variables in the study
 establish causal relationships among the variables
 manipulation of one or more independent variables and control over
extraneous variables
manipulate the independent variable
 chosen by the experimenter
 specific conditions associated with each level are called treatments.
 By manipulating you hope to show that changes in the levels of the
independent variable cause changes in the behavior recorded
Group receiving the treatment – experimental group
Other group – control group
Extraneous variables – those that may affect the behavior that you wish to
investigate but are not of interest for the present experiment
Uncontrolled variability
makes it difficult or impossible to detect any effects of the independent variable;
produces chance differences in behavior across the levels of the independent
variable
hold these extraneous variables constant
make sure all treatments are exactly alike
randomize their effects across treatments
 make them even out so they cannot be mistaken for effects of the
independent variable
random assignment allows you to use inferential stats to evaluate the probability
with which chance alone could have produced the observed differences
strengths and limitations
the experimental approach can tell you whether changes in one variable actually caused
changes in the other; it also gives the ability to identify and describe causal relationships
 limitation – you cannot use the experimental method if you cannot
manipulate your hypothesized causal variables.
 The tight control over extraneous factors required to clearly reveal the
effects of the independent variable
Experiments vs. demonstrations
A demonstration lacks an independent variable
Exposes subjects to just one treatment condition
Simply expose a single group to a particular treatment and measure the behavior
Useful for showing that this happens and not that.
Demonstrations are not experiments and do not show causal relationships
Internal and external validity
 internal – ability of your research design to adequately test the
hypothesis it was designed to test
 this means that the independent variable caused the observed variation
in the dependent variable
 in a correlational study changes in the value of your criterion relate solely
to changes in the value of your predictor variable
 internal validity is threatened to the extent that extraneous variables can
provide alternative explanations for the findings of the study
confounding variables when two or more variables combine in such a way that
their effects cannot be separated.
 confounding is less problematic when the confounding variable is known
to have little or no effect on the dependent or criterion variable or when
its known effect can be taken into account in the analysis
 best thing to do is to substitute what you believe to be less serious threats
to internal validity for the more serious ones
threats to internal validity
history (an event may occur between two different observations)
maturation (effects of age or fatigue)
testing (when a pretest sensitizes participants to what you are investigating)
instrumentation (unobserved changes in criteria used by observers or in instrument
calibration)
statistical regression (scores tend to be closer to the population average on retest
when before they were outliers)
biased selection of subjects (groups differ initially and that is what affects the change;
usually happens when researchers use preexisting groups in their studies rather than assigning
subjects to groups at random)
experimental mortality (loss of participants)
external validity
 degree to which results can be extended beyond the limited research setting
and sample in which they were obtained,
 may tell us little about how they react in the real world
 objective is to gain insight into the underlying mechanisms rather than
discover relationships
threats
 may be less relevant in basic research
 becomes more relevant when the findings are expected to be applied
directly to a real world setting
setting
setting is affected by
costs
convenience
ethical considerations
 research question
lab setting
 gain important control over variables that could affect results
 gain control over extraneous variables that could affect the dependent
variable
 may lose generality
 simulation
o might want to use one because the real manipulation would be unethical,
expensive, or time consuming
o retains control
o relatively realistic conditions
 designing a simulation
o observe and study carefully
o identify crucial elements
o more realistic = greater chance that it will be applicable to real
world phenomena
 realism
o mundane simulation mirrors a real world event
o experimental simulation psychologically involves the participant in
the experiment
field research
 participants’ natural environment
 manipulate independent variable and measure a dependent variable
 has all the qualities of a lab experiment
advantages and disadvantages
results can be generalized to the real world
 disadvantage little control over potential confounding variables, (low internal
validity)
 extraneous variables can obscure or distort the effects of the independent variable
in field experiments
We know that probability sampling strategies are most likely to give us a representative sample.
An inclusion criterion might be declaring a major in a heavily math-oriented field (e.g.,
mathematics, computer science, physics). An exclusion criterion might be test anxiety (e.g.,
test scores might be lower not because of the stereotype threat but because of test anxiety
that decreases performance in all testing situations).
Random assignment controls extraneous factors because it randomly distributes personal
characteristics that can influence outcome across conditions.
There are two ways to randomly assign participants to experimental conditions:
1. Free random assignment
2. Matched random assignment.
With free random assignment, the experimenter uses a random number table or a random
number generator on a calculator or computer to assign participants to groups. There is no
attempt to measure and use personal characteristics as part of the random assignment
process. With matched random assignment, information about subject characteristics is collected and
used to identify similar participants. After the match is made, participants are then randomly
assigned to groups. This strategy ensures an equal distribution of critical personal
characteristics across experimental conditions.
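The two assignment strategies can be sketched in Python; this is a minimal illustrative sketch (the participant data, group counts, and function names here are hypothetical, not from the text):

```python
import random

def free_random_assignment(participants, n_groups=2, seed=0):
    """Free random assignment: shuffle everyone, then deal into groups
    round-robin, with no use of personal characteristics."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def matched_random_assignment(participants, score, n_groups=2, seed=0):
    """Matched random assignment: rank participants on a measured
    characteristic, form matched sets of similar participants, then
    randomly assign the members of each set across groups."""
    rng = random.Random(seed)
    ranked = sorted(participants, key=score)
    groups = [[] for _ in range(n_groups)]
    for i in range(0, len(ranked), n_groups):
        matched_set = ranked[i:i + n_groups]
        rng.shuffle(matched_set)
        for group, person in zip(groups, matched_set):
            group.append(person)
    return groups
```

Matched assignment guarantees the measured characteristic is balanced across conditions; free assignment relies on chance alone to even it out.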
Bias can be introduced into studies by both the experimenter and the participant. Direct
knowledge of the study hypothesis, the nature of the experimental manipulation, and group
assignment can lead to subtle differences in the ways that experimenters and participants
interact in the research setting. Restricting knowledge of the experiment through "blind"
procedures can help to eliminate this bias.
In a single-blind procedure, a laboratory assistant who does not know the study hypothesis
administers the experimental manipulation. The laboratory assistant also does not know the
experimental condition to which the participant was assigned. Having a naive intermediary
between the experimenters who designed the study and the research participants prevents
experimenter expectancies from influencing study results.
In a double-blind procedure, both the person administering the experimental manipulation and
the participant do not know the study hypothesis and group assignment. The prototypic
double-blind study is a randomized study of medication. The participants receive either the
active drug or a pill that looks, smells, and tastes exactly like the drug but without the active
ingredient. Neither the experimenter nor the participants knows whether the active drug or
placebo is being administered until after the study is over.
Performance is compared within individual participants. Order and sequence effects are major
sources of error in within-subjects designs.
Order effects produce changes in performance based on the order of the condition in the
experiment and not the manipulation in the specific condition. Practice effects can be
considered order effects. In cognitive experiments, performance is usually lower on the first
task because participants are unfamiliar with the setting. Once the participants become
familiar with what is required, performance increases.
Fatigue effects are also order effects. Performance is worse in later conditions because
participants are tired. Sequence effects are produced by characteristics of the experimental
manipulation. For example, in a study of perception of weight, participants will judge a weight
lighter if it follows a heavy rather than a light weight. The same occurs in reverse; participants
will judge weights as heavier if they follow a light rather than a heavy weight. Sequence
effects are caused by an interaction between order and specific aspects of the manipulation.
We always try to build controls that minimize or eliminate confounds/threats to
validity into our studies.
The most general control in experimental research is adequate preparation of the
research setting.
Extraneous variables can be controlled by carefully selecting who is in our study
through inclusion and exclusion criteria and by randomly assigning participants to
experimental groups.
Single-blind and double-blind procedures can help to control for experimenter bias.
Control groups in between-groups experimental designs should be as similar as
possible to experimental groups.
Counterbalancing is an effective control in within-subjects designs.
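Full counterbalancing, one standard way to control order effects, runs every possible order of conditions across participants; a minimal Python sketch (the condition names are hypothetical):

```python
from itertools import permutations

def full_counterbalance(conditions):
    """Return every possible presentation order; assigning participants
    evenly across these orders lets order effects average out."""
    return list(permutations(conditions))

orders = full_counterbalance(["A", "B", "C"])
# each of the 3 conditions appears equally often in each serial position
```

With k conditions this produces k! orders, which is why full counterbalancing is practical only for small designs.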
Hypothesis testing is one of the most important concepts in Statistics.
This is how we decide if:
Effects actually occurred.
Treatments have effects.
Groups differ from each other.
One variable predicts another.
Null hypothesis – nothing happened
You test your sample statistic against the value based on the Null Hypothesis sampling
distribution.
If your sample statistic and the Null Hypothesis sampling distribution value are close,
you conclude that they are not different; you did not find an effect in your study.
Alternative hypothesis – something happened
You test your sample statistic against the value based on the Null Hypothesis sampling
distribution.
If your sample statistic and the Null Hypothesis value are not close,
you conclude that they are different; you found an effect in your study; hooray!
Hypothesis testing:
1. You come up with your hypothesis (for example  college students sleep less than
other folks).
2. You generate a sample (pick a set of college students).
3. You calculate your summary statistics (for example, the mean and SD of number
of hours that college students sleep per night).
4. You determine the statistical test that will compare your summary statistic against
the value determined by your Null Hypothesis. (You would use the single-sample t-test
for college students’ sleep.)
5. You calculate the test statistic using your summary statistics. The formula for
the test statistics is different for each type of test but the basic concept is the same.
You calculate how far your sample is from the Null Hypothesis taking into account that
sample values of a statistic vary by chance when smaller samples are taken from a
larger population. The SE tells us how much they vary.
6. You derive the appropriate sampling distribution  or refer to one already listed in
the tables in your statistics book. Your computer program can also give you this
information.
7. You choose the cutoff value on your sampling distribution that tells you that
your sample statistic is very far from the Null Hypothesis and thus not likely. We call
this cutoff value our alpha level or significance level (more about alpha later).
8. You decide whether to reject the Null Hypothesis or fail to reject the null. You
do this by comparing your test statistic to the cutoff value.
9. You draw your conclusion. If you reject the Null Hypothesis, you say that your
result is statistically significant. This simply means that it is unlikely to have occurred by luck or
chance alone. If you fail to reject the null, you conclude that you did not find an effect or
difference in this study.
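Steps 3–5 above can be sketched with a single-sample t statistic; a minimal stdlib-only Python illustration (the sleep data below are made up):

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - null-hypothesis mean) / standard error,
    where SE = sample SD / sqrt(n)."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)
    return (mean(sample) - mu0) / se

# hypothetical hours of sleep for a sample of college students
hours = [6.1, 5.8, 7.0, 6.4, 5.5, 6.9, 6.2, 5.9]
t = one_sample_t(hours, mu0=7.0)  # null: students sleep 7 hours like everyone else
# compare |t| to the critical value for df = n - 1 at the chosen alpha level
```

A large negative t here would mean the sample mean sits far below the null value relative to the SE, matching step 5's "how far your sample is from the Null Hypothesis."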
You can make an error or two when you test hypotheses. You might say things are different
when they are not. You may miss a relationship that really exists. These are called Type
I and Type II errors, respectively.
Power is the probability of correctly rejecting a false Null Hypothesis.
Many experts recommend that you use a power of .80. This means that you have an
80% chance of finding a difference when you really want to find it. You don't want to miss a
real difference or correlation. (Bad – missing a difference is called a Type II error with
probability equal to Beta).
Power is equal to 1 - Beta.
The test might say there is a difference when there is not one.
(Bad – a Type I error, whose probability equals your alpha rate: .05 or .01).
Depending on conditions, you may have a good or bad chance of finding the desired result. To
increase power you can:
1. Try to increase the effect size or the strength of the relationship.
2. Decrease experimental error.
3. Use a higher alpha level (say .05 as compared to .01). Note this increases
power but also Type I error.
4. Increase sample size.
5. Use matched samples or covariance techniques.
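Power can also be estimated by simulation: draw many samples with a true effect built in and count how often the test rejects the null. A rough stdlib-only sketch (the z cutoff, sample sizes, and effect size are illustrative assumptions):

```python
import math
import random

def simulated_power(n, effect, sims=2000, alpha_cutoff=1.96, seed=1):
    """Monte Carlo power estimate: fraction of simulated samples (true
    mean shifted by `effect` SDs) whose z statistic exceeds the
    two-tailed cutoff (1.96 corresponds to alpha = .05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if abs(z) > alpha_cutoff:
            hits += 1
    return hits / sims

power = simulated_power(n=30, effect=0.5)
# increasing n or the effect size raises the estimated power;
# with effect = 0 the "power" falls to roughly alpha (the Type I error rate)
```

This makes the list above concrete: larger samples and larger effects both push the hit rate up.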
We always test a null hypothesis against an alternative/research hypothesis.
If a sample is close to null, we conclude that nothing happened in the study. If
a sample is far away or different from the null, we reject the null hypothesis
and conclude that something happened.
The logic of hypothesis testing is counterintuitive (or backwards). We test
whether nothing happened (our sample value is close to the null) in order to
conclude that something happened.
There are two types of error in hypothesis testing (Type I and Type II). Type I
errors occur when we conclude that there is a difference when there is not.
Type II errors occur when we conclude that there is no difference when there
is.
Statistical power is the probability of correctly detecting a true difference;
we want to maximize it.
Video
Name and identify the 4 major types of validity
Internal – is the independent variable responsible for the observed changes in
the dependent variable? (objective)
 infer causation
 function of procedures and study design
 confounds – occur when two potentially effective variables are
allowed to covary simultaneously
o either could be responsible for changes in the dependent
variable because they show up at the same time
 need a high level of constraint
 concerns must be adequately controlled
Statistical – are the statistical tests accurate? (objective)
External – do the results apply to the broader population? (subjective)
Construct – is our theory the best explanation for the results? (subjective); the
most subjective, relies on an accumulation of evidence
Name and define major confounding variables
Validity check
Instrumentation
Maturation
History
Regression to the mean
Testing/practice effects
Selection bias
Attrition/differential mortality
Sequence effects
Diffusion of treatments
Compensatory rivalry
Resentful demoralization
Explain why some forms of study validity are considered more objective and
others more subjective
It reduces to the rules that exist. Rules are clearer and more agreed upon for
internal than for external validity.
Describe specific threats to validity in several additional sample studies and
explain how the study can be redesigned to reduce threats
Movie 2
Maturation – passage of time, development
History – a competing event; another explanation that covaries
Testing/practice effects – tested once and tested twice; the change comes from
the measurement process. Participants might remember the questions when
taking the measures at two points in time
Regression to the mean – leveling effect. Extreme scores rarely end up
extreme again; mid-range scores show not too much change. It is rare to see high
scores stay high and low scores stay low.
Selection bias – choice of participants is limited.
Attrition (front end)/differential mortality (back end) – you lose some
people: people moved, no longer want to participate, death.
You lose certain kinds of people.
Within a small environment:
Diffusion of treatment – students in two different groups talk, so the untreated
group gets the benefit of the treatment by virtue of hanging out.
Compensatory rivalry – a control-group member with no treatment tries to show
that he doesn’t need the treatment; he gets motivated to do better than the
treatment group.
You might conclude that there was no effect because there is no difference,
since everyone was effectively getting the treatment through shared information.
Resentful demoralization – opposite of the compensatory effect. Members of the
control group are motivated to do worse; they are mad, bummed about the less
desirable treatment.
Differential attrition/mortality – the group being measured is different because you
lost people.
Participants may be aware
Subject effects
 demand characteristics
 placebo effects
experimenters are also active
 bias
 knowledge of the hypothesis
subjects may try to behave the way they think they are supposed to
behave
 hawthorne effect
 outguessing the experimenter
 knowing you’re in an experiment and reacting to that awareness
demand characteristic
 respond to subtle cues about what is expected
 also occurred in the Hawthorne studies
 feedback, pay, and production
placebo effect
 expectations that the treatment will work
experimenter effects
 demand characteristics
 subtle biases in observations, recording, measurement
 unlike fraud, this behavior is outside of our awareness
construct validity
 why did it work
 did we manipulate what we intended to
 does dependent measure get at what we say it does
 is our theory the best explanation for the results?
Inadequate preoperational explication of constructs
 did you think through the theory and definitions of constructs
before you measured
mono-operation bias
 self-report measures only, only used interviews, etc.; the construct is
operationalized in a single fashion and not adequately expressed
assessing
 use process measures if trying to link treatment to effect
 did you manipulate what you said you did
 did dependent variables intended to measure the same construct
actually do so?
 Include alternative measures that should not be affected by the
treatment
 Is there too much overlap on irrelevant factors with your
dependent measures? Mono-operation bias
Did my treatment cause the outcome?
Chapter 8,9,13,14
Observational designs do not involve manipulating the independent variable
Behavioral categories – define what is being recorded.
Make sure they are defined well and not ambiguous; cultural traditions may not be
agreed on
 become familiar with the behaviors and make a list
 do preliminary observations
 literature research
frequency method – count behaviors within a time period
duration method – how long a behavior lasts
interval method – divide the observation period into time intervals and record a behavior
that occurs within each interval (intervals short enough for only one behavior to occur)
complexity – how to make your observations
time sampling
 scan group for specific time
 alternate between periods of observation and recording
individual sampling
 observe a single subject over a given time period.
 repeat for other individuals
event sampling
 observe only one behavior
recording
 use recording devices
 multiple observers watch the video independently
 you can hide a camera better than you can hide yourself.
Use audio recorders instead of taking notes
 eyes are focused on the subject
 faster
 disadvantage: may disturb subjects
reliability of observations
 disagreement may happen if you have not clearly defined the behavioral
categories
 interrater reliability provides an empirical index of observer agreement
o establishing it helps ensure observations are accurate and reproducible
 the simplest way to assess interrater reliability is to evaluate percent agreement
o should be as high as possible
o 70% is acceptable
o if agreement is defined as an exact match, then percent agreement
underestimates interrater agreement
o only gives a raw estimate of agreement
o there may be extremely high levels of chance agreement, in which case
percent agreement overestimates it
Cohen’s kappa
 assesses agreement while correcting for the amount of agreement expected by chance
 need to determine
o proportion of actual agreement
o agreement expected by chance
 step 1 is to tabulate the ratings in a confusion matrix
 step 2 is to compute the proportion of actual agreement
 step 3 is to find the proportion of expected agreement by multiplying corresponding
row and column totals
 any value of .70 or greater indicates acceptable reliability
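The three steps reduce to a few lines of Python; a minimal sketch (the ratings data are hypothetical):

```python
def cohens_kappa(ratings_a, ratings_b):
    """kappa = (P_observed - P_chance) / (1 - P_chance)."""
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # proportion of actual agreement (the confusion-matrix diagonal)
    p_obs = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b) / n
    # chance agreement: sum of products of matching row and column totals
    p_exp = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)
```

Here p_obs alone is the percent agreement; kappa corrects it for the agreement two observers would reach by chance.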
Pearson’s product-moment correlation
 if observers agree, Pearson r will be strong and positive
 scores can be highly correlated even when observers disagree, as long as the
magnitudes of the recorded scores increase and decrease similarly
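A quick sketch shows why a strong Pearson r does not guarantee agreement: two observers whose scores rise and fall together but never match still correlate perfectly (the data are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )
    return num / den

obs1 = [2, 4, 5, 7, 9]
obs2 = [4, 6, 7, 9, 11]  # second observer always scores 2 points higher
r = pearson_r(obs1, obs2)  # r = 1.0 even though the observers never agree exactly
```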
intraclass correlation coefficient
 reliability on observations scaled on an interval or ratio scale of
measurement
observer bias
 when the observers know the goal of a study or the hypothesis the
observations are influenced by this information
 use a blind observer
 when observers interpret what they see rather than simply recording
behavior
quantitative and qualitative
 quantitative data is expressed numerically
 qualitative data is written records of observed behavior
o cannot apply standard descriptive and inferential statistics to such
data
naturalistic observations
unobtrusive observations
 do not alter natural behaviors of subjects
 be hidden, or habituate the subjects to your presence (video as well)
 behavior in the real world
 high external validity
 cannot use naturalistic observation to investigate the underlying causes of the
behaviors
 requires you to be there, engaged the entire time.
Ethnography
 researcher becomes immersed in the behavioral or social system being
studied
 primarily to study and describe the functioning of cultures through a study of
social interactions and expression between people and groups
 done in field setting
 participant observation – part of the group
 non participant observation serve as a nonmember of the group
 minimize people altering behavior by training participant observers not to
interfere or use observers that are blind
 remove problem of reactivity by observing covertly
 covert participant – part of the group but does not disclose researcher status
 gaining access might be hard you need to get past the gatekeepers who are
the protectors of the group.
 Another entry into the group is to use guides and informants who convince
the gatekeepers that your aims are legitimate and the study is worthwhile.
 First step to analyzing is to do an initial reading of field notes
 Second step to analyzing is to code any systematic patterns
 Ethnographic research is purely descriptive in nature; we cannot explain why
Sociometry
 identifying and measuring interpersonal relationships within a group
 use sociometry as the sole research tool to map interpersonal relationships
 sociogram – maps the choices of friends
case history
 observe a single case
 not experimental design
 demonstration
 no manipulation of IV
 cannot determine causes
archival research
 nonexperimental strategy that involves studying existing records
 all factors pertaining to observational research apply to archival research
 gain access to archived material
 practical matter is the completeness of the records
 purely descriptive
 may identify interesting trends or correlations
 cannot establish causal relationships
content analysis
 analyze a written or spoken record
 occurrence of categories or events, pauses, negative comments, behavior, etc.
 usually use archival sources for analysis
 example is court proceedings
 all factors that apply to observational research apply to content analysis
 observational technique
 objective
o clear set of rules
 systematic
o info assigned into categories
o include articles that are for/against personal favor
 should have generality
o fit within theoretical, empirical or applied context.
 When performing it you need clear operational definitions of terms
o Materials need to be analyzed before you develop categories
o The recording unit is the element of the materials that you are going
to record
o Context unit – the context within which the word was used
 Who will do the analysis
o May be affected by bias
o use a blind observer
o use more than one observer to evaluate interrater reliability
o content analysis of a biased sample may produce biased results
 content analysis is purely descriptive
 durability can be a problem – findings may become
invalidated over time
meta-analysis
 a review’s conclusions may not accurately reflect the strength of the relationships
examined in your review
 meta-analysis – a set of statistical procedures that allow you to combine or
compare results across studies
 a form of archival research
 a meta-analysis of meta-analyses is called a second-order meta-analysis
 3 steps
o identify relevant variables
o locate relevant research to review
o conduct the proper meta-analysis
 step 1
o identify variables
o focus on only those related to your topic
o what variables to record
o driven by the research question
o info needed is dependent on the meta analysis technique you use
 step 2:
o locate research to use
o the file drawer phenomenon inflates the Type I error rate
 how to deal with it: attempt to uncover those studies that never reached
print
 estimate the extent of the impact of the file drawer
phenomenon on your analysis
o done by determining the number of studies that
must be in the file drawer before serious biasing
takes place
 step 3
o apply technique
o the 1st technique shows whether you can compare studies
o doing a meta-analysis comparing studies is analogous to conducting
an experiment using human or animal subjects
o the second technique combines studies to determine the average effect of a
variable across studies
o comparing effect sizes is more desirable than looking at p values
 a p value only tells you the likelihood of making a Type I error
 drawbacks to meta
o published research may vary in quality
o research in new areas may be rejected from refereed journals
o quality ratings should be made twice
 once after reading the method section alone
 then after reading the method and results sections together
o a common criticism is that it is difficult to understand how studies with
widely varying materials, measures, and methods can be compared
o the core issue is whether or not differing methods are related to
different effect sizes
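The "combine studies" technique can be as simple as a sample-size-weighted mean effect size; a toy Python sketch (the study effect sizes and Ns are invented, and real meta-analyses use more refined weights):

```python
def combined_effect(studies):
    """Sample-size-weighted mean effect size across studies
    (a simplified combination rule)."""
    total_n = sum(n for _, n in studies)
    return sum(d * n for d, n in studies) / total_n

# hypothetical (effect size d, sample size N) pairs from three studies
studies = [(0.40, 50), (0.25, 120), (0.60, 30)]
avg_d = combined_effect(studies)  # = 0.34
```

Weighting by N means large studies pull the average toward their estimates, which is why the combined value sits below the simple mean of the three ds here.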
chapter 9
in a field survey you directly ask people about their behavior
you can draw inferences about the factors underlying behavior
one major ethical concern is whether and how you will maintain the anonymity of
your participants and the confidentiality of their responses.
Designing questionnaire
 clearly define the topic of your study
 keep it focused, because too much in a survey, or a long survey, can confuse
and overburden respondents
 demographics are used as predictor variables
 specifically measure voter preference – the criterion variable
 administer your questionnaire to a pilot group to make sure it is reliable and
valid
Open-ended questions – participants answer in their own words
 drawback – they may not understand exactly what you are looking for or may omit
some answers.
 Can also make summarizing data difficult
Restricted items
 provide a limited number of specific response alternatives
 control the participants’ range of responses
 easier to summarize and analyze
 not as rich in information
Partially open-ended items
 resemble restricted items but provide an additional “other” category and an
opportunity to give answers not listed.
 Helps respondents separate the question from the response categories that
follow.
 Make any special instructions intended to clarify a question a part of the
question itself.
 Put check boxes, blank spaces, or numbers.
 Place all alternatives in a single column
Rating scales
 scales with fewer than 10 points are also frequently used, but you should not
go below 5 points
 end points labeled – anchors keep interpretation from drifting
 all points labeled provides more accurate info
 a reasonable compromise is the ends and middle labeled.
 Distinguish the psychological phenomenon underlying the scale from the scale itself
 Likert scale: agreement – disagreement
Assembling your questionnaire
 coherent visually pleasing format
 demographic items should not be placed first
 interesting and engaging
 apply to everybody
 be easy
 interesting
 continuity
 organized
 order affects answers only when people are poorly educated
 place objectionable questions after less objectionable ones
 use graphics
 verbal/ graphical relate to how your questions are worded and presented
mail surveys
 mail directly to the participant
 nonresponse bias – some recipients fail to complete the questionnaire
 develop strategies to increase return rate
 multiple contacts
 include a small token of appreciation
 less money tends to work better
 lower cost
 consider this way first
internet surveys
 distributed via email
 short and simple
 quick and easy and have large data set
 internet users may not be representative of the general population
telephone surveys
 contact by phone
 have an interviewer ask questions or a robot
 touch-tone telephone to respond
group administered surveys
 people may participate because little effort is required
 surveys may not be treated as seriously when taken in a group
 cannot ensure anonymity
 the right to decline participation may be harder to exercise
face to face interviews
 directly speaking
 in a structured interview you prepare questions
 unstructured you have a general idea but no sequence of questions
 structured all participants asked the same thing same order
o easier to summarize and analyze
o may miss some important info by having a highly structured interview
 unstructured may be hard to code later on
 experimenter bias and demand characteristics become a problem
 run a pilot test
reliability
repeated measures
 administer once, then allow time to pass, then administer again
 consider how long to wait
 too short a wait may result in participants remembering questions and the answers
they gave
 this leads to an artificially high level of test-retest reliability
 wait too long and it could be low
 test-retest may be problematic when
o measuring ideas that fluctuate with time
o issues for which individuals are likely to remember their answers on
the first testing
o questionnaires that are long and boring
 parallel forms
o must be equivalent
o same number of items and the same response format
o eliminates the possibility that rapidly changing attitudes will result in low reliability
 single administration
o split half: split the questionnaire in half and compute two scores
o works best when limited to a specific area
o each score is based on a limited set of items, which can reduce reliability
o don’t use it if it’s not clear how the splitting should be done
o some use the odd-even split
o apply the Kuder-Richardson formula
o the higher the number, the greater the reliability
o .75 is moderate
o for Likert format, coefficient alpha is used
 increasing reliability
o increase the number of items on your questionnaire
o standardize administration procedures
o score carefully
o keep items clear, well written, and appropriate
 validity of the questionnaire
o content validity assesses whether the questions cover the range of behaviors normally associated with the dimension being measured
o construct validity is established by showing the questionnaire’s results agree with predictions based on theory
o criterion-related validity: correlating the results with those of another established measure
Concurrent validity: correlating with a measure of the same dimension administered at the same time
Predictive validity: correlating with some behavior that would be expected to occur
Sample
 a representative sample closely matches the characteristics of the population
 random sampling: every member has an equal chance of appearing in the sample
 simple random sampling: randomly selecting a certain number of individuals from the population
 random sampling does not guarantee a representative sample
o combat this by selecting a large sample
 stratified sampling (to obtain a representative sample)
o divide the population into segments (strata)
o select an equal-sized random sample from each segment
 proportionate sampling
o the proportions of people in the population are reflected in your sample
 systematic sampling
o used in conjunction with stratified sampling
o select every xth element after a random start
 cluster sampling
o the basic sampling unit is a group of participants rather than the individual participant
o saves time
o cost effective
o multistage sampling
identify large clusters, randomly select among them, then sample within the selected clusters
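As a sketch of the difference between simple random and proportionate sampling, here is a small example with an invented population (the group names are hypothetical, not from the text):

```python
# Simple random vs. proportionate stratified sampling on a made-up
# population of 300 first-years and 100 seniors (75% / 25%).
import random

random.seed(1)  # reproducible for this example

population = [("first-year", i) for i in range(300)] + \
             [("senior", i) for i in range(100)]

# Simple random sampling: every member has an equal chance.
simple = random.sample(population, 40)

# Proportionate sampling: sample within each stratum, keeping each
# group's share equal to its share of the population.
strata = {}
for person in population:
    strata.setdefault(person[0], []).append(person)

sample_size = 40
proportionate = []
for name, members in strata.items():
    k = round(sample_size * len(members) / len(population))
    proportionate.extend(random.sample(members, k))

# The proportionate sample is exactly 75% first-years / 25% seniors;
# the simple random sample only approximates those proportions.
print(sum(1 for p in proportionate if p[0] == "first-year"))  # 30
```

The design point: stratifying first guarantees the sample proportions match the population; pure random sampling only converges on them as the sample grows.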
sample size
 an economic sample has enough participants to ensure a valid survey and no more
 depends on the amount of acceptable error and the expected magnitude of the population proportions
 the deviation of sample characteristics from those of the population is called sampling error
 look at the literature and see what margin of error was used
 consider the magnitude of the differences you expect to find
 design a small pilot study
Chapter 13
Unstacked format: create separate columns for the scores from each treatment.
Stacked format: one column for all the scores, with a column identifying the treatment level and a column for the dependent variable.
A quantitative independent variable can simply be entered as its numeric value; a qualitative independent variable, however, must be assigned a number for each level, aka dummy coding.
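A small sketch of the two layouts and of dummy coding, using plain Python and invented two-group data (the treatment names are hypothetical; pandas would follow the same pattern):

```python
# Unstacked: a separate column (list) of scores per treatment level.
unstacked = {
    "drug":    [12, 15, 11],
    "placebo": [8, 9, 7],
}

# Stacked: one row per observation -- a column for the treatment
# level and a column for the dependent variable.
stacked = [(level, score)
           for level, scores in unstacked.items()
           for score in scores]

# Dummy coding: assign an arbitrary number to each qualitative level.
codes = {"drug": 1, "placebo": 0}
dummy_coded = [(codes[level], score) for level, score in stacked]

print(stacked[0])      # ('drug', 12)
print(dummy_coded[3])  # (0, 8)
```

Most statistics packages expect the stacked layout, which is why the conversion matters.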
Grouped data
 taking an average = one score that characterizes an entire distribution
 may not represent the performance of the individual subject
 a curve resulting from plotting averaged data may not reflect the true nature of the psychological phenomenon being studied
Individual data
 makes the most sense
 reflects the effect of the independent variable more faithfully than data
averaged over the entire group
look at both the grouped and individual data
graphing
 represents data in a 2D space
 the horizontal axis is the x axis, which carries the independent variable
 the vertical axis is the y axis, which carries the dependent variable
bar graphs
 length of the bar represents the value of the dependent variable
 error bars the precision of the estimate in the form of error bars
o error bars show the variability of scores around the estimate
 a bar graph is the best method of graphing when your independent variable is categorical
 the x axis is categorical and qualitative
line graphs
 work when the x axis is continuous and quantitative
 positively accelerated: the curve is flat at first and becomes progressively steeper along the x axis
 negatively accelerated: the curve is steep at first and then becomes progressively flatter, leveling off at a maximum or minimum
 monotonic: the curve uniformly increases or decreases
 nonmonotonic: the function contains reversals in direction
scatter plots
 correlational strategy
 line of best fit
 include the equation for this line and the coefficient of correlation
 helpful for when you calculate a measure of correlation
pie graphs
 for data in the form of proportions or percentages
 if a piece is pulled out it’s called an exploded pie graph, which emphasizes that slice’s proportion
frequency distribution
 consists of a set of mutually exclusive categories into which you sort the actual values observed in your data, together with a count of the number of data values falling into each category
Histogram
 resembles a bar graph, but the bars are drawn touching each other
 the y axis shows the frequency: the count of cases falling into each class
stemplot
 simplifies the job of displaying distributions
 easy to construct, and has the advantage over histograms and tables of preserving all the actual values present in the data
 inherently create class widths of ten
 not useful when the sets become too large
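The notes above can be made concrete with a short sketch: building a frequency distribution and a stemplot from hypothetical scores, with stems as the tens digit (giving the class width of ten mentioned above).

```python
# Frequency distribution and stemplot from made-up scores.
from collections import Counter

scores = [42, 47, 51, 53, 53, 58, 61, 64, 70, 42]

# Frequency distribution: mutually exclusive classes (by decade)
# with a count of the values falling into each.
freq = Counter((s // 10) * 10 for s in scores)

# Stemplot: each stem keeps its leaves (ones digits), so the actual
# data values are preserved -- the advantage over a histogram.
stems = {}
for s in sorted(scores):
    stems.setdefault(s // 10, []).append(s % 10)

for stem, leaves in sorted(stems.items()):
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
# 4 | 227
# 5 | 1338
# 6 | 14
# 7 | 0
```

Reading the output, you can recover every original score (e.g., stem 5 with leaf 8 is 58), which a histogram bar cannot give you back.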
skewed distribution
 has a long tail trailing off in one direction and a short tail in the other.
 Positively skewed long tail goes to the right
 Negatively skewed long tail goes off to the left, downscale
Normal distribution symmetric and hill shaped bell curve
Measures of the center
 mode
o most frequent
o bimodal have 2 modes
o nominal and ordinal scale
 median
o middle score
o order from highest to lowest
o two middle scores you take the average
o ordinal scale
 mean
o sensitive to the distance between scores
o interval or ratio scale
o normally distributed use mean as measure of center
o negatively skewed the mean underestimates the center
o positively skewed the mean overestimates the center
o neither the mean nor the median will accurately represent the center if your distribution is bimodal
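A quick sketch of the three measures of center on invented data, including the skew behavior noted above (positive skew dragging the mean above the median):

```python
# Mode, median, and mean on small hypothetical data sets.
from statistics import mode, median, mean

symmetric = [2, 3, 4, 4, 4, 5, 6]
# In a symmetric distribution, all three agree at the center (4).
print(mode(symmetric), median(symmetric), mean(symmetric))

# A positively skewed set: the long tail to the right drags the
# mean above the median, so the mean overestimates the center.
skewed = [1, 2, 2, 3, 3, 4, 50]
print(median(skewed))                 # 3
print(mean(skewed) > median(skewed))  # True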
measure of spread
 range
o simplest and least informative
o does not take into account magnitude of the scores between the
extremes
o very sensitive to outliers
 interquartile range
o order the scores
o divide into 4 equal parts
o less sensitive to extreme scores
 variance
o average squared deviation from the mean
 standard deviation
o most popular measure of spread
5 number summary
 the minimum, the 1st quartile, the median, the 3rd quartile, and the maximum
 interquartile range is Q3 –Q1
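The spread measures above can be sketched in a few lines; data are invented, and note that quartile conventions vary by textbook (this uses Python 3.8+'s `statistics.quantiles` default):

```python
# Range, variance, SD, IQR, and the five-number summary.
from statistics import pvariance, pstdev, median, quantiles

data = [4, 7, 8, 10, 11, 12, 15, 20]

value_range = max(data) - min(data)  # simplest; sensitive to outliers
variance = pvariance(data)           # average squared deviation
sd = pstdev(data)                    # square root of the variance

# Five-number summary: min, Q1, median, Q3, max.
q1, q2, q3 = quantiles(data, n=4)    # one common quartile convention
summary = (min(data), q1, median(data), q3, max(data))
iqr = q3 - q1                        # less sensitive to extreme scores

print(value_range, summary, iqr)
```

Swapping the 20 for a 200 would balloon the range but barely move the IQR, which is the point of using quartile-based spread.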
association, regression
 Pearson product-moment correlation coefficient, or Pearson r
o requires measures scaled on interval or ratio scales
o Pearson r can range from +1 through 0 to −1
o a positive correlation represents a direct relationship
o a negative correlation indicates an inverse relationship
o a correlation of 0 says no relationship exists
o the magnitude of the correlation coefficient tells you the degree of linear relationship
o both +1 and −1 represent perfect linear relationships
o a parabola shape is called a curvilinear relationship
 point-biserial correlation
o applies when one variable is continuous and the other dichotomous
o the dichotomous variable is dummy coded
o the magnitude partly depends on the proportion of participants falling into each of the dichotomous categories
o if the numbers of participants in the two categories are not equal, the maximum attainable value of the point-biserial correlation is less than +1 or −1
 Spearman rank-order correlation
o used either when your data are scaled on an ordinal scale or when you want to determine whether the relationship between variables is monotonic
 phi coefficient
o used when both variables being correlated are measured on a dichotomous scale
 Bivariate regression
o Find the straight line that best fits the data plotted on a scatter plot
o The best fitting straight line is the one that minimizes the sum of the
squared distances between each data point and the line as measured
along the y axis
 The coefficient of determination
o The square of the correlation coefficient
According to the text, you can increase the reliability of your questionnaire by
B. standardizing administration procedures.
C. writing clear, appropriate questions.
In ________ sampling, you identify naturally occurring groups (for
example, classes in a school) and sample some of those groups.
cluster
If you decide to assess reliability with multiple tests and use
alternate forms of your questionnaire, you would then use ________
to assess reliability.
 parallel forms
Labeling each point on a scale versus labeling only the end points
Usually does not significantly affect the responses participants give to a question
According to the text, which of the following would be a way of assessing the reliability of a questionnaire?
Administer the same questionnaire
(or a parallel form) to the same
participants more than once.
B. Administer the
questionnaire once and
assess internal consistency.
A drawback to an open-ended item is that
the responses obtained may be difficult to code and analyze.
A sample consisting of participants who are not representative of
the population is a(n) ________ sample.
 Biased
The advantage of restricted items over openended items is that
 provide more control
Dr. Loo administers a long and boring questionnaire concerning
attitudes that tend to fluctuate over time. When assessing the
reliability of his questionnaire, Dr. Loo should
 avoid using test-retest
To write good survey items, you should
A. use simple words rather than complex words.
B. make the stem of a question short and easy to understand but use complete sentences.
C. avoid vague questions in favor of more precise ones.
You would compare two studies in a meta-analysis if you wanted to find out
whether the studies produced significantly different results.
Cohen’s Kappa is used to
Evaluate interrater reliability
In meta-analysis, the file drawer phenomenon
A. inflates the probability of making a Type I error.
The text proposes that ________ can be used as an index of
interrater reliability.
The Pearson product-moment correlation
A(n) ________ is one who is unaware of the hypotheses being
tested in an observational study.
A. blind observer
If you can identify one behavior as more important than another
in an observational study, you can then use ________ sampling
 event
When comparing studies, looking at ________ is the preferred technique.
effect sizes
In an observational study of patients in a psychiatric ward, you
alternate 5-minute periods of observation with 5-minute periods of recording behavior. This is an example of ________ sampling.
A. time
The unit of analysis in a meta-analysis should be
how variable x affects variable y.
The major advantage of using grouped data is
convenience: a single score (e.g., the mean) can be calculated to represent a group.
1 − r² gives you the coefficient of nondetermination.
A ________ presents a frequency distribution graphically as a
series of bars representing the classes whose heights indicate
the number of cases falling into each class.
A. histogram
Examining individual scores makes the most sense when
A. you have repeated measures of the same behavior.
Why is it a good idea to explore your data using EDA techniques before you conduct any statistical tests?
A. EDA can reveal defects in your data that may warrant taking
corrective action before you proceed to the inferential analysis.
B. EDA can help you determine which summary statistics are
appropriate for your data.
C. EDA may reveal unsuspected influences in your data.
An advantage of the stemplot over the histogram is that only the
stemplot allows you to
 preserve the actual score
Dummycoding variables involves
A. assigning numerical values (for example, 0 and 1) to categorical
variables.
The ________ provides an estimate of the amount of error in
prediction.
 standard error of estimate
In a stemplot of scores ranging from 11 to 83, a score of 42 would
be located at a stem value of ________.
A. 4
If your treatment did not have an effect on your dependent variable, you can assume that the means representing each group in your experiment
A. are independent estimates of a single population mean.
Data transformations are used to
A. adjust data to meet assumptions
of statistical tests
Serious violations of one or more of the assumptions underlying
parametric statistics
may lead you to commit a Type I error more or less often than the
stated alpha level
If your dependent variable were a dichotomous yes-no response,
you could compare the proportion of subjects saying yes in the
experimental group to the proportion saying yes in the control
group by using the
A. z test for the difference between two proportions.
A one-tailed test is used
if you are interested in whether the obtained value of the statistic
falls in one particular tail of the sampling distribution for that
statistic.
When an interaction is present,
C. main effects are not interpreted, because your independent variables do not have simple effects on your dependent variable.
When the effect of one independent variable on your dependent variable changes over levels of a second, a(n) ________ is present.
Interaction
If you want the average of 10 scores to equal 100, you can choose
any numbers you want for 9 of the scores, but the 10th score will
have to be whatever number will make the average of the scores
equal 100. Thus the _______________________ equal(s) 9.
Degrees of freedom
Because of ________, you should not conduct too many post hoc
comparisons, even if you predicted particular differences between
means.
 probability pyramiding
At a given significance level, a one-tailed test
is more likely to detect real differences between means than is a two-tailed test.
Chapter 5
4 scales of measurement typically discussed in psychological statistics
nominal – the lowest scale; numbers are assigned to categories, and which number goes to which category is completely arbitrary (e.g., gender). Gives identity; you can count it.
ordinal – magnitude as well as identity. Can tell us if one case has more or less than another, but the distances between ranks are not equal and we don’t know them. No math.
interval – equal distances, but no true zero point; the number 0 is arbitrary (IQ tests, temperature). Can calculate means and SDs; used in nomothetic research.
ratio – true 0, the highest level; all math operations; 0 means absence; very strong variables.
approximately interval – treated as if the intervals were equal (e.g., Likert-type ratings)
the abstract number system has these properties:
identity: each number has a particular meaning
magnitude: numbers have an inherent order from smaller to larger
equal intervals: the differences between numbers are on the same scale
absolute/true zero: the zero point represents the absence of the property being measured
interval scales have properties of
 identity
 magnitude
 equal distance
equal distance allows us to know how many units
 do not have a true zero point
 the number 0 is arbitrary
ratio scales have the properties of the abstract number system
 identity
 magnitude
 equal distance
 absolute/true zero – allows us to know how many times greater one case is than another
scales with absolute zero and equal interval are considered ratio scales
ordinal can be split into
ranked preferences: they do not tell us how much, just more or less; the things we like more are ranked higher.
assigned ranks: used to select a smaller subset or to show an individual’s relative placement in a larger group. They’re considered ordinal because they have the properties of identity and magnitude.
Interval scales can have a zero point, but it doesn’t mean the absence of whatever’s being tested.
 cannot make statements that are multiplication or division
ratio scales
 all the properties of the abstract number system
o identity
o magnitude
o equal distance
o absolute/true zero
 all mathematical operations
 0 represents the absence of the behavior
likerttype ratings
 used in surveys where we are asked to rate how much we agree/disagree
 sometimes ordinal scales, but sometimes interval or approximately equal-interval
 properties of identity and order
 identity lets us know whether we agree or disagree
 order: each number represents a rating that is more or less than the others
 psychologists disagree about whether the scale’s intervals are equal or not
create measures by adding up the individual likerttype ratings or calculating an
average
The sum or total of item responses will give a broader range of scores than the individual Likert-type ratings.
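As a sketch of that scoring step (invented ratings), summing four 1-5 items gives a 4-20 scale score, while averaging keeps the score on the original 1-5 metric:

```python
# Combining hypothetical Likert-type items into scale scores.
from statistics import mean

# Each row: one respondent's ratings on four 1-5 agreement items.
ratings = [
    [5, 4, 5, 4],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
]

totals = [sum(row) for row in ratings]     # range 4..20, not 1..5
averages = [mean(row) for row in ratings]  # stays on the 1..5 scale

print(totals)    # [18, 7, 13]
print(averages)  # [4.5, 1.75, 3.25]
```

Either composite is common; the sum's broader range is what the note above refers to.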
It is controversial what scale this is. Ordinal view: we really can’t assume the same interval or gap between points. Interval view: a respondent looking at a scale is making equal-interval judgments. There is no true zero. The debate is mainly between ordinal and interval.
Scale matters because it determines the mathematical operations that are permitted for those variables. Mathematical operations determine which statistics can be applied to the data.
How a variable is measured determines its precision.
Reliability = dependability
Validity: the study does what it should do; we measure what we said we would, with the right test and the right score. Invalid = not truthful or correct.
Good assessment tools can support rejection of the null hypothesis or acceptance of the research hypothesis.
3 types of reliability
 interrater
o two observers watching the same behavior their scores should agree
with each other
 internal consistency
o people should respond in a consistent way to all of the questions
 test retest
o if you give people a test more than once they should get about the
same score each time
the Spearman-Brown split-half coefficient
 split the items into two random halves
 if the summed scale were reliable, you would expect the two halves to have an r close to 1.0
Cronbach’s alpha
 rather than one split, it takes into account all possible split halves
 if α is close to 1.0, your test is reliable
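A minimal Cronbach's alpha computation, on invented item data, using the standard formula alpha = k/(k−1) × (1 − Σ item variances / variance of total scores):

```python
# Cronbach's alpha on hypothetical, fairly consistent item data.
from statistics import variance

# Rows = respondents, columns = items.
data = [
    [4, 5, 4],
    [2, 2, 1],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 2],
]

k = len(data[0])                       # number of items
items = list(zip(*data))               # one tuple per item (column)
item_vars = [variance(col) for col in items]
total_var = variance([sum(row) for row in data])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # close to 1.0 here: the items agree
```

Because the three items rise and fall together across respondents, the total-score variance is large relative to the item variances, which is exactly what pushes alpha toward 1.0.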
How do we establish validity?
 internal
 external
 construct
internal
 inside the experiment. Well designed. Free of confounds
external
 outside the experiment; can the findings be generalized?
construct
 think concept. Manipulating and measuring concepts
 truly represents what you had in mind
 correlating instruments with similar measures
 predicting a specific behavior or criterion
 established by being successfully used in a wide range of studies
operational definition
 how observations are made
 what is observed
 how behavior is recorded
The more complex the behavior that is recorded, the more difficult it is to achieve good interrater reliability.
Face-to-face interview
Best when you need to establish rapport with your participant.
Disadvantage: the social situation created might bias the participant’s responses. Also expensive to administer.
Telephone interviews
Offer some social distance, so it might be easier to answer a sensitive question. The interviewer can answer participants’ questions, and the cost is less than face to face.
Disadvantage: it is easier to deny a request.
Survey method
Privacy and low cost.
Participants can choose when it’s convenient to sit down and answer the survey.
A low-cost alternative.
Disadvantage: participants cannot ask questions, and response rates are low.
Three types of survey questions
 open ended
 close ended
 partially closed
open ended
 important for complete answers
 however they require more effort from the participant
close ended
 multiple choice, ranks, or Likert-type ratings
 easy for participants; requires the least effort
 not appropriate when the expected responses are too complex
partially closed
 use when you have a good idea of the range of expected responses but want to give participants the opportunity to give an answer that is rare or that you did not consider
surveys
 make sure you address only one issue
 avoid bias
A high Cronbach’s alpha means that respondents who score high on one item would score high on the others.
A high correlation means that the same or similar responses were given both times and the instrument or question is relatively stable.
The measurement process
Major task
 represent variables numerically
 begin with a conceptual definition (theory) and create operational definitions
major considerations
 types of variables
 types of scales
 reliability and validity
conceptual variable – a theoretical construct, an abstract idea; not tangible (e.g., self-esteem). A conceptual definition is not, in terms of measurement, an operational definition.
operational definition – the procedure by which the researcher measures the construct and/or manipulates the variable.
 Bridgman: define variables in terms of the operations needed to produce them. What is it that’s going to be needed to indicate the construct?
In research every variable needs to be defined.
 requires us to think in behavioral terms
 what do we have to do to know we have observed something of interest?
 the more careful and complete the operational definition, the more precise the measurement of the variable will be
True score: the true part of the observed score; perfect measurement.
Error score: the difference between the observed score and the true score.
 think of error as measurement “fluctuation”
method error
 due to characteristics of the test or the testing situation; not random; reflects systematic error (e.g., a scale that is always off by 5)
trait error
 due to individual characteristics; random aspects of people; the unique characteristics or experiences of the subject
Take-home message: increase reliability by decreasing error
 increase sample size
 eliminate unclear questions
 use both easy and difficult questions
 minimize the effects of external events
 standardize instructions
o clear definitions about behaviors and events to be rated
 maintain consistent scoring procedures
o provide feedback about discrepancies
 decrease response-set biases
o social desirability
a reliable score is one that is relatively free from measurement fluctuation
Reliability is measured using a correlation coefficient (referred to as r) or percentages.
reliability coefficients
 indicate relative consistency
 r can range from −1.0 to +1.0
 Cronbach’s coefficient alpha and KR-20 (internal consistency measures) range from 0 to 1.0
o statistically possible to be less than 0 but then highly problematic
 percentages range from 0 – 100%
types of reliability
 test-retest – a measure of stability; administer the same test at two different times and correlate the scores from test 1 and test 2
 parallel forms – a measure of equivalence; give two separate forms of the test to the same people and correlate test 1 with test 2 (should be rather high)
 interrater reliability – a measure of agreement; have raters rate behavior and then determine the amount of agreement between them (percentage of agreements)
 internal consistency – a measure of how consistently each item measures the same underlying construct; correlate performance on each item with overall performance across participants (Cronbach’s coefficient alpha)
Cronbach’s coefficient alpha uses the item variances and the sum across items. It looks like a correlation coefficient: negative values are problematic, it should fall between 0 and 1, and it is usually in the upper .7 range or above.
Adding items that respondents answer consistently pushes the score up; adding items that draw an inconsistent set of responses pulls reliability down.
Rule of thumb: minimum for research = .70
 use a higher standard when the test is used in applied settings for clinical decisions
 why? do the math in the reliability formula; the cutoff is somewhat arbitrary
why problematic?
 it is up to the user to determine what amount of error variance you are willing to tolerate
 although you can make a scale more reliable by adding redundant items, the real value added might be minimal; add good and different items, because you don’t want to overload participants and still want a good score
Give the test once and compare even vs. odd items = split half.
Split half – splitting an existing measure in half and comparing the halves.
Parallel forms – two stand-alone versions.
Cronbach’s alpha – internal consistency.
Scores from a valid test should correlate with other similar variables and should predict the criteria they are supposed to predict.
Validity refers to accumulating evidence to provide a scientific basis for score meanings/interpretations: the accuracy of the interpretation of scores.
Validity refers to the tests results or scores not the test itself
Validity ranges from low to high
Validity must be interpreted within the testing context (people, place, time).
Validity 4 kinds
 face
 content
 criterion
o concurrent
o predictive
 construct
content
 a measure of how well the items represent the entire universe of items
 ask an expert whether the items assess what you want them to
criterion
 concurrent
o a measure of how well a test estimates a criterion
o select a criterion and correlate scores on the test with scores on the
criterion in the present
 predictive
o a measure of how well a test predicts a criterion
o select a criterion and correlate scores on the criterion in the future
construct
 a measure of how well a test assesses some underlying construct
 assess the underlying construct on which the test is based and correlate these scores with the test scores
 correlate the new test with an established test
 show that people with and without certain traits score differently
 determine whether the tasks required on the test are consistent with the theory guiding test development
multitrait-multimethod matrix
Displays associations between indicators (measures and methods) that are trying to get at one trait and one construct.
Everything above the diagonal correlates with everything below.
Look for convergent and discriminant patterns.
Convergent validity – different methods, same construct, yield similar results.
Discriminant validity – different methods, different constructs, yield different results; less convergence. Different methods measuring different constructs shouldn’t be highly related.
A valid test must be reliable, but a reliable test need not be valid.
Selecting measures
Psychometric characteristics
 type of reliability
 type of validity
subject measure considerations
 developed/ validated on comparable samples?
 Reading level/ motor performances/ age related
 Social desirability / response set issues
Novel measure
 with unknown / not established psychometric properties
 gather data on measure with a pilot sample first
 problematic psychometrics undermine study conclusions and inferences
chapter 6
We use inferential statistics to make an inference from the sample back to the population. The validity of that inference depends on how representative the sample, or subset, is of the population from which we’ve drawn it.
Sampling procedures
1. probability sampling – has a random chance component
2. nonprobability sampling – does not
Random components give us confidence that our sample is a reasonably good representation of the population.
Probability sampling
 random or chance component
 every person has an equal and independent chance of being selected
non-independent sampling
 referred by a friend
 similar values
different kinds of probability sampling
 simple – straightforward; works when the population is homogeneous for the characteristic
 systematic – not fully random; take every nth person from the frame
 stratified – for subgroups that differ substantially; treats them as if they were two or more separate populations and then randomly samples within each
 proportionate – for subgroups that differ in size; stratifies the population into relevant subgroups, then randomly samples within each subgroup in proportion to its share of the population
 cluster – for when it is impossible or impractical to identify every person in the population
 multistage – the most sophisticated; used in large studies needing a representative national sample (zip code…street…address)
Non probability sampling strategies
 haphazard – introduces bias into the study; should be avoided (the “man on the street” technique)
 convenience – selects a particular group but doesn’t sample all of a population
 purposive – targets a particular group of people with characteristics that are hard to find generally
Nonprobability sampling occurs when it is practically impossible to use probability sampling: time and expense constraints, or the frequency of the behavior or characteristic of interest is so low in the population that a more targeted strategy is needed.
All probability sampling requires a sampling frame: a population defined and available through records.
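The "every nth person from the frame" idea above can be sketched in a few lines, with a hypothetical frame of 400 people on record:

```python
# Systematic sampling from a sampling frame: every nth person
# after a random start (frame and numbers are invented).
import random

random.seed(3)
frame = list(range(1, 401))    # a frame of 400 people on record

n = 20                         # sampling interval
start = random.randrange(n)    # random start within the first n
sample = frame[start::n]       # every nth element after the start

print(len(sample))  # 20
```

The random start is what keeps systematic sampling from being fully deterministic; without it, element 1 would be chosen every time.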
Lecture
Population: all possible individuals making up a group of interest in a study
Sample: a relatively small number of individuals drawn from a population for inclusion in a study. An unrepresentative sample produces sampling error.
Subpopulation small segment of the defined population
Generalization the ability to apply findings from a sample to a larger population
Nonrandom sampling not randomly chosen
Types:
 convenience
 snowballing
 quota
convenience – readily available people or those who will volunteer to participate
snowballing – you select initial participants who meet some criterion and then acquire other participants through referrals from your initial participants
quota – a convenience sample comprised of subgroups similar in proportion to the population; specific people are targeted because they match a criterion
pros and cons
pros: sampling from a small subject pool is easier and requires less time, money, and effort
cons: nonrandom samples have less external validity, and thus the generality of results is compromised
internet research
nonrandom because participants are not representative of the population; requiring knowledge of how to use a computer and the internet limits the sample
advantage: broader range of participants
disadvantage: generalization , external validity
lab research
 solicit participants via flyers, etc.
 subject pool
issues with subject pools
participants may feel punished if they don’t participate, which can skew the data
field research
 take the lab to the participants (knock on doors, visit churches, etc.)
 set up a situation and wait for participants
voluntary participation (applies to ALL of the above recruitment methods)
 volunteer bias – bias in a sample that results from using volunteer participants exclusively
 volunteer bias can affect internal validity
o it could distort the relationship between the IV and DV
 results from volunteers alone cannot be generalized to the population, which hurts external validity
fix this by
 avoiding giving people reasons not to volunteer
 emphasizing the contribution to science
 making participation sound appealing and non-threatening
 avoiding tasks that are stressful
 stating that this research is important and that they’d be helping science
An example(s) of poor research in Psychology, presented in the video on Research
Methods was/were __________.
A. None of the choices are correct.
B. Dr. Bettelheim’s study of child autism
C. intensive behavioral treatments of autism
D. both B and C.
Answer Key: B
Question 2 of 50
Which of the following is NOT a purpose of conducting a literature search?
A. Prevent you from carrying out a study that has already been done
B. Identify questions that need to be answered
C. Provide ideas and justification for designing a study
D. none of the above
Answer Key: D
Question 3 of 50
________ theories are considered the best type of theory because they propose a new structure to explain a phenomenon
A. Analogical
B. Descriptive
C. Empirical
D. Fundamental
Answer Key: D
Question 4 of 50
The information processing theory of memory (3 types of memory: sensory, short-term, and long-term) is a descriptive theory because it
A. describes the concept and explains how it affects memory ability.
B. uses an analogy to describe a concept.
C. only describes features of the concept without explaining it.
D. creates a new concept to explain the relationship between two variables.
Answer Key: C
Question 5 of 50
According to Michael Shermer's presentation of Why people believe weird things, the
problem with having a theory is that _________________________________.
A. they tend to contain individual's own personal biases
B. they can never be falsified
C. they have to be tested
D. they are not easy to develop
Answer Key: A