Two scientific study activities:
Exploratory data collection and analysis:
- aimed at classifying behaviors, identifying potentially important variables,
and identifying relationships between those variables and the behaviors
Evaluating potential explanations for the observed relationships
Causal relationship – one variable directly or indirectly influences another.
- unidirectional: A influences B but not vice versa
- bidirectional: each variable influences the other
Correlational relationship – changes in one variable accompany changes in another,
but no proper testing has been done to show that they actually influence each other.
- when changes in one variable tend to accompany a specific change in
another, the variables are said to covary
Correlational research – determines whether two variables covary and, if so,
establishes the direction, magnitude, and form of the observed relationship
- observing the values of two or more variables and determining what
relationships exist between them
- makes no attempt to manipulate variables; observes them "as is"
- makes it possible to predict the probable value of one variable from the
value of the other
- the variable used to predict is the predictor variable
- the variable whose value is being predicted is called the criterion variable
Two problems with this method
- third variable problem
o you want to show that variation in one of the observed variables could
only be due to the influence of the other observed variable
o however, there could be a third variable; it is usually unobserved
and may influence both variables, causing them to vary together
even though no direct relationship exists between them
o you must examine the effects of each potential third variable to
determine whether it accounts for the observed relationship
- directionality problem
o the direction of causality is sometimes difficult to determine
Reasons for choosing correlational research
- manipulating the variables may be impossible or unethical
- can provide a rich source of hypotheses that can later be tested
- you want to see how naturally occurring variables relate in the real world
Experimental research
- incorporates a high degree of control over the variables in the study
- establishes causal relationships among the variables
- involves manipulation of one or more independent variables and control over
extraneous variables
- the levels of the independent variable are chosen by the experimenter
- the specific conditions associated with each level are called treatments
- by manipulating, you hope to show that changes in the levels of the
independent variable cause changes in the behavior recorded
Group receiving treatment – experimental group
Other group – control group
Extraneous variables – those that may affect the behavior that you wish to
investigate but are not of interest for the present experiment
- can make it difficult or impossible to detect any effects of the independent variable
- can produce chance differences in behavior across the levels of the independent variable
Ways to deal with them:
- hold the extraneous variables constant: make sure all treatments are exactly alike
except for the independent variable
- randomize their effects across treatments so that they even out and cannot be
mistaken for effects of the independent variable
- random assignment allows you to use inferential statistics to evaluate the probability
with which chance alone could have produced the observed differences
Strengths and limitations
- strength: the experimental approach can tell you whether changes in one variable
actually caused changes in the other – the ability to identify and describe causal relationships
- limitation: you cannot use the experimental method if you cannot
manipulate your hypothesized causal variables
- limitation: the tight control over extraneous factors required to clearly reveal the
effects of the independent variable may reduce generality
Experiments vs. demonstrations
A demonstration:
- lacks an independent variable
- exposes subjects to just one treatment condition
- simply exposes a single group to a particular treatment and measures the behavior
- is useful for showing that this happens and not that
Demonstrations are not experiments and do not show causal relationships
Internal and external validity
- internal validity – the ability of your research design to adequately test the
hypothesis it was designed to test
- in an experiment, this means that the independent variable caused the observed
variation in the dependent variable
- in a correlational study, it means that changes in the value of your criterion relate
solely to changes in the value of your predictor variable
- internal validity is threatened to the extent that extraneous variables can
provide alternative explanations for the findings of the study
Confounding variables – when two or more variables combine in such a way that
their effects cannot be separated.
- confounding is less problematic when the confounding variable is known
to have little or no effect on the dependent or criterion variable, or when
its known effect can be taken into account in the analysis
- the best thing to do is to substitute what you believe to be less serious threats
to internal validity for the more serious ones
Threats to internal validity
- history (an event may occur between two different observations)
- maturation (effects of age or fatigue)
- testing (when a pretest sensitizes participants to what you are investigating)
- instrumentation (unobserved changes in the criteria used by observers or instruments)
- statistical regression (scores that were outliers tend to be closer to the
population average on retesting)
- biased selection of subjects (groups differ initially and that difference produces the
change; usually happens when preexisting groups are used rather than assigning
subjects to groups at random)
- experimental mortality (loss of participants)
External validity
- the degree to which results can be extended beyond the limited research setting
and sample in which they were obtained
- low external validity may tell us little about how people react in the real world
- may be less relevant in basic research, where the objective is to gain insight into
underlying mechanisms rather than to generalize directly
- becomes more relevant when the findings are expected to be applied
directly to a real-world setting
Choice of setting
- affected by the research question
- the laboratory setting gains important control over extraneous variables that could
affect the dependent variable
- but may lose generality
Simulations
o might be used because manipulating the real situation is unethical,
expensive, or time consuming
o retain control while providing relatively realistic conditions
- designing a simulation
o observe and study the real situation carefully
o identify its crucial elements
o more realistic = greater chance that results will be applicable to the real world
o mundane realism – the simulation mirrors a real-world event
o experimental realism – the simulation psychologically involves the participant
Field experiments
- conducted in the participants' natural environment
- manipulate an independent variable and measure a dependent variable
- have all the qualities of a lab experiment
Advantages and disadvantages
- advantage: results can be generalized to the real world
- disadvantage: little control over potential confounding variables (low internal
validity); extraneous variables can obscure or distort the effects of the independent variable
in field experiments
We know that probability sampling strategies are most likely to give us a representative sample.
An inclusion criterion might be declaring a major in a heavily math-oriented field (e.g.,
mathematics, computer science, physics). An exclusion criterion might be test anxiety (e.g.,
test scores might be lower not because of the stereotype threat but because of test anxiety
that decreases performance in all testing situations).
Random assignment controls extraneous factors because it randomly distributes personal
characteristics that can influence outcome across conditions.
There are two ways to randomly assign participants to experimental conditions:
1. Free random assignment
2. Matched random assignment.
With free random assignment, the experimenter uses a random number table or a random-
number generator on a calculator or computer to assign participants to groups. There is no
attempt to measure and use personal characteristics as part of the random assignment
process. With matched random assignment, information about subject characteristics is collected and
used to identify similar participants. After the match is made, participants are then randomly
assigned to groups. This strategy ensures an equal distribution of critical personal
characteristics across experimental conditions.
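The two strategies can be sketched as follows (the function names, group-dealing details, and blocking scheme are illustrative assumptions, not from the source):

```python
import random

def free_random_assignment(participants, n_groups=2):
    """Shuffle the participant pool and deal it into groups round-robin,
    with no use of personal characteristics."""
    pool = participants[:]
    random.shuffle(pool)
    return [pool[i::n_groups] for i in range(n_groups)]

def matched_random_assignment(participants, score, n_groups=2):
    """Sort participants by a matching variable, then randomly assign
    within each block of similar participants so every group gets one
    member of each block."""
    ranked = sorted(participants, key=score)
    groups = [[] for _ in range(n_groups)]
    for i in range(0, len(ranked), n_groups):
        block = ranked[i:i + n_groups]
        random.shuffle(block)
        for group, person in zip(groups, block):
            group.append(person)
    return groups
```

With matched assignment, each pair (or block) of similar participants is split across conditions, so the matching variable is balanced by construction rather than only on average.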
Bias can be introduced into studies by both the experimenter and the participant. Direct
knowledge of the study hypothesis, the nature of the experimental manipulation, and group
assignment can lead to subtle differences in the ways that experimenters and participants
interact in the research setting. Restricting knowledge of the experiment through "blind"
procedures can help to eliminate this bias.
In a single-blind procedure, a laboratory assistant who does not know the study hypothesis
administers the experimental manipulation. The laboratory assistant also does not know the
experimental condition to which the participant was assigned. Having a naive intermediary
between the experimenters who designed the study and the research participants prevents
experimenter expectancies from influencing study results.
In a double-blind procedure, both the person administering the experimental manipulation and
the participant do not know the study hypothesis and group assignment. The prototypic
double-blind study is a randomized study of medication. The participants receive either the
active drug or a pill that looks, smells, and tastes exactly like the drug but without the active
ingredient. Neither the experimenter nor the participants know whether the active drug or
placebo is being administered until after the study is over.
In within-subjects designs, performance is compared within individual participants. Order and
sequence effects are major sources of error in these designs.
Order effects produce changes in performance based on the order of the condition in the
experiment and not the manipulation in the specific condition. Practice effects can be
considered order effects. In cognitive experiments, performance is usually lower on the first
task because participants are unfamiliar with the setting. Once the participants become
familiar with what is required, performance increases.
Fatigue effects are also order effects. Performance is worse in later conditions because
participants are tired. Sequence effects are produced by characteristics of the experimental
manipulation. For example, in a study of perception of weight, participants will judge a weight
lighter if it follows a heavy rather than a light weight. The same occurs in reverse; participants
will judge weights as heavier if they follow a light rather than a heavy weight. Sequence
effects are caused by an interaction between order and specific aspects of the manipulation.
We always try to build controls that minimize or eliminate confounds/threats to
validity into our studies.
The most general control in experimental research is adequate preparation of the study.
Extraneous variables can be controlled by carefully selecting who is in our study
through inclusion and exclusion criteria and by randomly assigning participants to
conditions.
Single-blind and double-blind procedures can help to control for experimenter bias.
Control groups in between-groups experimental designs should be as similar as
possible to experimental groups.
Counterbalancing is an effective control in within-subjects designs.
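One common counterbalancing scheme, a cyclic Latin square, can be sketched as follows (the helper name is illustrative). Each condition appears exactly once in every ordinal position across the generated orders, so order effects are spread evenly across conditions:

```python
def latin_square(conditions):
    """Return n orders for n conditions; condition k appears once in
    each position across the orders (simple cyclic construction)."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]

# four treatment conditions, four counterbalanced presentation orders
orders = latin_square(["A", "B", "C", "D"])
```

Participants are then distributed evenly across the generated orders.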
Hypothesis testing is one of the most important concepts in Statistics.
This is how we decide if:
Effects actually occurred.
Treatments have effects.
Groups differ from each other.
One variable predicts another.
Null hypothesis – nothing happened.
You test your sample statistic against the value based on the null hypothesis sampling
distribution. Your sample statistic and the null hypothesis sampling distribution values are close.
You conclude that they are not different; you did not find an effect in your study.
Alternative hypothesis – something happened.
You test your sample statistic against the value based on the null hypothesis sampling
distribution. Your sample statistic and the null hypothesis values are not close.
You conclude that they are different; you found an effect in your study; hooray!
1. You come up with your hypothesis (for example, college students sleep less than average).
2. You generate a sample (pick a set of college students).
3. You calculate your summary statistics (for example, the mean and SD of number
of hours that college students sleep per night).
4. You determine the statistical test that will compare your summary statistic against
the value determined by your Null Hypothesis. (You would use the single sample t-test
for college students’ sleep.)
5. You calculate the test statistic using your summary statistics. The formula for
the test statistics is different for each type of test but the basic concept is the same.
You calculate how far your sample is from the Null Hypothesis taking into account that
sample values of a statistic vary by chance when smaller samples are taken from a
larger population. The SE tells us how much they vary.
6. You derive the appropriate sampling distribution - or refer to one already listed in
the tables in your statistics book. Your computer program can also give you this
information.
7. You choose the cut-off value on your sampling distribution that tells you that
your sample statistic is very far from the Null Hypothesis and thus not likely. We call
this cut-off value our alpha level or significance level (more about alpha later).
8. You decide whether to reject the Null Hypothesis or fail to reject the null. You
do this by comparing your test statistic to the cut-off value.
9. You draw your conclusion. If you reject the Null Hypothesis, you say that your
result is statistically significant. This simply means that it did not happen by luck or
chance. If you fail to reject the null, you conclude that you did not find an effect or
difference in this study.
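Steps 1–9 above can be sketched for the sleep example with a one-sample t-test (the data and the 8-hour null value are hypothetical placeholders):

```python
import numpy as np
from scipy import stats

# hypothetical sample: nightly sleep hours for 12 college students
hours = np.array([6.1, 7.0, 5.5, 6.8, 7.2, 5.9, 6.4, 6.0, 7.1, 5.8, 6.6, 6.3])

# H0: mean sleep = 8 hours per night (assumed null value)
t_stat, p_two = stats.ttest_1samp(hours, popmean=8.0)

# the same statistic by hand: how far the sample mean is from the null,
# in units of the standard error (SE = SD / sqrt(n))
se = hours.std(ddof=1) / np.sqrt(len(hours))
t_manual = (hours.mean() - 8.0) / se
```

If `p_two` falls below the chosen alpha level, you reject the null hypothesis and call the result statistically significant.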
You can make an error or two when you test hypotheses. You might say things are different
when they are not. You may miss a relationship that really exists. These are called Type
I and Type II errors, respectively.
Power is the probability of correctly rejecting a false Null Hypothesis.
Many experts recommend that you use a power of .80. This means that you have an
80% chance of finding a difference when one really exists. You don't want to miss a
real difference or correlation. (Bad - missing a difference is called a Type II error with
probability equal to Beta).
Power is equal to 1 - Beta.
The test might say there is a difference when there is not one.
(Bad - an error called Type I error whose probability equals your alpha rate: .05 or .01).
Depending on conditions, you may have a good or bad chance of finding the desired result. To
increase power you can:
1. Try to increase the effect size or the strength of the relationship.
2. Decrease experimental error.
3. Use a higher alpha level (say .05 as compared to .01). Note this increases
power but also Type I error.
4. Increase sample size.
5. Use matched samples or covariance techniques.
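The effect of these choices can be explored with a small Monte Carlo simulation (a sketch; the simulation size, seed, and normal model are arbitrary assumptions):

```python
import numpy as np
from scipy.stats import ttest_1samp

def simulated_power(effect_size, n, alpha=0.05, n_sims=2000, seed=42):
    """Monte Carlo estimate of power: the fraction of simulated studies
    that reject H0 when the true effect is `effect_size` (in SD units)."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = rng.normal(loc=effect_size, scale=1.0, size=n)
        _, p = ttest_1samp(sample, popmean=0.0)
        if p < alpha:
            rejections += 1
    return rejections / n_sims
```

Running this with a larger `n` or a higher `alpha` shows power rising, mirroring items 3 and 4 in the list above.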
We always test a null hypothesis against an alternative/research hypothesis.
If a sample is close to null, we conclude that nothing happened in the study. If
a sample is far away or different from the null, we reject the null hypothesis
and conclude that something happened.
The logic of hypothesis testing is counterintuitive (or backwards). We test
whether nothing happened (our sample value is close to the null) in order to
conclude that something happened.
There are two types of error in hypothesis testing (Type I and Type II). Type I
errors occur when we conclude that there is a difference when there is not. Type II
errors occur when we conclude that there is no difference when there is one.
Statistical power is the probability of correctly detecting a true difference between
groups; we want to maximize the probability of detecting a true difference.
Name and identify the 4 major types of validity
Internal – is the independent variable responsible for the observed changes in
the dependent variable? (objective)
- lets us infer causation
- a function of procedures and study design
- confounds occur when two potentially effective variables are
allowed to covary simultaneously
o the confound could be as plausible a cause of the dependent variable,
since it shows up at the same time
- needs a high level of constraint
- concerns must be adequately controlled
Statistical – are the statistical tests accurate? (objective)
External – do the results apply to the broader population? (subjective)
Construct – is our theory the best explanation for the results? (subjective) The
most subjective; rests on an accumulation of evidence.
Name and define major confounding variables
- regression to the mean
- sequence effects
- diffusion of treatment
Explain why some forms of study validity are considered more objective and
others more subjective
Judgments can be reduced to the rules that exist. Rules are clearer and more agreed upon
for internal validity than for external validity.
Describe specific threats to validity in several additional sample studies and
explain how the study can be redesigned to reduce threats
Maturation – passage of time, development.
History – a competing event: another explanation that covaries with the treatment.
Testing/practice effects – tested once and then tested again; the
measurement process itself changes participants, who might remember the questions
when taking the measures at two points in time.
Regression to the mean – a leveling effect. Extreme scores rarely end up
extreme again; mid-range scores show less change. It is rare to see high scores
stay high and low scores stay low.
Selection bias – the choice of participants is limited or nonrandom.
Attrition (front end)/differential mortality (back end) – you lose some
people: they move, no longer want to participate, or die. You may
lose certain kinds of people.
Within a small environment:
Diffusion of treatment – students in two different groups talk, and the control group
gets the benefit of the treatment by virtue of hanging out. You may
conclude that there was no effect because there is no difference, when in fact
everyone was getting the treatment through shared information.
Compensatory rivalry – a control group member, receiving no treatment, wants to
show that he doesn't need the treatment and gets motivated to do better than the
treatment group.
Resentful demoralization – the opposite of the compensatory effect. Members of the
control group, mad and bummed about getting the less desirable
treatment, are motivated to do worse.
Differential attrition/mortality – the group being measured is different because certain
kinds of participants were lost.
Participants may be aware they are being studied:
- demand characteristics
- placebo effects
Experimenters are also active sources of bias.
Subjects may try to behave the way they think they are supposed to:
- Hawthorne effect
- outguessing the experimenter
Hawthorne effect – knowing you are in an experiment and reacting to that awareness
(in the original studies: feedback, pay, and production).
Demand characteristics – responding to subtle cues about what is expected;
these also occurred in the Hawthorne studies.
Placebo effects – expectations that the treatment will work.
Experimenter bias – subtle biases in observations, recording, and measurement;
unlike fraud, this behavior is outside our awareness.
Construct validity questions:
- why did it work?
- did we manipulate what we intended to?
- does the dependent measure get at what we say it does?
- is our theory the best explanation for the results?
Inadequate preoperational explication of constructs
- did you think through the theory and definitions of your constructs
before you measured?
Mono-operation bias
- self-report measures only, interviews only, etc.: the construct is operationalized
in a single fashion and may not be adequately expressed
- use process measures if trying to link treatment to effect
- did you manipulate what you said you would?
- did the dependent variables intended to measure the same construct
actually do so?
- include alternative measures that should not be affected by the manipulation
- is there too much overlap on irrelevant factors with your
dependent measures?
Did my treatment cause the outcome?
Observational designs do not involve manipulating the independent variable.
Behavioral categories – define what is being recorded.
Make sure they are well defined and not ambiguous; cultural traditions may not be
shared by all observers.
- become familiar with the behaviors and make a list
- do preliminary observations
- do a literature search
Frequency method – count occurrences of a behavior within a time period.
Duration method – record how long a behavior lasts.
Interval method – divide the observation period into time intervals and record whether a
behavior occurs within each interval (intervals should be short enough for only one behavior to occur).
Complexity – how to make your observations:
- scan the group for a specific time
- alternate between periods of observation and recording
- follow a single subject for observation over a given time period,
then repeat for other individuals
- observe only one behavior
- use recording devices
- have multiple observers watch the video independently
- a camera can be hidden better than you can hide yourself
Use audio recorders instead of taking notes:
- your eyes stay focused on the subject
- disadvantage: may disturb subjects
Reliability of observations
- disagreement may happen if you have not clearly defined the behavioral categories
- interrater reliability provides an empirical index of observer agreement
o establishing this helps ensure observations are accurate and reproducible
- the simplest way to assess interrater reliability is to evaluate percent agreement
o it should be as high as possible
o 70% is acceptable
o if agreement is defined as an exact match, then percent agreement underestimates reliability
o it only gives a raw estimate of agreement
o there may be extremely high levels of chance agreement, which percent
agreement does not take into account
Cohen's kappa
- assesses the amount of agreement relative to what would be expected by chance
- need to determine
o the proportion of actual agreement
o the agreement expected by chance
- step 1: tabulate the observations in a confusion matrix
- step 2: compute the proportion of actual agreement
- step 3: find the proportion of expected agreement by multiplying corresponding
row and column totals
- any value of .70 or greater indicates acceptable reliability
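The three steps above can be sketched directly (the function name and example data are illustrative):

```python
import numpy as np

def cohens_kappa(rater1, rater2, categories):
    """Chance-corrected agreement between two raters.
    Step 1: tabulate a confusion matrix; step 2: observed agreement is
    the diagonal proportion; step 3: expected agreement comes from the
    products of row and column totals."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    matrix = np.zeros((k, k))
    for a, b in zip(rater1, rater2):
        matrix[idx[a], idx[b]] += 1
    n = matrix.sum()
    p_observed = np.trace(matrix) / n
    p_expected = (matrix.sum(axis=1) * matrix.sum(axis=0)).sum() / n**2
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa is 1 for perfect agreement and 0 when observed agreement equals what chance alone would produce, which is why it is preferred over raw percent agreement.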
Pearson's product-moment correlation
- if observers agree, Pearson r will be strong and positive
- but ratings can be highly correlated even when observers disagree, as long as
the magnitudes of the recorded scores increase and decrease similarly
Intraclass correlation coefficient
- assesses reliability of observations scaled on an interval or ratio scale of measurement
Observer bias
- when observers know the goal of a study or the hypothesis, their
observations are influenced by this information
- use a blind observer
- bias also arises when observers interpret what they see rather than simply recording it
Quantitative and qualitative data
- quantitative data are expressed numerically
- qualitative data are written records of observed behavior
o you cannot apply standard descriptive and inferential statistics to your
qualitative data
Naturalistic observation
- do not alter the natural behaviors of subjects
- be hidden, or habituate the subjects to your presence (and to video equipment as well)
- captures behavior in the real world
- high external validity
- cannot be used to investigate the underlying causes of behavior
Ethnography
- requires you to be there, engaged the entire time
- the researcher becomes immersed in the behavioral or social system being studied
- used primarily to study and describe the functioning of cultures through a study of
social interactions and expressions between people and groups
- done in a field setting
- participant observation – the observer is part of the group
- nonparticipant observation – the observer serves as a nonmember of the group
- minimize people altering their behavior by training participant observers not to
interfere, or use observers who are blind to the hypothesis
- remove the problem of reactivity by observing covertly
- covert participant – part of the group but does not disclose observer status
- gaining access might be hard: you need to get past the gatekeepers, who are
the protectors of the group
- another entry into the group is to use guides and informants who convince
the gatekeepers that your aims are legitimate and your study is worthwhile
- the first step in analyzing is to do an initial reading of field notes
- the second step is to code any systematic patterns
- ethnographic research is purely descriptive in nature; we cannot explain why
Sociometry
- identifying and measuring interpersonal relationships within a group
- sociometry can be used as the sole research tool to map interpersonal relationships
- a sociogram graphs the choices of friends
Case studies
- observe a single case
- not an experimental design: no manipulation of an IV
- cannot determine causes
Archival research
- a nonexperimental strategy that involves studying existing records
- all factors pertaining to observational research apply to archival research
- you must gain access to archived material
- a practical matter is the completeness of the records
- purely descriptive: it may identify interesting trends or correlations but
cannot establish causal relationships
Content analysis
- analyze a written or spoken record for the occurrence of categories or events
(pauses, negative comments, behaviors, etc.)
- usually uses archival sources for analysis; an example is court proceedings
- all factors that apply to observational research apply to content analysis
- an observational technique with
o a clear set of rules
o information assigned into categories
o articles included both for and against, regardless of personal favor
- categories should have generality
o fit within a theoretical, empirical, or applied context
- when performing content analysis you need clear operational definitions of terms
o materials need to be analyzed before you develop categories
o the recording unit is the element of the materials that you are going
to record
o the context unit is the context within which the word was used
- who will do the analysis?
o may be affected by bias
o use a blind observer
o use more than one observer to evaluate interrater reliability
o content analysis of a biased sample may produce biased results
- content analysis is purely descriptive
- durability can be a problem: findings can be invalidated over time
- conclusions may not accurately reflect the strength of the relationships
examined in your review
Meta-analysis – a set of statistical procedures that allow you to combine or
compare the results of multiple studies
- a form of archival research
- a meta-analysis of meta-analyses is called a second-order meta-analysis
- 3 steps:
o identify relevant variables
o locate relevant research to review
o conduct the proper meta-analysis
- step 1:
o identify variables
o focus on only those related to your topic
o which variables to record is driven by the research question
o the information needed depends on the meta-analysis technique you use
- step 2:
o locate research to use
o the file drawer phenomenon inflates the Type I error rate
o to deal with it, attempt to uncover those studies that never reached
publication, or estimate the extent of the impact of the file drawer
phenomenon on your analysis
o the latter is done by determining the number of studies that
must be in the file drawer before serious biasing occurs
- step 3:
o apply the technique
o one technique compares studies
o doing a meta-analysis comparing studies is analogous to conducting
an experiment using human or animal subjects
o a second technique combines studies to determine the average effect of a
variable across studies
o comparing effect sizes is more desirable than looking at p values
a p value only tells you the likelihood of making a Type I error
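Combining studies into an average effect (the second technique above) can be sketched as a sample-size-weighted mean of effect sizes. This is only an illustrative simplification: real meta-analyses typically weight each study by the inverse of its variance.

```python
def combined_effect(d_values, n_values):
    """Sample-size-weighted mean effect size across studies.
    d_values: per-study standardized effect sizes (e.g., Cohen's d);
    n_values: per-study sample sizes used as weights."""
    total_n = sum(n_values)
    return sum(d * n for d, n in zip(d_values, n_values)) / total_n
```

For two equally sized studies with d = 0.2 and d = 0.6, the combined estimate is simply the midpoint, 0.4; a larger study pulls the estimate toward its own effect size.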
- drawbacks of meta-analysis
o the quality of published research may vary
o research in new areas is often rejected from refereed journals
o quality ratings should be made twice:
once after reading the method section alone
then after reading the method and results sections together
o a common criticism is that it is difficult to understand how studies with
widely varying materials, measures, and methods can be compared
o the core issue is whether or not differing methods are related to
different effect sizes
In a field survey you directly ask people about their behavior, and
you can draw inferences about the factors underlying that behavior.
One major ethical concern is whether and how you will maintain the anonymity of
your participants and the confidentiality of their responses.
- clearly define the topic of your study
- keep it focused: covering too much in a survey, or a survey that is too long, can confuse respondents
- demographics are used as predictor variables
- the item that specifically measures voter preference is the criterion variable
- administer your questionnaire to a pilot group to make sure it is reliable
Open-ended questions – the respondent answers in his or her own words
- drawback: respondents may not understand exactly what you are looking for, or may omit information
- can also make summarizing data difficult
Restricted items
- provide a limited number of specific response alternatives
- control the participant's range of responses
- easier to summarize and analyze
- not as rich in information
Partially open-ended items
- resemble restricted items but provide an additional "other" category and an
opportunity to give answers not listed
Formatting
- help respondents separate the question from the response categories
- make any special instructions intended to clarify a question a part of the
question itself
- use check boxes, blank spaces, or numbers for responses
- place all alternatives in a single column
Rating scales
- scales with fewer than 10 points are frequently used, but you should not
go below 5 points
- label the end points: anchors keep interpretation from drifting
- labeling all points provides more accurate information
- a reasonable compromise is to label the ends and the middle
- distinguish the psychological phenomenon underlying the scale from the scale itself
- Likert scale – agreement to disagreement
Assembling your questionnaire
- use a coherent, visually pleasing format
- demographic items should not be placed first
- early items should be interesting and engaging
- early items should apply to everybody and be easy
- order affects answers only when people are poorly educated
- place objectionable questions after less objectionable ones
- use graphics
- verbal and graphical design relate to how your questions are worded and presented
Mail surveys
- mail the questionnaire directly to the participant
- nonresponse bias: those who fail to complete the questionnaire may differ from those who do
- develop strategies to increase the return rate:
o multiple contacts
o include a small token of appreciation (less money tends to work better)
- lower cost: consider this method first
Internet surveys
- distributed via email
- keep them short and simple
- quick and easy, and can yield a large data set
- internet users may not be representative of the general population
Telephone surveys
- contact participants by phone
- have an interviewer ask the questions, or use a robot
- respondents can use a touch-tone telephone to respond
Group-administered surveys
- people may participate because little effort is required
- may not be treated as seriously when completed in a group
- cannot ensure anonymity
- the right to decline participation may be harder to exercise
Face-to-face interviews
- involve directly speaking with the respondent
- in a structured interview you prepare questions in advance
- in an unstructured interview you have a general idea but no set sequence of questions
- structured: all participants are asked the same things in the same order
o easier to summarize and analyze
o may miss some important information because the interview is highly structured
- unstructured interviews may be hard to code later on
- experimenter bias and demand characteristics become a problem
- run a pilot test
Test-retest reliability
- administer the questionnaire once, allow time to pass, then administer it again
- consider how long to wait
- too short an interval may result in participants remembering questions and answers,
which leads to an artificially high level of test-retest reliability
- wait too long and reliability could be artificially low
- test-retest may be problematic when
o measuring attitudes or ideas that fluctuate with time
o issues for which individuals are likely to remember their answers on
the first testing
o questionnaires are long and boring
- parallel forms
o the two forms must be equivalent
o same number of items and the same response format
o eliminates the possibility that rapidly changing attitudes will result in low
reliability estimates
- single administration
o split-half: the test is split in half and two scores are generated
o works best when the test is limited to a specific area
o each score is based on a limited set of items, which can reduce reliability
o don't use it if it is not clear how the splitting should be done
o some use the odd-even split
o apply the Kuder-Richardson formula
o the higher the number, the greater the reliability
o .75 is moderate
o for Likert-format items, coefficient alpha is used
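Coefficient (Cronbach's) alpha for a single administration can be computed directly from the item scores (a sketch; the function name is illustrative, rows are respondents and columns are items):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances /
    variance of total scores). items is a respondents-by-items array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

When items covary strongly (respondents who score high on one item score high on the others), alpha approaches 1; unrelated items drive it toward 0.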
- increasing reliability
o increase the number of items on your questionnaire
o standardize administration procedures
o score carefully
o items should be clear, well written, and appropriate
- validity of the questionnaire
o content validity – assesses whether the questions cover the range of
behaviors of interest
o construct validity – established by showing the questionnaire's results
agree with predictions based on theory
o criterion-related validity – correlating the results with those of another
measure
Concurrent validity – correlating with a measure of the same
dimension administered at the same time
Predictive validity – correlating with some behavior that would
be expected to occur
- representative sample- closely matches the characteristics of the population
- random sampling- every member has an equal chance of appearing in the sample
- simple random sampling- randomly selecting a certain number of individuals
from the population
- random does not guarantee a representative sample
o combat this by selecting a large sample
- stratified sampling (for a representative sample)
o dividing the population into segments (strata)
o selecting an equal-sized sample from each segment
- proportionate sampling
o the proportions of people in the population are reflected in your sample
- systematic sampling
o used in conjunction with stratified sampling
o every xth element after a random start
- cluster sampling
o basic sampling unit is a group of participants rather than the
individual participant
o saves time
o cost effective
o multistage sampling
identify a large cluster and randomly select among them
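A minimal sketch of the simple random, systematic, and stratified schemes above, assuming a hypothetical frame of 100 member IDs:

```python
import random

random.seed(42)
population = list(range(1, 101))  # hypothetical sampling frame of 100 member IDs

# Simple random sampling: every member has an equal chance of selection.
simple = random.sample(population, 10)

# Systematic sampling: every xth element after a random start.
x = 10
start = random.randrange(x)
systematic = population[start::x]

# Stratified sampling: divide the frame into strata, sample equally from each.
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = [m for group in strata.values() for m in random.sample(group, 5)]
```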
- economic sample- enough participants to ensure a valid survey and no more
- the amount of acceptable error and the expected magnitude of the differences
determine the needed sample size
- the deviation of sample characteristics from those of the population is
called sampling error.
- Look at literature and see what margin of error was used
- Magnitude of the differences you expect to find
- Design a small pilot
Unstacked format- create separate columns for the scores from each treatment
Stacked- one column for all the scores, plus a column coding the treatment level of each score
A quantitative independent variable can just be entered as its value; however, a
qualitative independent variable must be assigned a number for each level, aka dummy coding
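The two layouts can be illustrated with plain Python structures; the treatment names and codes below are hypothetical:

```python
# Unstacked: one column of scores per treatment level (hypothetical data).
unstacked = {"drug": [12, 15, 11], "placebo": [9, 8, 10]}

# Stacked: one column of scores plus a column coding the treatment level.
# The qualitative variable is dummy coded with an arbitrary number per level.
codes = {"drug": 1, "placebo": 0}
stacked = [(codes[level], score)
           for level, level_scores in unstacked.items()
           for score in level_scores]
```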
- taking an average = one score that characterizes an entire distribution
- may not represent the performance of the individual subject
- curve resulting from plotting averaged data may not reflect the true nature of
the psychological phenomenon being studied.
- examining individual data makes the most sense when you have repeated measures
of the same behavior
- individual data reflect the effect of the independent variable more faithfully than data
averaged over the entire group
look at both the grouped and individual data
- represents data in a 2-D space
- horizontal axis is the x axis- independent variable
- vertical axis is the y axis and that’s the dependent variable
- length of the bar represents the value of the dependent variable
- error bars- the precision of the estimate in the form of error bars
o error bars show the variability of scores around the estimate
- a bar graph is the best method of graphing when your independent variable
(the x axis) is categorical and qualitative
line graphs- work when the x axis is continuous and quantitative
- positively accelerated when the curve is flat at first and becomes
progressively steeper along the x axis
- negatively accelerated when the curve is steep at first and then becomes
progressively flatter and levels off at a max or min
- monotonic- the function is uniformly increasing or decreasing
- nonmonotonic- the function contains reversals in direction
scatter plots
- used with the correlational strategy
- line of best fit
- include the equation for this line and the coefficient of correlation
- helpful for when you calculate a measure of correlation
pie graphs
- for data in the form of proportions or percentages
- if a piece is pulled out it's called an exploded pie graph, which emphasizes the
proportion of time devoted to the subject
histograms
- consist of a set of mutually exclusive categories into which you sort the
actual values observed in your data, together with a count of the number of
data values falling into each category
- resemble bar graphs, but the bars are drawn touching each other
- y axis is a frequency (a count of cases), not a mean score
stemplots
- simplify the job of displaying distributions
- easy to construct and have the advantage over histograms and tables of
preserving all the actual values present in the data
- inherently create class widths of ten
- not useful when the data sets become too large
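A stemplot with the conventional class width of ten can be built in a few lines; the scores below are hypothetical:

```python
from collections import defaultdict

def stemplot(values):
    """Stem-and-leaf display with a class width of ten:
    the stem is the tens digit, the leaf is the ones digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return "\n".join(lines)

print(stemplot([11, 14, 23, 23, 27, 31, 42, 45, 45, 46, 58]))
```

Each printed row is one class of width ten, and every original value can be read back off the display.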
- has a long tail trailing off in one direction and a short tail in the other.
- Positively skewed- long tail goes to the right
- Negatively skewed- long tail goes off to the left, downscale
Normal distribution- symmetric and hill shaped- bell curve
Measures of the center
- mode
o the most frequent score
o bimodal distributions have 2 modes
o appropriate for nominal and ordinal scales
- median
o the middle score
o order the scores from highest to lowest
o with two middle scores you take their average
o appropriate for ordinal scales
- mean
o sensitive to the distance between scores
o interval or ratio scale
o normally distributed use mean as measure of center
o negatively skewed the mean underestimates the center
o positively skewed the mean overestimates the center
o neither mean or median will accurately represent the center if your
distribution is bimodal
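A quick check of how skew pulls the mean away from the median, using hypothetical positively skewed scores (long tail to the right):

```python
from statistics import mean, median

# Hypothetical positively skewed scores: one large value forms the right tail.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

print(mean(scores))    # 6.6 — pulled toward the tail, overestimating the center
print(median(scores))  # 4.0 — stays near the bulk of the scores
```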
measures of spread
- range
o simplest and least informative
o does not take into account the magnitude of the scores between the extremes
o very sensitive to outliers
- interquartile range
o order the scores
o divide into 4 equal parts
o less sensitive to extreme scores
- variance
o average squared deviation from the mean
- standard deviation
o square root of the variance
o most popular measure of spread
5-number summary
- minimum, the 1st quartile, the median, the 3rd quartile, and the maximum
- interquartile range is Q3 - Q1
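The five-number summary and IQR can be computed directly. Note that several quartile conventions exist; this sketch uses the medians of the lower and upper halves:

```python
from statistics import median

def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum; Q1/Q3 are the medians of the
    lower and upper halves (one common convention among several)."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return min(s), median(lower), median(s), median(upper), max(s)

data = [4, 8, 15, 16, 23, 42, 42, 50, 61]   # hypothetical scores
mn, q1, med, q3, mx = five_number_summary(data)
iqr = q3 - q1   # interquartile range = Q3 - Q1
```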
- Pearson's product-moment correlation coefficient, or Pearson r
o scale your measures on an interval or ratio scale
o Pearson r can range from +1 through 0 to -1
o a positive correlation represents a direct relationship
o a negative correlation indicates an inverse relationship
o a correlation of 0 says no relationship exists
o the magnitude of the correlation coefficient tells you the degree of linear relationship
o both +1 and -1 represent perfect linear relationships
o a parabola shape is called a curvilinear relationship
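A Pearson r sketch from the definitional formula; the data below are hypothetical:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation; ranges from -1 through 0 to +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A perfect direct (positive) linear relationship gives r near +1,
# a perfect inverse relationship gives r near -1.
hours = [1, 2, 3, 4, 5]
print(pearson_r(hours, [2, 4, 6, 8, 10]))   # ≈ +1.0
print(pearson_r(hours, [10, 8, 6, 4, 2]))   # ≈ -1.0
```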
- Point-biserial correlation
o used when one variable is continuous and the other dichotomous
o the dichotomous variable is dummy coded
o magnitude partly depends on the proportion of participants falling
into each of the dichotomous categories
o if the number of participants in each category is not equal, the
maximum attainable value for the point-biserial correlation is less
than +1 or -1
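Point-biserial r is just Pearson r computed with the dichotomous variable dummy coded as 0/1; a sketch with hypothetical group membership and scores:

```python
# Hypothetical data: dichotomous variable dummy coded 0/1, continuous scores.
group  = [0, 0, 0, 0, 1, 1, 1, 1]   # e.g., control = 0, treatment = 1
scores = [3, 4, 4, 5, 6, 7, 7, 8]

# Point-biserial correlation = Pearson r on the dummy-coded variable.
n = len(group)
mg, ms = sum(group) / n, sum(scores) / n
cov = sum((g - mg) * (s - ms) for g, s in zip(group, scores))
sg = sum((g - mg) ** 2 for g in group) ** 0.5
ss = sum((s - ms) ** 2 for s in scores) ** 0.5
r_pb = cov / (sg * ss)
```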
- Spearman rank-order correlation
o used either when your data are scaled on an ordinal scale or when you
want to determine whether the relationship between variables is monotonic
- Phi coefficient
o Both variables being correlated are measured on a dichotomous scale.
- Bivariate regression
o Find the straight line that best fits the data plotted on a scatter plot
o The best fitting straight line is the one that minimizes the sum of the
squared distances between each data point and the line as measured
along the y axis
- The coefficient of determination
o The square of the correlation coefficient
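The least-squares line and the coefficient of determination can be computed as follows; the data are hypothetical:

```python
# Hypothetical data for a bivariate regression.
x = [1, 2, 3, 4, 5]
y = [2, 2, 4, 5, 7]

# Least-squares slope and intercept: this line minimizes the sum of
# squared vertical (y-axis) distances between each point and the line.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
intercept = my - slope * mx

# Coefficient of determination (r squared): proportion of the variance
# in y accounted for by the linear relationship with x.
ss_total = sum((b - my) ** 2 for b in y)
ss_resid = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
r_squared = 1 - ss_resid / ss_total
```

The quantity 1 - r_squared is then the coefficient of nondetermination, the variance left unexplained.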
- According to the text, you can increase the reliability of your questionnaire by
B. standardizing administration
C. writing clear, appropriate items
In ________ sampling, you identify naturally occurring groups (for
example, classes in a school) and sample some of those groups.
If you decide to assess reliability with multiple tests and use
alternate forms of your questionnaire, you would then use ________
to assess reliability.
- parallel forms
Labeling each point on a scale versus labeling only the end points
Usually does not significantly affect the responses participants give
to a question
According to the text, which of the following would be a way of
assessing the reliability of a questionnaire?
Administer the same questionnaire
(or a parallel form) to the same
participants more than once.
B. Administer the
questionnaire once and
assess internal consistency.
A drawback to an open-ended item is that
The responses obtained may be difficult to code and analyze.
A sample consisting of participants who are not representative of
the population is a(n) ________ sample.
The advantage of restricted items over open-ended items is that
- provide more control
Dr. Loo administers a long and boring questionnaire concerning
attitudes that tend to fluctuate over time. When assessing the
reliability of his questionnaire, Dr. Loo should
- avoid using test re-test
To write good survey items, you should
A. use simple words rather than complex words.
B. make the stem of a question short and easy to understand but use complete sentences.
C. avoid vague questions in favor of more precise ones.
You would compare two studies in a meta-analysis if you wanted
to find out
whether the studies produced significantly different results.
Cohen’s Kappa is used to
Evaluate interrater reliability
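A sketch of Cohen's kappa, which corrects observed interrater agreement for the agreement expected by chance; the two observers' codings are hypothetical:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Interrater agreement corrected for chance:
    kappa = (p_observed - p_expected) / (1 - p_expected)."""
    n = len(rater1)
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of each category's marginal proportions.
    p_exp = sum((c1[k] / n) * (c2[k] / n) for k in c1)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical codings of 10 observed behaviors by two observers.
r1 = ["play", "play", "rest", "play", "rest", "rest", "play", "play", "rest", "play"]
r2 = ["play", "play", "rest", "rest", "rest", "rest", "play", "play", "rest", "play"]
kappa = cohens_kappa(r1, r2)
```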
In meta-analysis, the file drawer phenomenon
A. inflates the probability of making a Type I error.
The text proposes that ________ can be used as an index of effect size.
- the Pearson product-moment correlation
A(n) ________ is one who is unaware of the hypotheses being
tested in an observational study.
A. blind observer
If you can identify one behavior as more important than another
in an observational study, you can then use ________ sampling
When comparing studies, looking at ________ is the preferred
technique.
- effect sizes
In an observational study of patients in a psychiatric ward, you
alternate 5-minute periods of observation with 5-minute periods of
recording behavior. This is an example of ________ sampling.
The unit of analysis in a meta-analysis should be
how variable x affects variable y.
The major advantage of using grouped data is that
convenience―a single score (e.g., the mean) can be calculated to
represent a group
1 - r² gives you the
coefficient of nondetermination.
A ________ presents a frequency distribution graphically as a
series of bars representing the classes whose heights indicate
the number of cases falling into each class.
Examining individual scores makes the most sense when
A. you have repeated measures of the same behavior.
Why is it a good idea to explore your data using EDA techniques
before you conduct any statistical tests?
A. EDA can reveal defects in your data that may warrant taking
corrective action before you proceed to the inferential analysis.
B. EDA can help you determine which summary statistics are
appropriate for your data.
C. EDA may reveal unsuspected influences in your data.
An advantage of the stemplot over the histogram is that only the
stemplot allows you to
- preserve the actual score
Dummy-coding variables involves
A. assigning numerical values (for example, 0 and 1) to categorical variables
The ________ provides an estimate of the amount of error in prediction.
- standard error of estimate
In a stemplot of scores ranging from 11 to 83, a score of 42 would
be located at a stem value of ________.
- 4
If your treatment did not have an effect on your dependent
variable, you can assume that the means representing each group
in your experiment
A. are independent estimates of a
single population mean.
Data transformations are used to
A. adjust data to meet assumptions
of statistical tests
Serious violations of one or more of the assumptions underlying a statistical test
may lead you to commit a Type I error more or less often than the
stated alpha level.
If your dependent variable were a dichotomous yes/no response,
you could compare the proportion of subjects saying yes in the
experimental group to the proportion saying yes in the control
group by using the
A. z test for the difference between two proportions.
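A sketch of the z test for the difference between two proportions, using the pooled proportion for the standard error; the counts below are hypothetical:

```python
from math import sqrt

def z_two_proportions(yes1, n1, yes2, n2):
    """z test for the difference between two independent proportions,
    with the pooled proportion used in the standard error."""
    p1, p2 = yes1 / n1, yes2 / n2
    pooled = (yes1 + yes2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 30 of 50 experimental subjects said yes vs. 18 of 50 controls.
z = z_two_proportions(30, 50, 18, 50)
```

A |z| larger than 1.96 would be significant at the two-tailed .05 level.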
A one-tailed test is used
if you are interested in whether the obtained value of the statistic
falls in one particular tail of the sampling distribution for that statistic.
When an interaction is present,
C. main effects are not interpreted, because your
independent variables do not have simple effects on your
dependent variable.
When the effect of one independent variable on your dependent
variable changes over levels of a second, a(n) ________ is present.
If you want the average of 10 scores to equal 100, you can choose
any numbers you want for 9 of the scores, but the 10th score will
have to be whatever number will make the average of the scores
equal 100. Thus the _______________________ equal(s) 9.
Degrees of freedom
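The degrees-of-freedom idea in the question can be demonstrated directly:

```python
# With the mean of 10 scores fixed at 100, only 9 scores are free to vary;
# the 10th is forced by the constraint, so df = n - 1 = 9.
target_mean, n = 100, 10
free_scores = [95, 102, 98, 110, 90, 105, 99, 101, 97]  # any 9 numbers
last = target_mean * n - sum(free_scores)               # the constrained score

scores = free_scores + [last]
assert sum(scores) / n == target_mean
```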
Because of ________, you should not conduct too many post hoc
comparisons, even if you predicted particular differences between means.
- probability pyramiding
At a given significance level, a one-tailed test
is more likely to detect real differences between means than is a two-tailed test.
4 scales of measurement typically discussed in psychological statistics
-nominal- lowest scale; numbers are assigned to categories, and which number goes
with which category is completely arbitrary. Example: gender. Gives identity; you can count.
-ordinal- magnitude as well as identity. Can tell us if one case has more or less. The
distances are not equal and we don't know the distance. No math.
-interval- equal distances, but no true zero point; the number 0 is
arbitrary (IQ tests, temperature). Can calculate means and SDs; nomothetic research.
-ratio- true 0, highest level, all math permitted; 0 means absence; very strong variables.
-approximately interval- treated as if interval even though equal intervals are not guaranteed
properties of the abstract number system:
identity- each number has a particular meaning
magnitude- numbers have an inherent order from smaller to larger
equal intervals- means that the difference between numbers are on the same scale
absolute/ true zero- the zero point represents the absence of the property being measured
interval scales have properties of
- equal distance
equal distance allows us to know how many units
- do not have a true zero point
- the number 0 is arbitrary
ratio scales have the properties of the abstract number system
- equal distance
- absolute/ true zero – allows us to know how many times greater one case is
scales with absolute zero and equal interval are considered ratio scales
ordinal can be split into
ranked preferences- they do not tell us how much just more or less. The things we
like more are ranked higher.
Assigned ranks- in order to select a smaller subset or to show an individual's relative
placement in a larger group. They're considered ordinal because they have the
properties of identity and magnitude.
Interval can have a zero point, but it doesn't mean the absence of whatever's being measured
- cannot make statements involving multiplication or division
- all the properties of the abstract number system
o equal distance
o absolute/true zero
- all mathematical operations
- 0 represents the absence of the behavior
Likert-type ratings
- used in surveys where we are asked to rate how much we agree/disagree
- sometimes ordinal scales, but sometimes interval or approximately equal interval
- properties of identity and order
- identity- lets us know whether we agree or disagree
- order- each number represents a rating that is more or less than the ratings around it
- psychologists disagree about whether the scale intervals are equal or not
create measures by adding up the individual Likert-type ratings or calculating an average;
the sum or total of item responses will give a broader range of scores than the individual items.
It is controversial what scale this is. Ordinal- you really can't assume the same interval or gap
between ratings. Interval- a respondent looking at a scale is making equal-interval
judgements. Not a settled question; the debate is mainly between ordinal and interval.
Scale matters because it determines the mathematical operations that are
permitted for those variables. Mathematical operations determine which statistics
can be applied to the data.
How a variable is measured determines its precision
Reliability = dependability
Validity- study does what it should do. Measure what we said, right test, right score.
Invalid. Not truthful or correct.
Good assessment tools can help in rejecting the null hypothesis or accepting the alternative.
3 types of reliability
- interrater
o two observers watching the same behavior; their scores should agree
with each other
- internal consistency
o people should respond in a consistent way to all of the questions
- test retest
o if you give people a test more than once they should get about the
same score each time
the Spearman-Brown split-half coefficient
- two random halves
- if the summed scale were reliable, you would expect the two halves to have an
r close to 1.0
Cronbach's alpha- not just one split half; takes into account all possible split halves
if alpha is close to 1.0 your test is reliable.
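Coefficient alpha can be computed from the item variances and the variance of the summed scale, which amounts to taking all possible split halves into account; the Likert-type ratings below are hypothetical:

```python
def cronbach_alpha(items):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum(item variances) / variance(total)).
    Equivalent to averaging over all possible split-half reliabilities."""
    k = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(row) for row in items]
    item_vars = [var([row[i] for row in items]) for i in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

# Hypothetical Likert-type ratings: 5 respondents x 4 items.
ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(ratings)
```

These highly consistent hypothetical responses give an alpha close to 1.0.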
How do we establish validity
- internal- inside the experiment; well designed, free of confounds
- external- outside the experiment; generalizable
- construct- think concept; manipulating and measuring concepts;
truly represents what you had in mind
- correlating instruments with similar measures
- predicting a specific behavior or criterion
- established by being successfully used in a wide range of studies
- how observations are made
- what is observed
- how behavior is recorded
the more complex the behavior that is recorded, the more difficult it is to achieve
good interrater reliability.
Face to face interview
Best when you need to establish rapport with your participant.
Disadvantage- the social situation created might bias the participants' responses.
Also expensive to administer.
Telephone interview
Offers some social distance; it might be easier to answer a sensitive question. You can answer
participants' questions, and cost is less than face to face.
Disadvantage- easier for participants to deny a request
Mail/web survey
Privacy and low cost
Participants can choose when it's convenient to sit down and answer the survey
Low-cost alternative
Disadvantage- participants cannot ask questions and response rates are low
Three types of survey questions
- open ended
- close ended
- partially closed
open ended
- important for obtaining complete answers
- however they require more effort from the participant
close ended
- multiple choice, ranks, or Likert-type ratings
- easy for participants- least effort
- not appropriate when the expected responses are too complex
partially closed
- use when you have a good idea of the range of expected responses but want to give the
participants the opportunity to give an answer that is rare or that you did not anticipate
- make sure you address only one issue
- avoid bias
a high Cronbach's alpha means that people who score high on one item tend to score high on the others.
A high test-retest correlation means that the same or similar responses were given both times
and the instrument or question is relatively stable
The measurement process
- represent variables numerically
- begin with a conceptual definitions (theory) and create operational
- types of variables
- types of scales
- reliability and validity
conceptual variable- theoretical construct, abstract idea. Not tangible. Example: self-esteem.
A conceptual definition, in terms of measurement, is not an operational definition.
operational def- the procedure by which the researcher measures the construct and/
or manipulates the variable.
- Bridgman: define variables in terms of the operations needed to produce
them. What is it that's going to be needed to indicate the variable?
In research every variable needs to be defined.
- requires you to think in behavioral terms
- what do we have to do to know we have observed something of interest?
- The more careful and complete the operational definition the more precise
the measurement of the variable will be
True score- true part of the observed score. Perfect
Error score- difference between observed score and true score
- error as measurement "fluctuation"
method error- due to characteristics of the test or the testing situation. Not random; reflects
systematic error. (a scale that is always off by 5)
trait error- due to individual characteristics, random aspects of people. Unique
characteristics or experiences of the subject.
(take home message) increasing reliability means decreasing error
- increase sample size
- eliminate unclear questions
- use both easy and difficult questions
- minimize the effects of external events
- standardize instructions
o clear definitions about behaviors and events to be rated
- maintain consistent scoring procedures
o provide feedback about discrepancies
- decrease response set biases
o social desirability
a reliable score is one that is relatively free from measurement fluctuation
reliability is measured using a correlation coefficient or percentages; referred to as an r.
- indicate relative consistency
- r can range from -1.0 to 1.0
- Cronbach's coefficient alpha and KR-20 (internal consistency measures) range
from 0 to 1.0
o statistically possible to be less than 0 but then highly problematic
- percentages range from 0 – 100%
types of reliability
- test-retest- a measure of stability; administer the same test at two different
times and correlate scores from test 1 with scores from test 2
- parallel forms- a measure of equivalence; give two separate forms of the test to
the same people; the correlation between form 1 and form 2 should be rather high
- inter-rater reliability- measure of agreement. Have raters rate behavior and then
determine the amount of agreement between them. Percentage of agreements
- internal consistency- measure of how consistently each item measure the same
underlying construct- correlate performance on each item with overall performance
across participants. ( cronbach coeffeicent alpha)
Cronbach's coefficient- computed from the item variances and the variance of the sum
across the items. Looks like a correlation coefficient. Negative values are undesirable; it
should be between -1 and 1 and usually in the upper .7 range
- adding items that are answered consistently makes the score go up
- adding items that draw an inconsistent set of responses makes reliability go down
Rule of thumb- minimum for research = .70
- a higher minimum applies when you use a test in an applied setting for clinical decisions
- why? Do the math for the reliability formula + arbitrariness
- it is for the user to determine what amount of error variance you are willing to accept
- although you can make a scale more reliable by adding redundant items, the real
value added might be minimal. Add good and different items, because you don't want to
overload participants and you still want a good score.
administer a test once and compare even vs. odd items = split half
split half- splitting an existing measure in half and comparing the halves.
Parallel forms – two stand alone versions
Cronbach – internal
Scores from a valid test should correlate with other similar variables and should
predict the criterion variables they are expected to predict.
Validity refers to the accumulating evidence to provide scientific basis for score
meanings/interpretations accuracy of interpretation of scores.
Validity refers to the tests results or scores not the test itself
Validity ranges from low to high
Validity must be interpreted within the testing context (people, place, time)
Validity 4 kinds
- content validity
o a measure of how well the items represent the entire universe of items
o ask an expert if the items assess what you want them to
- concurrent validity
o a measure of how well a test estimates a criterion
o select a criterion and correlate scores on the test with scores on the
criterion in the present
- predictive validity
o a measure of how well a test predicts a criterion
o select a criterion and correlate test scores with scores on the criterion in the future
- construct validity
o a measure of how well a test assesses some underlying construct
o assess the underlying construct on which the test is based and correlate these
scores with the test scores
o correlate the new test with an established test
o show that people with and without certain traits score differently
o determine whether tasks required on the test are consistent with the theory guiding the test
multitrait- multimethod matrix
displays associations between measures and methods that are trying to get
at one trait and one construct.
Everything above the diagonal correlates with everything below.
See whether there is convergence and discrimination.
Convergent validity- different methods, same construct, yield similar results.
Discriminant validity- different methods, different constructs, yield different results;
less convergence. Different methods shouldn't be highly related.
A valid test must be reliable, but a reliable test need not be valid.
- type of reliability
- type of validity
subject measure considerations
- developed/ validated on comparable samples?
- Reading level/ motor performances/ age related
- Social desirability / response set issues
- avoid measures with unknown / not established psychometric properties
- gather data on measure with a pilot sample first
- problematic psychometrics undermine study conclusions and inferences
We use inferential statistics to make an inference from the sample back to the
population. Validity of that inference depends on how representative the sample or
subset is of the population from which we've drawn it.
1. probability sampling- random chance component
2. non probability sampling
random components give us confidence that our sample is a reasonably good
representation of the population.
Probability sampling
- random or chance
- every person has an equal and independent chance of being selected
non-independent sampling
- referred by a friend
- similar values
different kinds of probability sampling
- simple- straightforward, population is homogenous for the characteristic
- systematic- not fully random; every nth person from the frame
- stratified- subgroups that differ substantially, treats as if they were two or
more separate populations and then randomly samples within each.
- Proportionate- subgroups differ in size. Stratify the population into
relevant subgroups, then randomly sample within each subgroup in numbers equal to
their proportion in the population
- Cluster- impossible or impractical to identify every person in the sample.
- Multistage- most sophisticated; large studies; representative national sample.
Non probability sampling strategies
- Haphazard- bias to the study. Should be avoided. “man on the street”
- Convenience- selects a particular group but doesn't sample all of the
population
- Purposive- targets a particular group of people with particular characteristics that
are hard to find generally.
Non-probability sampling occurs when it is practically impossible to use probability
sampling: time and expense constraints, or the frequency of the behavior or
characteristic of interest is so low in the population that a more targeted strategy is needed.
All probability sampling requires a sampling frame: a population defined and
available through records
Population: all possible individuals making up a group of interest in a study
Sample: a relatively small number of individuals drawn from a population for
inclusion in a study. Unrepresentative samples produce sampling error.
Subpopulation- small segment of the defined population
Generalization- the ability to apply findings from a sample to a larger population
Nonrandom sampling- not randomly chosen
- convenience
- snowballing
convenience- readily available participants or those who will volunteer to participate
snowballing- you select initial participants who meet some criteria and then
acquire other participants through referrals from your initial participants
quota- a convenience sample that is comprised of subgroups similar in number to the
population. Specific people are targeted because they match a criterion
pros and cons
pros- sampling from a small subject pool is easier; requires less time and money
cons: nonrandom samples have less external validity, and thus the generality of
results is compromised
internet samples are nonrandom because they are not representative of the population;
the characteristics of knowing how to use a computer and having internet access limit who participates
advantage: broader range of participants
disadvantage: generalization , external validity
- solicit , flyers etc
- subject pool
issues with subject pool
students may feel punished if they don't participate, which can skew the data
- take the lab to participants (knock on doors, churches, etc)
- set up a situation and wait for participants
- volunteer bias – bias in a sample that results from using volunteer participants
- volunteer bias can affect internal validity
- could hurt the relationship between IV and DV
- results from volunteers alone cannot be generalized to the population, so it hurts external validity
fix this by
- avoid giving people reasons not to volunteer
- emphasize that participating contributes to science
- make it sound appealing and nonthreatening
- avoid tasks that are stressful
- state that there is importance in this research and they'd be helping science.
An example(s) of poor research in Psychology, presented in the video on Research
Methods was/were __________.
A. None of the choices are correct.
B. Dr. Bettelheim’s study of child autism
C. intensive behavioral treatments of autism
D. both B and C.
Answer Key: B
Question 2 of 50 Score: 0 (of possible 1 point)
Which of the following is NOT a purpose of conducting a literature search?
A. Prevent you from carrying out a study that has already been done
B. Identify questions that need to be answered
C. Provide ideas and justification for designing a study
D. none of the above
Answer Key: D
Question 3 of 50 Score: 1 (of possible 1 point)
________ theories are considered the best type of theory because they propose a new structure
to explain a phenomenon
Answer Key: D
Question 4 of 50 Score: 1 (of possible 1 point)
The information processing theory of memory (3 types of memory: sensory, short-term, and
long-term) is a descriptive theory because it
A. describes the concept and explains how it affects memory ability.
B. uses an analogy to describe a concept.
C. only describes features of the concept without explaining it.
D. creates a new concept to explain the relationship between two variables.
Answer Key: C
Question 5 of 50 Score: 1 (of possible 1 point)
According to Michael Shermer's presentation of Why people believe weird things, the
problem with having a theory is that _________________________________.
A. they tend to contain individual's own personal biases
B. they can never be falsified
C. they have to be tested
D. they are not easy to develop
Answer Key: A