York University

Psychology

PSYC 2030

Krista Phillips

Fall

Study Guide: Chapter 12 – Understanding P values and Effect Size Indicators
Why Is It Important to Focus Not Just On Statistical Significance?
Besides describing data and measuring relationships, many behavioral researchers are usually
interested in making comparisons using statistical tests such as t, F, and chi-square
Null hypothesis significance testing (NHST) -> traditional procedure
Although much has been written about common misconceptions regarding statistical significance, it
remains a source of confusion for many people
J.D. Miller noted that many medical specialists mistakenly view “statistical significance” as a proxy
for the “degree of improvement a new treatment must make for it to be clinically meaningful”
Statistical significance tells us nothing about the degree of improvement a new treatment must
make for it to be clinically meaningful, but effect size indicators, properly interpreted, can often
give us insights about the practical significance of an obtained effect
On the other hand, misconceptions and illusions abound about the implications of certain effect
size indicators and can result in people drawing unwarranted conclusions
Box 12.1 Three Families of Effect Sizes
We call the three families of effect sizes a) the correlation (or r-type) family, b) the difference
family, and c) the ratio family
The emphasis in this chapter is on r-type effect size indicators, such as the point-biserial r and
the phi coefficient
Another member of the difference family is called the risk difference
Also explained later in this chapter are the odds ratio and relative risk (both belong to the ratio
family of effect size indicators)
An advantage of r-type effect size indicators is their usefulness when predictions specifically involve
more than two groups or more than two conditions
Difference-type and ratio-type effect size indicators are not so naturally applicable in such
situations
A common thread throughout the rest of the text is the general relationship between the p value and the
effect size, as given by the following conceptual equation:
o Significance test = size of effect × size of study
The general relationship means that any test of statistical significance (such as t, F, or chi-square)
can be shown to consist of two components: a) an indicator of the size of the effect and b) an
indicator of the number of sampling units (e.g., the total N)
The conceptual equation shows that the value of the significance test is the product of these
two components
o Thus, unless the size of the obtained effect is exactly 0 (which is quite rare), the larger the
size of effect component or the larger the size of study component (e.g., the larger the
N), the larger is the value of the significance test and, therefore, the smaller (and usually
more coveted) is the p value
o In other words, focusing our attention only on statistical significance (the p value) would
not tell us whether the effect size, the total N, or both were primarily responsible for the
level of statistical significance reached
o Furthermore, even if the effect size component of the significance test were mainly
responsible, a question would still linger concerning which aspect of the effect size
indicator was the primary contributing factor
o Another common mistake is to equate “nonsignificance” (frequently defined as p > .05)
with estimating an effect size equal to zero
o Later in this chapter, we will describe a useful statistic (the counternull statistic) that can
eliminate this error
o We will also discuss the concept of statistical power and illustrate how a power analysis is
done
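The conceptual equation above (significance test = size of effect × size of study) can be made concrete with one standard instance, offered here as a minimal sketch rather than as the chapter's own worked example: for a 2 × 2 table, chi-square with 1 df equals phi squared times N. The function names and the value phi = .20 are illustrative assumptions. The p value uses the fact that a 1-df chi-square is a squared standard normal deviate.

```python
import math

def chi_square_from_phi(phi: float, n: int) -> float:
    """Significance test = size of effect (phi squared) x size of study (N)."""
    return phi ** 2 * n

def p_value_chi_square_1df(chi_square: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    # A 1-df chi-square is a squared standard normal deviate, so
    # P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))

# Same effect size (phi = .20, an illustrative value), three study sizes.
for n in (50, 100, 200):
    chi2 = chi_square_from_phi(0.20, n)
    print(f"N = {n:3d}  chi-square = {chi2:.2f}  p = {p_value_chi_square_1df(chi2):.4f}")
```

With phi held at .20, the p value falls from about .16 at N = 50 to below .01 at N = 200, which is why the p value alone cannot say whether the effect size or the size of the study produced the result.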
What Is the Reasoning Behind Null Hypothesis Significance Testing?
Analogy: imagine you’re strolling along Atlantic City Boardwalk when a shady character
approaches and whispers he has a quarter he is willing to sell you for “only five dollars”
o You ask what about this coin makes it worth more than its face value?
o He answers, this is a quarter with a special property, when properly used it could win an
enterprising person a lot of money because it does not always come up heads and tails
with equal regularity
o Instead, one outcome is far more likely than the other, and a person with a touch of
larceny in his soul could bet on the outcome and win a tidy sum
o Because you haven’t walked away yet he adds – it might sound like a cock-and-bull
story, but flip the coin and see for yourself
If the coin is not what it is said to be, meaning it’s ordinary, then the probability of heads or tails
is always one chance in two
So you accept the challenge and test the probability of tails
You flip nine times and each time it comes up heads
Would you believe him now?
If your answer is yes, would you still believe him if, in nine flips, it came up heads once and tails 8 times?
This is the essential question in NHST
You can be as stringent as you like in setting a rejection criterion but you may eventually pay for
this decision by rejecting what you perhaps should not
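To see why these outcomes bear on the decision, it helps to compute how likely each would be if the quarter were ordinary. The sketch below assumes a fair coin (probability of heads = 1/2) and uses the binomial counting rule. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_heads(k: int, n: int) -> Fraction:
    """Probability of exactly k heads in n flips of a fair coin (the ordinary quarter)."""
    # comb(n, k) sequences contain k heads, out of 2**n equally likely sequences.
    return Fraction(comb(n, k), 2 ** n)

n_flips = 9
p_nine_heads = prob_heads(9, n_flips)   # every flip comes up heads
p_one_head = prob_heads(1, n_flips)     # one head and eight tails

print(f"P(9 heads in 9 flips | fair coin) = {p_nine_heads} = {float(p_nine_heads):.4f}")
print(f"P(1 head, 8 tails | fair coin)    = {p_one_head} = {float(p_one_head):.4f}")
```

Both outcomes are rare for an ordinary coin (1/512 and 9/512), but the second is nine times more likely than the first, so a more stringent rejection criterion might accept the first as evidence and not the second.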
When you decided to test whether the probability of heads “does or does not” equal the
probability of tails, two hypotheses were implied
One was that the quarter is unbiased (the probability of heads does equal the probability of
tails); the second implicit hypothesis was that the coin is biased (the probability of heads does
not equal the probability of tails)
You can think of the experiment of tossing a coin as a way of trying to determine which of these
hypotheses you cannot logically reject
In statistical terms, the name for the first hypothesis (that the quarter is unbiased) is the null
hypothesis (H0), and the second, that the quarter is biased, is the alternative hypothesis (H1)
Null hypothesis : the probability of heads equals the probability of tails in the long run because it
is an ordinary quarter, and therefore getting a head or a tail is the result purely of chance (i.e.
the coin is not biased)
Alternative hypothesis: the probability of heads is not equal to the probability of tails in the long
run because it is not an ordinary quarter (i.e., the coin is biased)
Notice that these two hypotheses are mutually exclusive, that is, when one hypothesis is true,
the other must be false
Experimenters who do NHST are usually interested in testing the specific H0 (i.e., no difference)
against a general H1 (i.e., some difference)
In a between-subjects design with an experimental and a control group, the null hypothesis
generally implies no difference in the success rate between the experimental group and the control
group (e.g., no difference in survival rates, performance rates, or however else “success rates”
may be defined)
The idea behind NHST is to see whether we can reject H0 and yet be reasonably sure that we
will not be wrong in doing so
This leads to the further idea that there are two kinds of decision risks of general concern in NHST,
called Type I error and Type II error
Box 12.2 How Are Probabilities Determined?
One characteristic of probabilities is that if all outcomes are independent (i.e., one outcome is
not influenced by any other), the sum of all the probabilities associated with an event is equal to
1
If you throw an ordinary six-sided die, there are six possibilities and, unless the die is loaded, the
probability of any particular outcome is 1/6 or .167
Summing all the independent probabilities gives us .167 × 6 = 1.00
Instead of throwing a die, suppose you have two coins and flip both at the same time
There are four possible combinations of heads and tails: HH, HT, TH, TT
In determining probabilities, the general rule is to count the total number of possible outcomes
and then to count the number of outcomes yielding
the event you are interested in
The probability of that event is the ratio of the number you are looking for (the favorable event)
to the total number of outcomes
Ex. two heads (out of the four possible outcomes) can occur in only one way
(HH), and the probability is therefore 1 divided by 4, so p = .25
Only one head out of these four possible outcomes can occur in two ways, HT or
TH, and the probability is therefore 2 divided by 4, so p = .5
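The counting rule in this box can be sketched in a few lines: list every equally likely outcome, count the favorable ones, and take the ratio. The helper name `event_probability` is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import product

def event_probability(outcomes, is_favorable) -> Fraction:
    """Ratio of favorable outcomes to all equally likely outcomes."""
    favorable = sum(1 for outcome in outcomes if is_favorable(outcome))
    return Fraction(favorable, len(outcomes))

# All equally likely results of flipping two coins: HH, HT, TH, TT.
two_coins = ["".join(flips) for flips in product("HT", repeat=2)]

p_two_heads = event_probability(two_coins, lambda o: o.count("H") == 2)
p_one_head = event_probability(two_coins, lambda o: o.count("H") == 1)

print(two_coins)
print("P(two heads) =", float(p_two_heads))
print("P(exactly one head) =", float(p_one_head))
```

The same helper works for the die: with outcomes 1 through 6, each single face has probability 1/6, and the six probabilities sum to 1.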
What Is the Distinction Between Type I Error and Type II Error?
Type I Error -> implies that the decision maker mistakenly rejected the null hypothesis when it is,
in fact, true and should not have been rejected
Type II Error -> implies the decision maker mistakenly failed to reject the null hypothesis when it
is, in fact, false and should have been rejected
The risk (or probability) of making a Type I error is called by three different names: alpha (α), the
significance level, and the p value
The risk (or probability) of making a Type II error is known by one name: beta (β)
To make the most informed decision, researchers who do NHST would, of course, like to know
what each type of risk is in a given case so that they can balance these risks in some way
5% significance level = you do not want to be wrong more than 1 time in 20
If the probability is less than 1/20 (p < .05), reject the null; otherwise, fail to reject the null
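The two risks can be computed exactly for the coin example, as a minimal sketch under stated assumptions: nine flips, a hypothetical decision rule that rejects H0 when the coin shows 0, 1, 8, or 9 heads, and, for the Type II risk, an assumed biased coin with probability of heads = .8. None of these specific values come from the chapter.

```python
from math import comb

def binomial_pmf(k: int, n: int, p_heads: float) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p_heads."""
    return comb(n, k) * p_heads ** k * (1 - p_heads) ** (n - k)

def rejection_probability(n: int, p_heads: float, reject_counts) -> float:
    """Probability that the number of heads lands in the rejection region."""
    return sum(binomial_pmf(k, n, p_heads) for k in reject_counts)

N_FLIPS = 9
REJECT_COUNTS = (0, 1, 8, 9)   # hypothetical rule: reject H0 on these head counts

# Type I error risk (alpha): H0 is true (fair coin) but we reject it.
alpha = rejection_probability(N_FLIPS, 0.5, REJECT_COUNTS)

# Type II error risk (beta): H0 is false (assumed P(heads) = .8) but we fail to reject it.
beta = 1 - rejection_probability(N_FLIPS, 0.8, REJECT_COUNTS)

print(f"alpha = {alpha:.4f}")
print(f"beta  = {beta:.4f}")
```

Under these assumptions alpha is about .039, which is under the 5% level, while beta is about .56. Keeping the Type I risk low with so few flips leaves a large chance of missing a coin that really is biased.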
The analogy is simplified and not exactly what goes on in NHST; it’s not a relational event – there is
only one variable: the result of the coin toss
A researcher who does NHST usually wants to know the probability of claiming that two variables (X
and Y) are related when, in fact, they are unrelated, or that the average “success rate” of one
group (e.g., the experimental group) has surpassed that of another group (the control group)
In practical terms, then, Type I error can be understood as mistakenly claiming a relationship
that does not truly exist; it is the likelihood of this risk that initially most interests researchers who
rely on NHST
They want to know what the probability of Type I error is
Although most researchers who do NHST are not indifferent to the probability of making a Type II error,
many of them do tend to attach greater psychological importance to the risk of making a Type I
error than to the risk of making a Type II error
Of course, in daily life, people also give greater weight to some decision risks than to others
But the reason the researcher attaches greater weight to the risk of Type I error is that making one
would be an error of gullibility, like being fleeced by the huckster’s claim that an ordinary coin
is biased
A Type II error implies blindness, or failure to perceive that a not-so-ordinary coin is really biased as
claimed
Though this analogy is a long stretch, the fact is that scientific researchers have been
traditionally taught that it is far worse to risk being gullible than to risk being blind to a real or
true relationship
Some philosophers have characterized this choice as the healthy skepticism of the scientific
method
Null hypothesis = assumption that no relationship between two variables is present in the
population from which the sample was drawn, or that there is no difference in “success rates” in
the different groups or conditions
The researcher considers the possibility of making a type I error whenever a true null hypothesis
is tested
In everyday life we also weight Type I error more heavily: in a murder trial, where the null hypothesis is
innocence, sending an innocent person to jail (Type I error) is treated as worse than letting a guilty
person go free (Type II error)
What Are One-Tailed and Two-Tailed p Values?
The two-tailed p value is applicable when the alternative hypothesis did not specifically predict in
which side or tail of the probability distribution the significance would be detected
The one-tailed p value is applicable when the alternative hypothesis requires the significance
to be in one tail rather than in the other
As you search journals for background information for your research proposal, you will find that
many researchers ignore the one-tail versus two-tail distinction and report only two-tailed p
values, a conservative convention that is also acceptable in most cases
If in doubt, use the two-tailed p value – it is safer
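The difference between the two kinds of p value can be illustrated with a standard normal test statistic, where the two-tailed p value is simply twice the one-tailed area. This is a minimal sketch. The function names are hypothetical, and the one-tailed version assumes the prediction was in the positive direction.

```python
import math

def one_tailed_p(z: float) -> float:
    """Upper-tail p value: P(Z >= z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_tailed_p(z: float) -> float:
    """Two-tailed p value: P(|Z| >= |z|), twice the one-tailed area."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.645, 1.96):
    print(f"z = {z}: one-tailed p = {one_tailed_p(z):.3f}, two-tailed p = {two_tailed_p(z):.3f}")
```

A z of 1.645 reaches p = .05 one-tailed but only about .10 two-tailed, whereas a z of 1.96 is needed for p = .05 two-tailed. This is the sense in which reporting two-tailed p values is the more conservative convention.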
Not all instructors insist on the same reporting conventions, but if you’re expected to report the
actual descriptive level of statistical significance: reporting this c
