Stats After Midterm 1 and 2 Information
13.1 One-Sample Tests about the Median
- Based on the central limit theorem, that assumption is less important for large samples, but if the distribution is very non-normal, even a large sample of n = 30 may not be large enough to rely on approximate normality.
- Nonparametric hypothesis tests do not require such preconceptions about the distribution of a variable.
o Also called Distribution-free tests.
- Some of the methods in this chapter rely on the concept of ranks, or rank-ordering of data.
Assigning Ranks to Data
- The differences between the assigned numbers (or ranks) can vary among the people who fill in
the survey, or even for the same person.
- First sort your raw data into ascending or descending order.
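The sort-and-rank procedure above can be sketched in Python; the data values are made up for illustration, and tied values receive the average (midrank) of the positions they occupy:

```python
def assign_ranks(values):
    """Rank each value (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1      # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks

print(assign_ranks([3.1, 5.0, 3.1, 7.2]))  # the two 3.1s share rank 1.5
```

Midrank handling of ties is the convention the rank-based tests below rely on.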
A Hypothesis about the Median: The Sign Test
- The assumptions for validly using the sign test are:
o The raw data are at least at the ordinal level.
o Observations are independent and are taken from a randomly drawn sample.
o Relatively few values exactly equal the median (This will be the case if the underlying
distribution is continuous or else the population is very large.)
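Under those assumptions, the sign test reduces to a binomial computation: count how many observations fall above the hypothesized median and compare against Binomial(n, 0.5). A minimal sketch using only the standard library (the sample data and hypothesized median are made up):

```python
from math import comb

def sign_test(data, m0):
    """Two-sided sign test of H0: median = m0; exact ties with m0 are dropped."""
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    n = plus + minus                    # values exactly equal to m0 are discarded
    k = min(plus, minus)
    # exact two-sided p-value under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p = sign_test([4.8, 5.1, 5.6, 6.0, 6.3, 6.9, 7.2, 7.5], m0=5.0)
```

With 7 of 8 signs positive, the exact two-sided p-value here works out to 18/256, just above the 0.05 cutoff.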
13.2 Two-Sample Tests about the Median
Wilcoxon Rank-Sum Test: Two Independent Samples
- Used to compare the medians of two independent samples to test for their equality.
- Equivalent to another test called the Mann-Whitney U Test in the sense that, given the same
data, it would give the same test result.
- General assumptions:
o The raw data are at least at the ordinal level.
o Each sample value is drawn randomly.
o The samples are independent.
o The two sample distributions are similar to one another (but no particular shape is required).
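A sketch of the rank-sum mechanics with a large-sample normal approximation (the data are made up, and the variance formula below omits the tie correction, so treat it as illustrative only):

```python
from math import erf, sqrt

def rank_sum_test(x, y):
    """Wilcoxon rank-sum: pool, rank, sum the ranks of x; normal approximation."""
    pooled = sorted(x + y)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1       # midrank for tied values
        i = j + 1
    w = sum(rank_of[v] for v in x)                 # rank sum of the first sample
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2                    # mean of W under H0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)     # no tie correction
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
    return w, p

w, p = rank_sum_test([12.1, 14.3, 15.0, 11.8], [16.2, 17.5, 18.1, 16.9])
```

Because every value in the first sample is below every value in the second, the first sample takes the four lowest ranks (W = 10) and the p-value comes out small.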
Wilcoxon Signed Ranks Test: Two Dependent Samples
- Provides an alternative if the assumptions for the t-test are not met sufficiently.
- Assumptions are:
o The pairs are randomly selected.
o The measurement level of the observations is at least at the ordinal level
o The population of differences (as calculated from all the paired values) is approximately symmetric in distribution.
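A sketch of the signed-ranks mechanics for paired before/after data, using a normal approximation for the p-value (the data are made up, and for brevity this version does not average the ranks of tied absolute differences):

```python
from math import erf, sqrt

def signed_rank_test(before, after):
    """Wilcoxon signed-ranks test for paired samples (normal approximation)."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]           # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    # W+ = sum of the ranks that belong to positive differences
    w_plus = sum(pos + 1 for pos, i in enumerate(order) if diffs[i] > 0)
    mu = n * (n + 1) / 4                           # mean of W+ under H0
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
    return w_plus, p

w_plus, p = signed_rank_test([72, 68, 75, 70, 65, 74], [75, 69, 79, 76, 67, 69])
```

Five of the six differences are positive, so W+ collects most of the available ranks; with only six pairs, though, the evidence is not significant at the 0.05 level.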
Chapter 14.1 One-Way ANOVA
- With analysis of variance (ANOVA) techniques, we can test for a difference among more than two population means.
- If a difference in means is confirmed, ANOVA does not actually specify which of the means is
different from the others.
- One-way ANOVA (also called one-factor ANOVA) presumes that the data set contains two variables:
o The dependent variable
o A grouping (or factor) variable.
- For test purposes, all numeric values associated with a single group are interpreted as a sample
from a separate population.
- H0: μ1 = μ2 = μ3 = … = μk
- Two compelling reasons for not using this approach are:
o Depending on the number of groups, the number of pairs of means to test separately
would become quite large.
o If tests for mean-pairs are combined in this fashion, the risk of making a Type I error grows significantly. For any one test:
P(there has not been a Type I error) = (1 − α)
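That per-test probability compounds across tests. Assuming the mean-pair tests were independent and each run at α = 0.05, the chance of at least one Type I error over c = k(k − 1)/2 pairwise comparisons can be sketched as:

```python
alpha = 0.05
for k in (3, 4, 5):                    # number of groups
    c = k * (k - 1) // 2               # number of mean pairs to test
    familywise = 1 - (1 - alpha) ** c  # P(at least one Type I error)
    print(f"k={k}: {c} pairwise tests, familywise risk = {familywise:.3f}")
```

With 5 groups the familywise risk already exceeds 40%, which is why ANOVA tests all the means in a single procedure.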
Assumptions for the ANOVA Model
1) All of the samples must be random and must be drawn independently from one another.
2) Each sample should be drawn from a population with an approximately normal distribution. If
distributions depart only moderately from the normal, the test is reasonably robust.
3) The variances of all of the populations must be nearly equal.
4) All the data should be at the interval or ratio level of measurement.
- The test used for ANOVA is the F-test, which is used for comparing variances.
- The idea is to compare the variance between or among the different groups with the variance
within the group.
- Since the model assumes that all of the groups’ variances are equal, the variance among groups
should not differ from the variances within groups if all the population means are also equal.
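That between-versus-within comparison can be sketched directly (the groups are made up; only the F statistic is computed here, not its p-value):

```python
def one_way_f(groups):
    """F = mean square between groups / mean square within groups."""
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # spread of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # spread of the observations around their own group means
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))  # F for three small groups
```

When all the group means are equal, ss_between is zero and F is zero; F grows as the group means spread apart relative to the scatter within groups.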
15.1 Linear Correlation for Continuous Bivariate Data
- If P(A|B) is different for different values or ranges of B and/or P(B|A) is different for different
values or ranges of A, then there is an association between two variables A and B.
- Linear correlation tests whether data are associated in this manner.
- If two continuous variables are shown to have an association of this type, a natural next step is to build a model (called a regression model) that shows the nature of the association and provides a tool for predicting or estimating the value of one variable from the value of the other.
- When the association of two variables follows a linear pattern in a scatter diagram, we say there is a linear correlation between the two variables.
Strength and Sign of the Correlation
- A scatter diagram provides a useful first impression of whether two variables are correlated but
cannot provide an objective measure for the strength (if any) of the relationship.
- We calculate a sample coefficient of correlation, r, to generate a more objective measure of the strength of linear correlation.
- A negative sign means that there is an inverse correlation between the values of the two variables.
- In a positive correlation, if the value of one variable increases, then the corresponding value of
the other variable increases as well.
- There are no precise cutoffs for concluding that an r-value is strong, moderate, weak, or any
other degree of strength, although an approximate guide is provided which is roughly based on
the percentage of variance explained, for different values of r.
- In order to validly use the model presented in this section, the following conditions for the data
set must apply:
o The data must be quantitative and, ideally at least at the interval level of measurement.
o The set of paired data in a random sample.
o The population from which the sample is drawn has a bivariate normal distribution.
- The form of the denominator will look familiar, as it is based on the variance of the individual x- and y-values in the data pairs.
- The numerator may look less familiar, but it is based on the covariance, a measure of how the two values in each pair vary together.
- On the other hand, if there is no correlation, then the numerator tends toward zero: there will be cases where one of the products being summed in the numerator is positive (e.g., for x- and y-values that are both larger than their respective means), but these cases will be cancelled out, when taking the sum, by cases where the product is negative.
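The covariance-over-spread structure described above can be sketched as follows (the function name and sample data are made up for illustration):

```python
from math import sqrt

def pearson_r(x, y):
    """Sample correlation: covariance term over the spread of x and of y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))  # covariance term
    sx = sqrt(sum((a - mx) ** 2 for a in x))              # spread of x
    sy = sqrt(sum((b - my) ** 2 for b in y))              # spread of y
    return num / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly direct: ≈ +1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))         # perfectly inverse: ≈ -1.0
```

Note how every product in the numerator of the first example is positive (both values above their means, or both below), so nothing cancels and r reaches its maximum.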
Significance of the Correlation
- If paired variables meet the conditions for assessing linear correlation, then a value of r close to
plus or minus 1.0 can indicate a strong correlation, and the sign of r can indicate the direct or
inverse nature of the correlation.
- Significant – r is significant if the evidence suggests that the appearance of correlation in the
sample represents a true population parameter and not just a random sampling effect.
- In formal terms, the population parameter that corresponds to r is ρ (pronounced "rho").
- r is significant if, in a test of the null hypothesis "ρ = 0" (i.e., there is no correlation), H0 would be rejected.
15.2 Rank-Order Correlation for Ordinal Data
- the coefficient of correlation r that we calculated in the previous section is technically known as
the Pearson product-moment coefficient of correlation.
- If data are at the ordinal (i.e. rank order) scale, the necessary conditions for r are not satisfied.
- The nonparametric correlation is called the Spearman (or Spearman’s) rank correlation
coefficient, and is symbolized by rs.
- Ironically, despite its revised name and symbol, the calculations for rs are really no different
from those for r.
- When calculating rs, the direction of ranking does not matter, as long as both of the variables are ranked on the same basis.
- Similar to the case for r, rs calculates to +1.0 if the paired variables' rankings are perfectly and directly aligned.
- If there is no correlation whatsoever between the paired ranks, rs calculates to 0.0.
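Since rs is just Pearson's r applied to ranks, a minimal sketch (made-up data; for brevity this version assumes no tied values):

```python
def spearman_rs(x, y):
    """Spearman's rs: Pearson's r computed on the ranks of x and y."""
    def ranks(v):                          # 1-based ranks, assuming no ties
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order):
            r[i] = pos + 1
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n + 1) / 2                        # mean rank, same for both variables
    num = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    den = sum((a - m) ** 2 for a in rx)    # identical for rx and ry (no ties)
    return num / den

print(spearman_rs([1, 2, 3, 4], [1, 4, 9, 16]))   # monotonic but nonlinear: rs = 1.0
```

The example shows the point of rank correlation: y = x² is not linear, so r would be below 1.0, but the rankings match perfectly, so rs is exactly 1.0.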
Cautions when Interpreting Rank Correlation Results
- Also, the Spearman approach should be applied only if the relationship between the paired ranks is roughly monotonic.
o If you steadily increase the value or rank for one variable, the other variable consistently increases or consistently decreases.