HTHSCI 2S03 Lecture Notes - Lecture 8: Contingency Table, Analysis Of Variance, Parametric Statistics
2S03: Session 12
Nonparametric Tests
Non-parametric tests
• Most of the tests that we have learned so far are parametric tests
o Parametric tests: the objective is to estimate parameters or to test a hypothesis about one or more parameters (e.g., population mean µ)
• In a parametric test we need to know the functional form of the distribution of the population of interest (e.g., population data are normally distributed)
In this session we introduce the methods that:
• are not concerned with population parameters; or
• do not depend on knowledge of the sampled population
o Don't need to know the form of the distribution (skewed and/or normal data can be used)
o Tests will deal with ranks, not raw data
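As a rough illustration of what "dealing with ranks" means, the sketch below converts raw values to ranks, giving tied values the average of the positions they occupy (a common convention in rank-based tests). The function name `average_ranks` and the sample values are hypothetical, chosen only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


raw = [12.1, 3.4, 250.0, 3.4, 7.9]  # hypothetical raw measurements
ranks = average_ranks(raw)
print(ranks)  # [4.0, 1.5, 5.0, 1.5, 3.0]
```

The two tied values of 3.4 occupy positions 1 and 2, so each receives rank 1.5.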
Some common non-parametric tests:
• Chi-square test
• Mann-Whitney U test (also called Wilcoxon Rank Sum test)
• Wilcoxon Signed-Rank test
• Kruskal-Wallis test
• Spearman rank correlation
Non-parametric tests
Advantages
• Can be used if distribution of sampled population is unknown
• Can be used if known population distribution violates assumptions of other tests (e.g., non-normal/skewed)
• Can be applied when the data consist merely of rankings or classifications (doesn't have to be continuous)
Disadvantages
• If a dataset can be analyzed with parametric methods, use of nonparametric methods often results in wasted information
• Rankings in nonparametric tests are not sensitive to distances between values; you can lose information if only dealing with ranks
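A small sketch of the information loss described above: two hypothetical samples with very different values can produce identical ranks. The helper `simple_ranks` and both samples are assumptions made for illustration, and the helper only handles data without ties.

```python
def simple_ranks(values):
    """1-based ranks for values with no ties (smallest value gets rank 1)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


sample_a = [1, 2, 3, 4]    # hypothetical data
sample_b = [1, 2, 3, 100]  # same ordering, one extreme value

ranks_a = simple_ranks(sample_a)
ranks_b = simple_ranks(sample_b)
mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

print(ranks_a == ranks_b)  # True: the ranks cannot tell the samples apart
print(mean_a, mean_b)      # 2.5 26.5: the raw values clearly differ
```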
find more resources at oneclass.com
Chi-square (Χ^2) test
• Included in nonparametric tests even though it assumes an underlying distribution (the Χ^2 distribution)
• used to determine whether there is a statistically significant association between 2 categorical variables
• used when both variables are nominal (or ordinal with a limited set of categories; if there are more than 10 categories, you can consider the ordinal variable as interval)
• chi-square has no negative values; as the degrees of freedom become larger, the shape of the distribution becomes more like the normal distribution
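One way to see the last point numerically: the skewness of a chi-square distribution with df degrees of freedom is sqrt(8/df), which shrinks toward 0 (the skewness of a normal distribution) as df grows. The sketch below just evaluates that formula; the function name and the chosen df values are illustrative assumptions.

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with `df` degrees of freedom."""
    return math.sqrt(8 / df)


# Skewness for a few illustrative degrees of freedom.
skews = {df: chi_square_skewness(df) for df in (1, 2, 5, 10, 50, 100)}
for df, skew in skews.items():
    print(f"df = {df:>3}: skewness = {skew:.3f}")
```

The printed values fall steadily as df increases, consistent with the distribution becoming more symmetric.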
Chi-square test: Contingency table
Start by putting data in Contingency Table
• 2x2 table: 2 categories for each of the 2 variables
• Chi-square can also be used when variables have more than 2 categories
• Dependent- outcome present
• Only uses counts and frequencies
Question: Is there a statistically significant association between the two variables?
Received HbA1c Testing in Last 3 months (YES / NO)
Ethnicity: Inuit, First Nations

        | YES       | NO        | Total
YES     | 128 (a)   | 70 (b)    | 198 (a+b)
NO      | 430 (c)   | 146 (d)   | 576 (c+d)
Total   | 558 (a+c) | 216 (b+d) | 774 (a+b+c+d)
Step 1: State Hypotheses
• H0: there is no association between the 2 variables (testing & ethnicity)
• HA: there is an association between the 2 variables
Step 2: Test Statistic
uses two sources of variation:
• Observed frequencies of outcome in sample data
• Expected frequencies of outcome assuming H0 is true
Test statistic:
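The formula itself is not reproduced in these notes. Assuming the standard Pearson chi-square statistic with no continuity correction, it combines the two sources of variation listed above as follows, where O is the observed frequency in a cell, E is the expected frequency in that cell under H0, and n is the grand total (the symbols r, c, i, j are notation introduced here for rows and columns):

```latex
\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}
\]
```

Each cell contributes its squared deviation from the expected count, scaled by the expected count, so large gaps between observed and expected frequencies push the statistic upward.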
Step 3: Decision Rule:
• if Χ^2 test statistic > Χ^2 critical value then Reject H0
• Χ^2 critical value comes from Χ^2 critical value table
o Need df to obtain Χ^2 critical value:
o df = (# of rows – 1) x (# of columns – 1) = 1
▪ (2-1)x(2-1)=1
o if Χ^2 observed > 3.84, Reject H0
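The 3.84 cutoff can be checked without a table in the df = 1 case. The sketch below assumes a significance level of 0.05 (the notes do not state it, but 3.84 is the usual tabled value at that level) and relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The function name `degrees_of_freedom` is hypothetical.

```python
from statistics import NormalDist


def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


df = degrees_of_freedom(2, 2)  # 2x2 table

alpha = 0.05  # assumed significance level, not stated in the notes

# Valid for df = 1 only: the upper-alpha chi-square critical value equals
# the square of the two-sided standard normal critical value.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_value = z ** 2

print(df)                        # 1
print(round(critical_value, 2))  # 3.84
```

For tables with more than 1 degree of freedom this shortcut does not apply, and the critical value would come from a chi-square table or a statistics library.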
Step 4: Calculate Test Statistic
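The worked calculation is not included in these notes. A minimal sketch of this step, using the observed counts a, b, c, d from the contingency table and assuming the standard Pearson statistic without a continuity correction, could look like this (variable names are illustrative):

```python
# Observed counts from the contingency table: [[a, b], [c, d]].
observed = [[128, 70],
            [430, 146]]

row_totals = [sum(row) for row in observed]        # [a+b, c+d]
col_totals = [sum(col) for col in zip(*observed)]  # [a+c, b+d]
n = sum(row_totals)                                # a+b+c+d

# Expected count under H0 = (row total x column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over all four cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

critical_value = 3.84  # df = 1 cutoff from the decision rule
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.2f}, reject H0: {reject_h0}")
# chi-square = 7.33, reject H0: True
```

Under these assumptions the statistic is about 7.33, which exceeds 3.84, so H0 (no association) would be rejected. A continuity-corrected version would give a somewhat smaller value.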