Ec 120B – ECONOMETRICS B LECTURE NOTES
Foster, UCSD April 10, 2014
TOPIC 17. NONPARAMETRIC STATISTICS
A. Classical vs. Nonparametric Statistics
1. Parametric or “Classical” Techniques:
a) In general, there are three broad kinds of statistical paradigms or philosophies used for statistical
inference (estimation/hypothesis testing/forecasting).
1) Classical (parametric) methods taught in Ec 120ABC.
2) Nonparametric methods introduced here.
3) Bayesian statistics not covered in Ec 120.
b) Rules of variable measurement.
1) Categorical or nominal variables indicate whether an observation is in, or not in, one of a set
of mutually exclusive categories. For example, H = 1 if patient is healthy, H = 2 if moderately
ill, H = 3 if seriously ill. Order and magnitude do not matter.
2) Ordinal scale variables are numerical values where order matters but not differences in values.
For example, rate a movie on a scale from R = 1 to 5. R = 5 is better than R = 4, but we can’t
say it is 20% better.
3) Interval scale variables are numerical values where both order and differences in values matter.
For example, T = temperature. T = 90° F is 10° warmer than T = 80°.
4) Ratio scale variables are interval scale variables where there is also a meaningful 0, so the
ratio of values matters. For example, a weight of W = 200 pounds is twice as heavy as W =
100 pounds, and W = 0 is weightless.
5) Parametric methods require that our sample data be measurable numerically, on an interval or
ratio scale: tons, $, number of successes in n trials, degrees Centigrade, etc. (The exception is
use of dummy or categorical variables in regression analysis.)
c) Classical (aka “parametric”) methods revolve around estimating parameters of a population
distribution f(X) of one or more random variables: μ, σ², σ(x, y), ρ(x, y), etc.
d) As you know, our parametric methods frequently assume that the underlying population random
variable X ~ N(μ, σ²), or else that the sample size n is large enough that we are dealing with
approximately normal distributions.
1) The sample mean X̄ is normally distributed if X is normal in the population, or if the sample
size n is large enough (the CLT).
2) The sample variance s² is related to a chi-square distribution only if X is normal in the
population.
3) Confidence intervals and hypothesis tests using the standard normal (z) or Student t distributions
require that X be approximately normal in the population or that the sample size n be very large.
4) Almost every χ² or F test assumes the normality of either population X or of some estimator of a
parameter of f(X).
e) Robustness of classical parameter estimators.
1) We like estimators that are unbiased (at least for large sample n), consistent (more accurate as n
increases), and efficient (make optimum use of all information contained in the sample).
Ec 120C NONPARAMETRIC STATISTICS
2) We would also like estimators that are robust, which means that they are accurate and valid even
when X is NOT normally distributed in the population.
3) X̄ as an estimator, and t tests and OLS estimates β̂_j, are robust for large n.
4) s² as an estimator, and χ² and F tests, are not robust.
2. Nonparametric or “Distribution-Free” Statistics:
a) “Nonparametric” means that these methods do not depend on the distribution or the parameters of
some underlying population random variable. They are “distribution free.” In particular, they do
not depend on X ~ N(μ, σ²), or on sample n being large.
b) Nonparametric methods do not require data measures as interval or ratio scale variables.
1) Some methods apply to data recorded numerically but only on an ordinal scale, such as rank
orderings of preferences.
2) Other methods apply to data recorded only nominally, such as frequency counts of observations
falling into one of several categories.
c) The applicability–precision tradeoff.
1) Parametric methods have narrower applicability to real world issues because they rely on special
assumptions (normality of X) or large sample sizes (n). But if the assumptions are valid, they
give fairly precise results for estimates and hypothesis tests.
2) Nonparametric methods can be applied even if n is small or if nothing is known about
population distributions of X. But they are less precise because they are less efficient, in the
sense that they may not make use of ALL the information in a given sample of observations.
d) Plan of attack.
1) We will look at nonparametric descriptive statistics first.
2) Then we look at chi-square tests of goodness of fit and independence.
3) Finally, we shall explore several kinds of nonparametric hypothesis tests.
B. Nonparametric Descriptive Statistics
1. Sample Median X̃:
a) Population and sample median.
1) The true population median ν of the distribution f(x) of a random variable X is the middle value
(the 50th percentile): Pr (X ≤ ν) = Pr (X ≥ ν) = 0.50.
2) For a group of sample observations x_i, i = 1…n, the sample median X̃ is the middle value. Half
the observations are less than or equal to the median, and half are greater than or equal.
3) We can think of X̃ as an estimator of ν, or as a useful descriptive statistic.
b) Computing the sample median X̃:
1) First, order the sample data from smallest to largest: x_(1) = smallest, x_(n) = largest.
2) If n is odd, X̃ = middle value = x_((n+1)/2).
3) If n is even, X̃ = average of the two middle values = [x_(n/2) + x_(n/2+1)] / 2.
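The odd/even rule above can be sketched in plain Python (a minimal hypothetical helper, not part of the notes):

```python
def sample_median(data):
    """Middle value if n is odd; average of the two middle values if n is even."""
    x = sorted(data)                # order the sample: x_(1) smallest ... x_(n) largest
    n = len(x)
    if n % 2 == 1:                  # n odd: take x_((n+1)/2)
        return x[(n + 1) // 2 - 1]  # -1 converts 1-based order statistics to 0-based indexing
    else:                           # n even: average x_(n/2) and x_(n/2+1)
        return (x[n // 2 - 1] + x[n // 2]) / 2

# Table 1's x column has n = 10 (even): the two middle values are 11 and 12.
print(sample_median([3, 5, 6, 9, 11, 12, 12, 15, 16, 19]))  # → 11.5
print(sample_median([3, 5, 6]))                             # odd n → 5
```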
c) Advantage of median over the mean.
1) The data in Table 1 and Figure 1 have been sorted from smallest to largest. The values are the
same except that x_(10) = 19 but y_(10) = 54.
Table 1
 (i)    x_(i)   y_(i)
  1       3       3
  2       5       5
  3       6       6
  4       9       9
  5      11      11
  6      12      12
  7      12      12
  8      15      15
  9      16      16
 10      19      54
 X̃      11.5    11.5
 X̄      10.8    14.3
 IQR      9       9
 s       5.12   14.6
 S      –0.01    1.92
 K       1.60    5.57
 χ²_JB   0.82    8.91
Fig. 1. Dot plots of the sorted x and y values (identical except x_(10) = 19 vs. y_(10) = 54).
2) The median = 11.5 for both X and Y, but the mean is pulled up by the one large and atypical
value y_(10) = 54. If these were home prices or family incomes for a small community, the median
gives a much clearer picture of economic status, so we usually hear about median incomes
and home values, not averages.
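The robustness point above can be checked with Python’s statistics module (not part of the notes):

```python
# One outlier moves the mean but leaves the median unchanged.
from statistics import mean, median

x = [3, 5, 6, 9, 11, 12, 12, 15, 16, 19]   # Table 1, x column
y = x[:-1] + [54]                          # same data except y_(10) = 54

print(median(x), median(y))   # 11.5 11.5 -- unchanged by the outlier
print(mean(x), mean(y))       # 10.8 14.3 -- mean pulled up toward 54
```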
2. Sample Percentiles (P_k) and the Interquartile Range (P_75 – P_25):
a) Percentiles are measures of relative standing. If you took the SATs and scored in the 93rd percentile
(P_93), then you scored higher than 93% of the people who took the test, and 7% scored higher than you.
b) Computing P_k.
1) Order the data from smallest to largest: x_(1) = smallest, x_(n) = largest.
2) Compute the position index h = nk/100.
3) If h is an integer, P_k = [x_(h) + x_(h+1)] / 2.
4) If h is not an integer, round up to h⁺, and P_k = x_(h⁺).
5) Note that the sample median X̃ = P_50.
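The position-index rule can be sketched as a small hypothetical helper (plain Python):

```python
import math

def percentile(data, k):
    """P_k via the position-index rule h = n*k/100."""
    x = sorted(data)
    n = len(x)
    h = n * k / 100
    if h == int(h):                  # h an integer: average x_(h) and x_(h+1)
        h = int(h)
        return (x[h - 1] + x[h]) / 2
    else:                            # otherwise round h up and take x_(h+)
        return x[math.ceil(h) - 1]

x = [3, 5, 6, 9, 11, 12, 12, 15, 16, 19]     # Table 1, x column (n = 10)
print(percentile(x, 25), percentile(x, 75))  # 6 15   (h = 2.5 and 7.5, rounded up)
print(percentile(x, 50))                     # 11.5, agreeing with the sample median
```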
c) The interquartile range (IQR).
1) IQR = P_75 – P_25. The middle half of the observations lies between P_25 and P_75, so this is a
useful measure of dispersion of the data.
2) In Table 1 and Figure 1, for both X and Y, P_25 = x_(3) = 6 and P_75 = x_(8) = 15, so IQR = 9.
3) If we measure dispersion by the sample standard deviation s, we see that the presence of one or
a few very extreme values like y_(10) makes s misleading as a measure of the degree of equality or
inequality in the values.
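A quick check of this contrast on the Table 1 data (a sketch; the quartile positions x_(3) and x_(8) are hard-coded for n = 10, per the P_k rule above):

```python
from statistics import stdev

x = [3, 5, 6, 9, 11, 12, 12, 15, 16, 19]
y = x[:-1] + [54]                 # same data except y_(10) = 54

def iqr(data):
    d = sorted(data)
    # For n = 10 the percentile rule gives P_25 = x_(3), P_75 = x_(8), as in Table 1.
    return d[7] - d[2]

print(iqr(x), iqr(y))                           # 9 9 -- IQR ignores the outlier
print(round(stdev(x), 2), round(stdev(y), 1))   # 5.12 14.6 -- s does not
```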
4) Box-and-whisker plots. [Explain]
3. Sample Skewness and Kurtosis:
a) Skewness (S).
1) Skewness is a measure of the asymmetry of a distribution about its mean μ. The normal
distribution is symmetric, so its skewness = 0.
(Figure: sketch of a negatively skewed distribution.)
2) We estimate it with the sample skewness Ŝ = [(1/n) Σ_i (x_i − x̄)³] / s³.
3) Skew is toward the longest tail:
• If S > 0, the distribution is skewed to the right (positively skewed).
• If S < 0, the distribution is skewed to the left (negatively skewed).
4) Reject H_0 (X is normally distributed) at level α if χ²_JB > χ²_α(2), where the Jarque–Bera
statistic χ²_JB = n[S²/6 + (K − 3)²/24] combines the sample skewness S and kurtosis K and is
approximately chi-square with 2 degrees of freedom when X is normal.
5) In Table 1, we clearly reject that Y is normally distributed, but accept that X might be.
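The Jarque–Bera values in Table 1 can be reproduced with the sketch below (an assumption worth flagging: the moment convention used here — the n−1 sample s in the denominators of S and K — is inferred because it matches the tabled numbers):

```python
from statistics import mean, stdev

def jarque_bera(data):
    """chi2_JB = n*(S**2/6 + (K-3)**2/24), with S and K built from the sample s."""
    n = len(data)
    xbar, s = mean(data), stdev(data)                      # s uses the n-1 divisor
    S = sum((v - xbar) ** 3 for v in data) / n / s ** 3    # sample skewness
    K = sum((v - xbar) ** 4 for v in data) / n / s ** 4    # sample kurtosis (normal: K = 3)
    return n * (S ** 2 / 6 + (K - 3) ** 2 / 24)

x = [3, 5, 6, 9, 11, 12, 12, 15, 16, 19]
y = x[:-1] + [54]
print(round(jarque_bera(x), 2), round(jarque_bera(y), 2))  # 0.82 8.91, as in Table 1
```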
2. Goodness of Fit to the Multinomial Distribution:
a) The multinomial distribution.
1) In a binomial distribution, we assume every element of a population falls into one of 2
categories (arbitrarily labeled “success” and “failure”), with probabilities p and 1–p.
2) In a multinomial distribution, there are k categories, with probabilities p_j, j = 1…k.
b) Using a multinomial goodness of fit test: an example.
1) All Michigan residents fall into one of four religious affiliations, with proportions p_j (Table 2).
Table 2
Category        p_j     e_j = n·p_j     sample f_j    (f_j − e_j)²/e_j
1. Catholic     0.20    60(0.20) = 12       11              0.083
2. Protestant   0.30    18                  17              0.056
3. Jew          0.10    6                    4              0.667
4. Other/NP     0.40    24                  28              0.667
k = 4           Σ = 1   Σ = n = 60      Σ = n = 60      Σ = 1.472
2) Many families left Michigan to look for work during the recession of 1980–82. We want to
know if some religious affiliations are more likely to emigrate than others. So we want to test
the following hypothesis concerning the population of emigrant families:
H_0: p_1 = 0.20, p_2 = 0.30, p_3 = 0.10, p_4 = 0.40;  H_1: not so.
The null says that the proportions of the religious groups among emigrant families are the same as
in the population as a whole, so the decision to emigrate is not different for different religions;
religion and emigration are not related.
3) From a sample of n = 60 recent emigrant families, we obtain the frequencies f_j in Table 2.
4) If H_0 is true, we expect to obtain frequencies e_j = n·p_j.
5) For this kind of test, we need a sample n large enough that e_j ≥ 5 in every row.
6) Our hypothesis test statistic is χ²_MN = Σ_j (f_j − e_j)² / e_j.
7) If H_0 is true, then χ²_MN ~ χ²(k − 1).
8) We do not reject H_0, because χ²_MN = 1.472 is well below the critical value χ²_.05(3) = 7.81.
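The computation behind Table 2 and steps 6)–8) can be sketched in plain Python (the critical value 7.81 for α = .05, df = 3 is a standard chi-square table value, not from the notes):

```python
n = 60
p = [0.20, 0.30, 0.10, 0.40]      # hypothesized proportions under H0
f = [11, 17, 4, 28]               # observed sample frequencies
e = [n * pj for pj in p]          # expected frequencies: 12, 18, 6, 24

# Multinomial goodness-of-fit statistic: sum of (f_j - e_j)^2 / e_j.
chi2_mn = sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))
print(round(chi2_mn, 3))          # 1.472, matching Table 2
print(chi2_mn > 7.81)             # False: do not reject H0 at alpha = .05
```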