Statistical Estimation of the Regression Function

3.1 Statistical Estimation

If the population can be observed, there is no statistical problem: all the features of the population
are known. The problem of statistical inference arises when the available information consists of a
limited sample that is randomly drawn from the (possibly infinitely large) population and we want to
infer something about the population using the sample at hand. Statistical estimation is one aspect of
statistical inference1 - it concerns the estimation of population parameters such as the population mean
and variance and the coefficients of a linear regression function.

1 Two other important inference problems are hypothesis testing and the prediction of random variables.
In Chapter 2 the population regression function of Y given X is defined as the conditional mean
function of Y given X, written as E(Y | X). An important reason for our interest in this functional
relationship is that it allows us to predict Y given values of X and to quantify the effect of a change in X on
Y (measured by, say, the derivative with respect to X). Moreover, the conditional predictions of Y that are
produced by the regression function are optimal under specific but generally applicable conditions. This
chapter is concerned with the problem of estimating the population regression function using a sample
drawn from the population.
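The optimality claim can be checked numerically: under squared-error loss, no predictor of Y does better than the conditional mean E(Y | X). The sketch below is a minimal simulation; the particular distribution (with E(Y | X) = X²) is an assumption chosen only for illustration, not taken from the text.

```python
import numpy as np

# Simulated draws; the distribution is an illustrative assumption
# constructed so that E(Y | X) = X^2.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=100_000)
y = x**2 + rng.normal(0.0, 1.0, size=100_000)

# Compare squared-error loss for two predictors of Y.
mse_cond = np.mean((y - x**2) ** 2)       # the conditional mean E(Y | X)
mse_const = np.mean((y - y.mean()) ** 2)  # the best constant predictor, E(Y)

# The conditional mean achieves the smaller mean squared error.
```

Any other function of X tried in place of X² will likewise produce a larger mean squared error, which is the sense in which the regression function's predictions are optimal.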
3.1.1 Parametric Versus Nonparametric Methods
Figure 2.7 and the related discussion illustrate how a sample can be used to estimate a
population regression. Since the population regression function of Y given X is the conditional mean of
Y given X, we simply computed a sequence of conditional means using the sample and plotted them.
Nothing in the procedure constrains the shape of the estimated regression. Indeed, the empirical
regression of Size given Price (plotted in Figure 2.7) wanders about quite irregularly (although
as it does so it retains a key feature that we expect of the population regression of S given P, namely that
its average slope is steeper than the major axis - the empirical regression starts off below the major axis
and then climbs above it). The method used to estimate the empirical regression functions in Figure 2.7
can be described as nonparametric. While there is a huge literature on nonparametric estimation, this
book is concerned almost entirely with parametric models.

Econometrics Text by D M Prescott © Chapter 3

2 The graph of Y = a + bX + cX² is symmetric about the line X = -b/(2c).
3 The meaning of “performing well” will be discussed later in the chapter.
To illustrate the distinction between parametric and nonparametric methods, consider the
equation Y = a + bX. This equation has two parameters (or coefficients): a and b, and clearly the
relationship between Y and X is linear. By varying the values of a and b, the line’s height and slope can
be changed, but the fundamental relationship is constrained to be linear. If a quadratic term (and one
more parameter) is added: Y = a + bX + cX², the relationship between Y and X becomes more flexible
than the linear function. Indeed, the quadratic form embraces the linear form as a special case (set c = 0).
But the linear form does not embrace the quadratic form: no values of a and b can make the linear
equation quadratic. Of course, the three parameter quadratic equation is also constrained. A quadratic
function can have a single maximum or a single minimum but not both. Quadratic functions are also
symmetric about some axis2. If further powers of X are added, each with its own parameter, the
relationship becomes increasingly flexible in terms of the shape it can take. But as long as the number of
parameters remains finite, the shape remains constrained to some degree. The nonparametric case is
paradoxically not the one with zero parameters but the limiting case as the number of parameters
increases without bound. As the number of terms in the polynomial tends to infinity, the functional
relationship becomes unconstrained - it can take any shape. As noted above, the method used to
construct the empirical regressions in Figure 2.7 did not constrain the shape to be linear, quadratic or any
other specific functional relationship. In that sense the method used in Chapter 2 to estimate the
population regression can be called nonparametric.
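The binned-conditional-mean procedure behind Figure 2.7 can be sketched as follows. This is a minimal illustration with simulated data; the curved relationship, the sample size and the number of bins are assumptions made here for demonstration, not the data behind Figure 2.7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated sample with a deliberately non-linear relationship.
x = rng.uniform(0.0, 10.0, size=5000)
y = np.sin(x) + 0.5 * x + rng.normal(0.0, 0.3, size=5000)

def empirical_regression(x, y, n_bins=25):
    """Nonparametric estimate of E(Y | X): average Y within each X bin."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    which = np.digitize(x, edges[1:-1])        # bin index 0 .. n_bins-1
    centers = 0.5 * (edges[:-1] + edges[1:])   # midpoint of each bin
    means = np.array([y[which == b].mean() for b in range(n_bins)])
    return centers, means

centers, cond_means = empirical_regression(x, y)
# Plotting cond_means against centers traces out the estimated
# regression without imposing any functional form on its shape.
```

Notice that nothing in `empirical_regression` restricts the estimate to be linear, quadratic or any other parametric shape; the flexibility comes at the price of needing many observations per bin.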
In the context of regression estimation, the great appeal of nonparametric methods is that they do
not impose a predetermined shape on the regression function - which seems like a good idea in the
absence of any information as to the shape of the population regression. However, there is a cost
associated with this flexibility and that concerns the sample size. To perform well3, the nonparametric
estimator generally requires a large sample (the empirical regressions in Figure 2.7 used a sample of
almost 5,000 observations). In contrast, parametric methods that estimate a limited number of parameters
can be applied when samples are relatively small. The following examples by-pass the statistical aspect
of the argument but nevertheless provide some intuition. If you know that Y is a linear function of X,
then two points (2 observations) are sufficient to locate the line (and to determine the two parameters.)
If you know the relationship is quadratic, just three points are sufficient to plot the unique quadratic
function that connects the three points and therefore three observations will identify the three parameters
of the quadratic equation. The pattern continues: in general, n + 1 points will determine the n + 1
parameters of an nth-order polynomial.
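The counting argument can be verified directly: three points pin down the unique quadratic through them. In this small sketch the particular points are made up for illustration, generated from an assumed quadratic so that the recovered parameters can be checked.

```python
import numpy as np

# Three points taken (as an illustrative assumption) from Y = 1 + 2X + 3X^2.
x = np.array([0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Each observation contributes one equation a + b*x_i + c*x_i^2 = y_i,
# so three observations give a 3x3 linear system in (a, b, c).
V = np.vander(x, N=3, increasing=True)   # rows are (1, x_i, x_i^2)
a, b, c = np.linalg.solve(V, y)
# Solving recovers a = 1, b = 2, c = 3 exactly: three points
# identify the three parameters of the quadratic.
```

The same construction works for any polynomial order: n + 1 distinct points give an (n + 1) × (n + 1) system whose solution is the polynomial's n + 1 parameters.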
3.2 Principles of Estimation
As discussed in Chapter 2, there are examples of bivariate distributions in which the population
regression functions are known to be linear. In the remainder of this chapter we will be concerned with
linear population regressions and the methods that can be used to estimate them. We begin with a
discussion of alternative approaches to statistical estimation - all of which are parametric.
3.2.1 The Method of Moments
The expectations E(X), E(X²) and E(X³) are referred to as the first, second and third uncentred
moments of the random variable X. The centred moments are measured around the mean: the kth
centred moment is E[(X - μ)^k], where μ = E(X), so the second centred moment is the variance.
The Method of Moments approach to estimating these quantities is to simply calculate their sample
equivalents, all of which take the form of averages. Table 3.1 provides the details for the first two
moments. Notice the parallels between the expressions for the population moments and their sample
counterparts. First, the estimator uses the sample average operator (1/n)Σ instead of the expectation
operator E. Both “take an average”, one in the sample, the other in the population. Second, the estimator is a function of the
observations Xi whereas the population moment is defined in terms of the random variable X.
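The method-of-moments recipe can be sketched in code: replace the expectation operator with a sample average. The data below are simulated purely for illustration (the normal distribution and its parameters are assumptions, not part of the text); Table 3.1 gives the general expressions.

```python
import numpy as np

# Simulated sample (an illustrative assumption): population mean 5 and
# population variance 4, so E(X) = 5 and E(X^2) = 29.
rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=10_000)
n = x.size

m1 = x.sum() / n                     # first uncentred moment: estimates E(X)
m2 = (x**2).sum() / n                # second uncentred moment: estimates E(X^2)
var_hat = ((x - m1) ** 2).sum() / n  # second centred moment: estimates Var(X)

# The identity Var(X) = E(X^2) - [E(X)]^2 holds exactly for the sample
# moments as well as for the population moments.
```

Each estimator is just the population moment with E replaced by an average over the n observations, which is why the sample and population expressions in Table 3.1 run in such close parallel.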