
1 Two other important inference problems are hypothesis testing and the prediction of random variables.

Chapter 3

Statistical Estimation of the Regression Function

3.1 Statistical Estimation

If the population can be observed, there is no statistical problem: all the features of the population

are known. The problem of statistical inference arises when the available information consists of a

limited sample that is randomly drawn from the (possibly infinitely large) population and we want to

infer something about the population using the sample at hand. Statistical estimation is one aspect of

statistical inference1 - it concerns the estimation of population parameters such as the population mean

and variance and the coefficients of a linear regression function.

In Chapter 2 the population regression function of Y given X is defined as the conditional mean

function of Y given X, written as E(Y | X). An important reason for our interest in this functional

relationship is that it allows us to predict Y given values of X and to quantify the effect of a change in X on

Y (measured by, say, the derivative with respect to X). Moreover, the conditional predictions of Y that are

produced by the regression function are optimal under specific but generally applicable conditions. This

chapter is concerned with the problem of estimating the population regression function using a sample

drawn from the population.

3.1.1 Parametric Versus Nonparametric Methods

Figure 2.7 and the related discussion illustrate how a sample can be used to estimate a

population regression. Since the population regression function of Y given X is the conditional mean of

Y given X, we simply computed a sequence of conditional means using the sample and plotted them.

Nothing in the procedure constrains the shape of the estimated regression. Indeed, the empirical

regression of Size given Price (the plot in Figure 2.7) wanders about quite irregularly (although

as it does so it retains a key feature that we expect of the population regression of S given P, namely that

its average slope is steeper than the major axis - the empirical regression starts off below the major axis

and then climbs above it). The method used to estimate the empirical regression functions in Figure 2.7

Econometrics Text by D M Prescott © Chapter 3, 2

2 The graph of Y = a + bX + cX² is symmetric about the line X = -b/(2c).

3 The meaning of “performing well” will be discussed later in the chapter.

can be described as nonparametric. While there is a huge literature on nonparametric estimation, this

book is concerned almost entirely with parametric models.
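The binned-conditional-mean procedure described above can be sketched in a few lines. The function name and the data below are hypothetical (not the house-price sample of Figure 2.7): Y is averaged within narrow bins of X and one point is reported per bin, with no functional form imposed on the result.

```python
# A minimal sketch of the nonparametric estimator described above:
# estimate E(Y | X) by averaging Y within narrow bins of X.
# The data are invented for illustration.

def empirical_regression(xs, ys, n_bins):
    """Return (bin midpoint, mean of Y in bin) pairs -- a crude
    estimate of the conditional mean function E(Y | X)."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        i = min(int((x - lo) / width), n_bins - 1)  # clamp x = hi into last bin
        sums[i] += y
        counts[i] += 1
    return [(lo + (i + 0.5) * width, sums[i] / counts[i])
            for i in range(n_bins) if counts[i] > 0]

# Y is roughly 2 + 3X plus alternating "noise"; the binned means
# track the underlying conditional mean without assuming linearity.
xs = [i / 10 for i in range(100)]
ys = [2 + 3 * x + ((-1) ** i) * 0.5 for i, x in enumerate(xs)]
points = empirical_regression(xs, ys, 5)
```

Nothing in the procedure constrains the plotted points to lie on a line or any other curve; the shape is dictated entirely by the sample.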

To illustrate the distinction between parametric and nonparametric methods, consider the

equation Y = a + bX. This equation has two parameters (or coefficients), a and b, and clearly the

relationship between Y and X is linear. By varying the values of a and b the line’s height and slope can

be changed, but the fundamental relationship is constrained to be linear. If a quadratic term (and one

more parameter) is added: Y = a + bX + cX², the relationship between Y and X becomes more flexible

than the linear function. Indeed, the quadratic form embraces the linear form as a special case (set c = 0).

But the linear form does not embrace the quadratic form: no values of a and b can make the linear

equation quadratic. Of course, the three parameter quadratic equation is also constrained. A quadratic

function can have a single maximum or a single minimum but not both. Quadratic functions are also

symmetric about some axis2. If further powers of X are added, each with its own parameter, the

relationship becomes increasingly flexible in terms of the shape it can take. But as long as the number of

parameters remains finite, the shape remains constrained to some degree. The nonparametric case is

paradoxically not the one with zero parameters but the limiting case as the number of parameters

increases without bound. As the number of terms in the polynomial tends to infinity, the functional

relationship becomes unconstrained - it can take any shape. As noted above, the method used to

construct the empirical regressions in Figure 2.7 did not constrain the shape to be linear, quadratic or any

other specific functional relationship. In that sense the method used in Chapter 2 to estimate the

population regression can be called nonparametric.
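The nesting argument can be checked numerically. In this hypothetical sketch, a least-squares line (used here purely as a fitting device; the method is developed properly later in the chapter) recovers a linear relationship exactly, but no choice of a and b reproduces points generated by a genuine quadratic.

```python
# With c = 0 the quadratic Y = a + bX + cX^2 collapses to the line
# a + bX, but no (a, b) can reproduce a genuine quadratic.
# Data are invented for illustration.

def ols_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
lin = [1 + 2 * x for x in xs]                 # the c = 0 special case
quad = [1 + 2 * x + 3 * x ** 2 for x in xs]   # genuinely quadratic

a, b = ols_line(xs, lin)        # recovers a = 1, b = 2 exactly
a2, b2 = ols_line(xs, quad)
resid = sum((y - (a2 + b2 * x)) ** 2 for x, y in zip(xs, quad))
# resid > 0: the linear family cannot embrace the quadratic
```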

In the context of regression estimation, the great appeal of nonparametric methods is that they do

not impose a predetermined shape on the regression function - which seems like a good idea in the

absence of any information as to the shape of the population regression. However, there is a cost

associated with this flexibility and that concerns the sample size. To perform well3, the nonparametric

estimator generally requires a large sample (the empirical regressions in Figure 2.7 used a sample of

almost 5,000 observations). In contrast, parametric methods that estimate a limited number of parameters

can be applied when samples are relatively small. The following examples by-pass the statistical aspect


of the argument but nevertheless provide some intuition. If you know that Y is a linear function of X,

then two points (2 observations) are sufficient to locate the line (and to determine the two parameters).

If you know the relationship is quadratic, just three points are sufficient to plot the unique quadratic

function that connects the three points and therefore three observations will identify the three parameters

of the quadratic equation. The relationship continues: in general, n + 1 points will determine the n + 1 parameters

of an nth order polynomial.
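The counting argument can be illustrated with Lagrange interpolation, one standard way to construct the unique polynomial through a given set of points (the specific function and points below are hypothetical): three observations pin down all three parameters of a quadratic, and hence its value everywhere.

```python
# Three points determine the unique quadratic that passes through
# them. The quadratic f below is invented for illustration.

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree
    len(points) - 1 that passes through the given (x, y) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

f = lambda x: 2 - x + 0.5 * x ** 2
pts = [(0.0, f(0.0)), (1.0, f(1.0)), (3.0, f(3.0))]  # 3 observations

# The three observations identify the quadratic everywhere,
# not just at the sample points: lagrange_eval(pts, 2.0) == f(2.0)
```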

3.2 Principles of Estimation

As discussed in Chapter 2, there are examples of bivariate distributions in which the population

regression functions are known to be linear. In the remainder of this chapter we will be concerned with

linear population regressions and the methods that can be used to estimate them. We begin with a

discussion of alternative approaches to statistical estimation - all of which are parametric.

3.2.1 The Method of Moments

The quantities

E(X), E(X²), E(X³)

are referred to as the first, second and third uncentred moments of the random variable X. The centred

moments are measured around the mean μ = E(X):

E[(X − μ)²], E[(X − μ)³], ...

The Method of Moments approach to estimating these quantities is to simply calculate their sample

equivalents, all of which take the form of averages. Table 3.1 provides the details for the first two

moments. Notice the parallels between the expressions for the population moments and their sample

counterparts. First, the estimator uses the sample average (1/n)Σ instead of the expectation operator E. Both “take an

average”, one in the sample, the other in the population. Second, the estimator is a function of the

observations Xi whereas the population moment is defined in terms of the random variable X.
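The Method of Moments recipe can be sketched directly: replace the expectation operator E with the sample average over the observations. The function names and data below are hypothetical, invented for illustration.

```python
# Sample counterparts of the population moments: replace the
# expectation E with the sample average (1/n) * sum.

def sample_moment(xs, k):
    """k-th uncentred sample moment: (1/n) * sum(X_i ** k)."""
    return sum(x ** k for x in xs) / len(xs)

def centred_sample_moment(xs, k):
    """k-th centred sample moment, measured around the sample mean."""
    m = sample_moment(xs, 1)
    return sum((x - m) ** k for x in xs) / len(xs)

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mean = sample_moment(xs, 1)          # estimates E(X)
var = centred_sample_moment(xs, 2)   # estimates E[(X - mu)^2]
```

Both estimators are averages over the sample, mirroring the way the population moments average over the distribution of X.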