School: Kent State University

Department: Business Administration Interdisciplinary

Course Code: BUS 10123

Professor: Eric Von Hendrix

Lecture: 38

Chapter 11

Panel Data

The Nature of Panel Data

Panel data, also known as longitudinal data, have both time series and cross-sectional

dimensions.

They arise when we measure the same collection of people or objects over a period of time.

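Econometrically, the setup is

$$y_{it} = \alpha + \beta x_{it} + u_{it}$$

where $y_{it}$ is the dependent variable, $\alpha$ is the intercept term, $\beta$ is a $k \times 1$ vector of parameters to be estimated on the explanatory variables, $x_{it}$, and $u_{it}$ is the disturbance term; $t = 1, \dots, T$; $i = 1, \dots, N$.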

The simplest way to deal with such data would be to estimate a single pooled regression on all the observations together.

But pooling the data assumes that there is no heterogeneity, i.e. that the same relationship holds for all units and all time periods.
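To make the pooled approach concrete, the following is a minimal sketch in Python with NumPy. The panel dimensions, parameter values and simulated data are hypothetical and chosen only for illustration; the point is simply that all $N \times T$ observations are stacked and one OLS regression is run, with a single intercept and a single slope vector.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 5, 20, 2  # hypothetical panel dimensions

# Simulated data in which the same relationship really does hold for every unit
x = rng.normal(size=(N, T, k))
alpha_true, beta_true = 1.0, np.array([0.5, -0.3])
y = alpha_true + x @ beta_true + 0.1 * rng.normal(size=(N, T))

# Stack all N*T observations and add a column of ones for the intercept
X_pooled = np.column_stack([np.ones(N * T), x.reshape(N * T, k)])
y_pooled = y.reshape(N * T)

# One OLS regression on the pooled sample
coef = np.linalg.lstsq(X_pooled, y_pooled, rcond=None)[0]
alpha_hat, beta_hat = coef[0], coef[1:]
```

Because the simulated data contain no heterogeneity, the pooled estimates should land close to the true values here; the sketch says nothing about how pooling behaves when that assumption fails.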

The Advantages of Using Panel Data

There are a number of advantages to using a full panel technique when a panel of data is available.

We can address a broader range of issues and tackle more complex problems with panel data

than would be possible with pure time series or pure cross-sectional data alone.

It is often of interest to examine how variables, or the relationships between them, change

dynamically (over time).

By structuring the model in an appropriate way, we can remove the impact of certain forms of

omitted variables bias in regression results.

Seemingly Unrelated Regression (SUR)

One approach to making fuller use of the structure of the data would be to use the SUR

framework initially proposed by Zellner (1962). This has been used widely in finance where the

requirement is to model several closely related variables over time.

A SUR is so-called because the dependent variables may seem unrelated across the equations

at first sight, but a more careful consideration would allow us to conclude that they are in fact

related after all.

Under the SUR approach, one would allow for the contemporaneous relationships between the

error terms in the equations by using a generalised least squares (GLS) technique.

The idea behind SUR is essentially to transform the model so that the error terms become

uncorrelated. If the correlations between the error terms in the individual equations had been

zero in the first place, then SUR on the system of equations would have been equivalent to

running separate OLS regressions on each equation.
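The equivalence with equation-by-equation OLS can be checked numerically. The sketch below is an illustrative two-step (feasible) GLS implementation written for these notes, not a library routine: the helper names `ols` and `sur_gls`, the two-equation system and all parameter values are assumptions. The equations are stacked one after the other, so the error covariance of the stacked system is $\Sigma \otimes I_T$.

```python
import numpy as np

def ols(y, X):
    """Ordinary least squares coefficients for one equation."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def sur_gls(ys, Xs, sigma):
    """GLS on the stacked system, given the M x M error covariance sigma."""
    M, T = len(ys), len(ys[0])
    ks = [X.shape[1] for X in Xs]
    # Block-diagonal regressor matrix: each equation has its own coefficients
    X_big = np.zeros((M * T, sum(ks)))
    col = 0
    for m, X in enumerate(Xs):
        X_big[m * T:(m + 1) * T, col:col + ks[m]] = X
        col += ks[m]
    y_big = np.concatenate(ys)
    # Inverse covariance of the stacked errors: inv(sigma) kron I_T
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
    A = X_big.T @ omega_inv @ X_big
    b = X_big.T @ omega_inv @ y_big
    return np.linalg.solve(A, b)

# Hypothetical system: two equations with contemporaneously correlated errors
rng = np.random.default_rng(1)
M, T = 2, 50
sigma_true = np.array([[1.0, 0.8], [0.8, 1.0]])
u = rng.multivariate_normal(np.zeros(M), sigma_true, size=T)  # shape (T, M)
Xs = [np.column_stack([np.ones(T), rng.normal(size=T)]) for _ in range(M)]
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5])]
ys = [Xs[m] @ betas[m] + u[:, m] for m in range(M)]

# Step 1: equation-by-equation OLS and the residual covariance matrix
beta_ols = np.concatenate([ols(ys[m], Xs[m]) for m in range(M)])
resid = np.column_stack([ys[m] - Xs[m] @ ols(ys[m], Xs[m]) for m in range(M)])
sigma_hat = resid.T @ resid / T

# Step 2: GLS using the estimated covariance matrix
beta_sur = sur_gls(ys, Xs, sigma_hat)

# With the cross-equation covariances set to zero, GLS collapses to OLS
beta_diag = sur_gls(ys, Xs, np.diag(np.diag(sigma_hat)))
```

In this sketch `beta_diag` coincides with `beta_ols` up to rounding, whereas `beta_sur` generally differs because it exploits the estimated correlation between the two error terms.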

Fixed and Random Effects Panel Estimators

The applicability of the SUR technique is limited because it can only be employed when the

number of time series observations per cross-sectional unit is at least as large as the total

number of such units, N.

A second problem with SUR is that the number of parameters to be estimated in total is very

large, and the variance-covariance matrix of the errors also has to be estimated. For these

reasons, the more flexible full panel data approach is much more commonly used.

There are two main classes of panel techniques: the fixed effects estimator and the random

effects estimator.

Fixed Effects Models

The fixed effects model for some variable $y_{it}$ may be written

$$y_{it} = \alpha + \beta x_{it} + \mu_i + v_{it}$$

where $\mu_i$ is an entity-specific effect and $v_{it}$ is the remainder disturbance, which varies over both time and entities.

We can think of $\mu_i$ as encapsulating all of the variables that affect $y_{it}$ cross-sectionally but do not vary over time, for example the sector that a firm operates in, a person's gender, or the country where a bank has its headquarters. Thus we would capture the heterogeneity that is encapsulated in $\mu_i$ by a method that allows for different intercepts for each cross-sectional unit.

This model could be estimated using dummy variables, which would be termed the least

squares dummy variable (LSDV) approach.

Fixed Effects Models (Cont’d)

The LSDV model may be written

$$y_{it} = \beta x_{it} + \mu_1 D1_i + \mu_2 D2_i + \mu_3 D3_i + \dots + \mu_N DN_i + v_{it}$$

where $D1_i$ is a dummy variable that takes the value 1 for all observations on the first entity (e.g., the first firm) in the sample and zero otherwise, $D2_i$ is a dummy variable that takes the value 1 for all observations on the second entity (e.g., the second firm) and zero otherwise, and so on.

The LSDV model can be seen as just a standard regression model, and it can therefore be estimated using OLS.

Now the model given by the equation above has N+k parameters to estimate. In order to avoid

the necessity to estimate so many dummy variable parameters, a transformation, known as the

within transformation, is used to simplify matters.
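As an illustration of the LSDV approach, the sketch below builds the $N$ entity dummies explicitly and runs a single OLS regression. The data are simulated and every number is a hypothetical choice; a single regressor ($k = 1$) is assumed to keep the example short. No separate constant is included, since a constant together with a full set of $N$ dummies would be perfectly collinear.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 30  # hypothetical panel dimensions

# Entity-specific intercepts, deliberately correlated with the regressor
mu = np.array([-2.0, 0.0, 1.0, 3.0])
beta_true = 1.5
x = rng.normal(size=(N, T)) + mu[:, None]
y = beta_true * x + mu[:, None] + 0.1 * rng.normal(size=(N, T))

# Stack the panel entity by entity and build one dummy column per entity
entity = np.repeat(np.arange(N), T)
D = (entity[:, None] == np.arange(N)).astype(float)  # shape (N*T, N)

# Regress y on x and the N dummies: N + k parameters in total
X_lsdv = np.column_stack([x.reshape(-1), D])
coef = np.linalg.lstsq(X_lsdv, y.reshape(-1), rcond=None)[0]
beta_lsdv, mu_hat = coef[0], coef[1:]
```

The design matrix here has $N + 1$ columns; with a large cross-section the dummy block grows with $N$, which is the cost that motivates the demeaning approach.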

The Within Transformation

The within transformation involves subtracting the time-mean of each entity away from the

values of the variable.

So define $\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$ as the time-mean of the observations on $y$ for cross-sectional unit $i$, and similarly calculate the means of all of the explanatory variables.

Then we can subtract the time-means from each variable to obtain a regression containing

demeaned variables only.

Note that such a regression does not require an intercept term since now the dependent

variable will have zero mean by construction.

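The model containing the demeaned variables is

$$y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + v_{it} - \bar{v}_i$$

We could write this as

$$\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{v}_{it}$$

where the double dots above the variables denote the demeaned values and $\ddot{v}_{it}$ is the demeaned disturbance.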

This model can be estimated using OLS, but we need to make a degrees of freedom correction.
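A minimal sketch of the within estimator follows, again on simulated data with hypothetical values and a single regressor. It also shows one common form of the degrees of freedom correction: because $N$ entity means have implicitly been estimated, the residual variance is divided by $NT - N - k$ rather than $NT - k$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, k = 4, 30, 1  # hypothetical panel dimensions, one regressor

mu = np.array([-2.0, 0.0, 1.0, 3.0])  # entity effects
x = rng.normal(size=(N, T)) + mu[:, None]
y = 1.5 * x + mu[:, None] + 0.1 * rng.normal(size=(N, T))

# Within transformation: subtract each entity's time-mean
x_dd = x - x.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)

# OLS on the demeaned data, with no intercept term
xv, yv = x_dd.reshape(-1), y_dd.reshape(-1)
beta_within = (xv @ yv) / (xv @ xv)

# Residual variance with and without the degrees of freedom correction
resid = yv - beta_within * xv
rss = resid @ resid
s2_naive = rss / (N * T - k)          # what an unadjusted OLS routine would report
s2_corrected = rss / (N * T - N - k)  # accounts for the N estimated entity means
se_within = np.sqrt(s2_corrected / (xv @ xv))
```

The slope obtained this way is numerically identical to the slope from a regression of $y$ on $x$ and a full set of entity dummies; only the standard errors need the adjustment.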

The Between Estimator

An alternative to this demeaning would be to simply run a cross-sectional regression on the

time-averaged values of the variables, which is known as the between estimator.

An advantage of running the regression on average values (the between estimator) over

running it on the demeaned values (the within estimator) is that the process of averaging is

likely to reduce the effect of measurement error in the variables on the estimation process.
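For comparison, the between estimator can be sketched as follows. The simulated data and parameter values are hypothetical, and no entity effects are included, so this only illustrates the mechanics: each entity is collapsed to its time-averages and one cross-sectional regression is run on the resulting $N$ observations.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 5  # hypothetical panel: many entities, few time periods

# Regressor with a persistent entity-level part and a transitory part
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=(N, T))

# Time-averaged values for each entity
x_bar = x.mean(axis=1)
y_bar = y.mean(axis=1)

# Cross-sectional OLS on the N averaged observations, with an intercept
X_between = np.column_stack([np.ones(N), x_bar])
alpha_between, beta_between = np.linalg.lstsq(X_between, y_bar, rcond=None)[0]
```

Note that the regression uses only $N$ data points, one per entity, so all of the time-series variation within each entity is discarded.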
