BUS 10123 Lecture 38: Panel Data
School: Kent State University
Department: Business Administration Interdisciplinary
Course Code: BUS 10123
Professor: Eric Von Hendrix
The Nature of Panel Data
Panel data, also known as longitudinal data, have both time series and cross-sectional dimensions.
They arise when we measure the same collection of people or objects over a period of time.
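Econometrically, the setup is
$$y_{it} = \alpha + \beta x_{it} + u_{it}$$
where $y_{it}$ is the dependent variable, $\alpha$ is the intercept term, $\beta$ is a $k \times 1$ vector of
parameters to be estimated on the explanatory variables, $x_{it}$, and $u_{it}$ is the disturbance term;
$t = 1, \dots, T$; $i = 1, \dots, N$.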
The simplest way to deal with this data would be to estimate a single, pooled regression on all
the observations together.
But pooling the data assumes that there is no heterogeneity – i.e. the same relationship holds
for all the data.
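To see what ignoring heterogeneity can cost, here is a minimal sketch of a pooled regression on a simulated panel. Everything in it is an illustrative assumption rather than part of the lecture: the panel dimensions, the true slope of 2, and in particular the choice to make the entity-specific effects correlated with the regressor.

```python
import numpy as np

# Hypothetical simulated panel: N entities observed over T periods.
rng = np.random.default_rng(0)
N, T = 200, 10
beta_true = 2.0

# Entity-specific effects (the heterogeneity), assumed correlated with x.
mu = rng.normal(size=N)
x = mu[:, None] + rng.normal(size=(N, T))
y = 1.0 + beta_true * x + mu[:, None] + 0.5 * rng.normal(size=(N, T))

# Pooled regression: stack all N*T observations and fit a single
# intercept and a single slope, ignoring which entity each row came from.
X = np.column_stack([np.ones(N * T), x.ravel()])
alpha_hat, beta_hat = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]

print(f"true slope: {beta_true:.2f}, pooled OLS slope: {beta_hat:.2f}")
```

Under these assumptions the pooled slope should land noticeably above the true value, because the omitted entity effects are absorbed into the coefficient on x.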
The Advantages of using Panel Data
There are a number of advantages to using a full panel technique when a panel of data is available.
We can address a broader range of issues and tackle more complex problems with panel data
than would be possible with pure time series or pure cross-sectional data alone.
It is often of interest to examine how variables, or the relationships between them, change
dynamically (over time).
By structuring the model in an appropriate way, we can remove the impact of certain forms of
omitted variables bias in regression results.
Seemingly Unrelated Regression (SUR)
One approach to making fuller use of the structure of the data would be to use the SUR
framework initially proposed by Zellner (1962). This has been used widely in finance where the
requirement is to model several closely related variables over time.
A SUR is so-called because the dependent variables may seem unrelated across the equations
at first sight, but a more careful consideration would allow us to conclude that they are in fact
related after all.
Under the SUR approach, one would allow for the contemporaneous relationships between the
error terms in the equations by using a generalised least squares (GLS) technique.
The idea behind SUR is essentially to transform the model so that the error terms become
uncorrelated. If the correlations between the error terms in the individual equations had been
zero in the first place, then SUR on the system of equations would have been equivalent to
running separate OLS regressions on each equation.
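The GLS idea can be sketched for a two-equation system as follows. This is a feasible-GLS illustration under assumed values, not a definitive implementation: the coefficients, the error correlation of 0.8, and the helper name `gls_system` are all hypothetical. The last lines check the claim above that, with zero cross-equation error correlation, the system estimator collapses to equation-by-equation OLS.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200  # hypothetical number of time series observations

# Two equations with different regressors and correlated errors.
x1, x2 = rng.normal(size=T), rng.normal(size=T)
err_cov = np.array([[1.0, 0.8], [0.8, 1.0]])
e = rng.multivariate_normal([0.0, 0.0], err_cov, size=T)
y1 = 1.0 + 0.5 * x1 + e[:, 0]
y2 = -1.0 + 1.5 * x2 + e[:, 1]

X1 = np.column_stack([np.ones(T), x1])
X2 = np.column_stack([np.ones(T), x2])

# Step 1: separate OLS regressions to obtain residuals.
b1 = np.linalg.lstsq(X1, y1, rcond=None)[0]
b2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])
sigma_hat = resid.T @ resid / T  # estimated 2x2 error covariance

# Step 2: GLS on the stacked system, whose error covariance is
# sigma (Kronecker product) I_T when the data are stacked by equation.
X = np.block([[X1, np.zeros((T, 2))], [np.zeros((T, 2)), X2]])
y = np.concatenate([y1, y2])

def gls_system(sigma):
    """GLS coefficients for the stacked system given a 2x2 error covariance."""
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)

b_sur = gls_system(sigma_hat)

# With the off-diagonal covariance set to zero, GLS equals separate OLS.
b_diag = gls_system(np.diag(np.diag(sigma_hat)))
print(np.allclose(b_diag, np.concatenate([b1, b2])))
```

Building the full inverse covariance matrix with `np.kron` keeps the sketch close to the textbook formula; for large systems one would avoid forming that matrix explicitly.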
Fixed and Random Effects Panel Estimators
The applicability of the SUR technique is limited because it can only be employed when the
number of time series observations per cross-sectional unit is at least as large as the total
number of such units, N.
A second problem with SUR is that the number of parameters to be estimated in total is very
large, and the variance-covariance matrix of the errors also has to be estimated. For these
reasons, the more flexible full panel data approach is much more commonly used.
There are two main classes of panel techniques: the fixed effects estimator and the random
effects estimator.
Fixed Effects Models
The fixed effects model for some variable $y_{it}$ may be written
$$y_{it} = \alpha + \beta x_{it} + \mu_i + v_{it}$$
where $v_{it}$ is the remaining disturbance, which varies over time and across entities.
We can think of $\mu_i$ as encapsulating all of the variables that affect $y_{it}$ cross-sectionally but do not
vary over time, for example the sector that a firm operates in, a person's gender, or the
country where a bank has its headquarters. Thus we would capture the heterogeneity that
is encapsulated in $\mu_i$ by a method that allows for different intercepts for each cross-sectional
unit.
This model could be estimated using dummy variables, which would be termed the least
squares dummy variable (LSDV) approach.
Fixed Effects Models (Cont’d)
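The LSDV model may be written
$$y_{it} = \beta x_{it} + \mu_1 D1_i + \mu_2 D2_i + \mu_3 D3_i + \cdots + \mu_N DN_i + v_{it}$$
where $D1_i$ is a dummy variable that takes the value 1 for all observations on the first entity (e.g., the first firm)
in the sample and zero otherwise, $D2_i$ is a dummy variable that takes the value 1 for all
observations on the second entity (e.g., the second firm) and zero otherwise, and so on.
The intercept $\alpha$ is dropped so that the $N$ dummies are not perfectly collinear with it (the dummy variable trap).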
The LSDV can be seen as just a standard regression model and therefore it can be estimated
using OLS.
Now the model given by the equation above has N+k parameters to estimate. In order to avoid
the necessity to estimate so many dummy variable parameters, a transformation, known as the
within transformation, is used to simplify matters.
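The LSDV approach can be sketched in a few lines. The simulated panel, its dimensions and the true slope below are hypothetical choices made for illustration; the regression includes one dummy per entity and no separate intercept.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta_true = 30, 8, 2.0  # hypothetical panel and slope

# Entity effects correlated with the regressor (an assumption of this sketch).
mu = rng.normal(size=N)
x = mu[:, None] + rng.normal(size=(N, T))
y = beta_true * x + mu[:, None] + 0.5 * rng.normal(size=(N, T))

# Entity index for each stacked observation; x.ravel() keeps each
# entity's T observations together, so repeat each index T times.
entity = np.repeat(np.arange(N), T)

# Dummy matrix: column j is 1 for observations on entity j, 0 otherwise.
D = (entity[:, None] == np.arange(N)).astype(float)

# Regress y on x and the N dummies (no intercept), by OLS.
X = np.column_stack([x.ravel(), D])
coef = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
beta_lsdv, mu_hat = coef[0], coef[1:]

print(f"LSDV slope: {beta_lsdv:.2f}, parameters estimated: {coef.size}")
```

The design matrix here has N + k = 31 columns, which shows how quickly the number of dummy parameters grows with the number of entities.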
The Within Transformation
The within transformation involves subtracting the time-mean of each entity away from the
values of the variable.
So define $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ as the time-mean of the observations on $y$ for cross-sectional unit $i$, and similarly
calculate the means of all of the explanatory variables.
Then we can subtract the time-means from each variable to obtain a regression containing
demeaned variables only.
Note that such a regression does not require an intercept term since now the dependent
variable will have zero mean by construction.
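The model containing the demeaned variables is
$$y_{it} - \bar{y}_i = \beta (x_{it} - \bar{x}_i) + v_{it} - \bar{v}_i$$
in which the time-invariant effect $\mu_i$ has dropped out. We could write this as
$$\ddot{y}_{it} = \beta \ddot{x}_{it} + \ddot{v}_{it}$$
where the double dots above the variables denote the demeaned values.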
This model can be estimated using OLS, but we need to make a degrees of freedom correction.
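A minimal sketch of the within estimator, including the degrees of freedom correction, is given below. The simulated data and the single regressor (k = 1) are assumptions made for illustration. The correction reflects the fact that N entity means have been estimated, so the residual variance is divided by NT - N - k rather than NT - k.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, beta_true = 30, 8, 2.0  # hypothetical panel and slope
k = 1                          # one explanatory variable in this sketch

mu = rng.normal(size=N)
x = mu[:, None] + rng.normal(size=(N, T))
y = beta_true * x + mu[:, None] + 0.5 * rng.normal(size=(N, T))

# Within transformation: subtract each entity's time-mean.
x_dd = x - x.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)

# OLS on the demeaned data, with no intercept.
beta_within = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

# Degrees of freedom correction for the N estimated entity means.
resid = y_dd - beta_within * x_dd
dof = N * T - N - k
sigma2 = (resid ** 2).sum() / dof
se_beta = np.sqrt(sigma2 / (x_dd ** 2).sum())

print(f"within slope: {beta_within:.3f}, std. error: {se_beta:.3f}, dof: {dof}")
```

Running OLS on the demeaned data with standard software would typically use NT - k degrees of freedom, which is why the correction has to be applied by hand in a sketch like this one.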
The Between Estimator
An alternative to this demeaning would be to simply run a cross-sectional regression on the
time-averaged values of the variables, which is known as the between estimator.
An advantage of running the regression on average values (the between estimator) over
running it on the demeaned values (the within estimator) is that the process of averaging is
likely to reduce the effect of measurement error in the variables on the estimation process.
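The measurement error point can be illustrated with a small simulation. All of the settings are hypothetical: the regressor is assumed to have a persistent entity-level component, it is observed with independent measurement error, and there are no entity effects correlated with the regressor. Under these assumptions both estimators are pulled towards zero, but the between estimator much less so.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, beta_true = 500, 10, 2.0  # hypothetical panel and slope

# True regressor: a persistent entity component plus time variation.
x_true = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
y = 1.0 + beta_true * x_true + 0.5 * rng.normal(size=(N, T))

# The observed regressor is contaminated by independent measurement error.
x_obs = x_true + rng.normal(size=(N, T))

# Between estimator: cross-sectional OLS on the time-averaged values.
x_bar = x_obs.mean(axis=1)
y_bar = y.mean(axis=1)
Xb = np.column_stack([np.ones(N), x_bar])
alpha_between, beta_between = np.linalg.lstsq(Xb, y_bar, rcond=None)[0]

# Within estimator: OLS on the demeaned values, for comparison.
x_dd = x_obs - x_bar[:, None]
y_dd = y - y_bar[:, None]
beta_within = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

print(f"true: {beta_true:.2f}, between: {beta_between:.2f}, within: {beta_within:.2f}")
```

Averaging over T periods shrinks the variance of the measurement error by a factor of T while leaving the persistent component of the regressor intact, which is the mechanism behind the smaller distortion in the between estimate here.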