Suppose we have $n$ observations, each with values for $p$ variables. We denote the value of
variable $j$ in observation $i$ by $x_{ij}$, and the vector of all values for observation $i$ by $x_i$.
We often view the observed $x_i$ as a random sample of realizations of a random vector $X$
with some (unknown) distribution.
There is potential ambiguity between the notation $x_i$ for observation $i$, and the notation $x_j$
for a realization of the random variable $X_j$. (The textbook uses bold face for $x_i$.)
I will (try to) reserve $i$ for indexing observations, and use $j$ and $k$ for indexing variables,
but the textbook sometimes uses $i$ to index a variable.
The sample mean of variable $j$ is $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$.
The sample mean vector is $\bar{x} = [\bar{x}_1, \ldots, \bar{x}_p]'$.
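As a minimal sketch of these definitions, the Python/NumPy code below assumes the data are stored as an $n \times p$ array with one row per observation, so that `X[i, j]` plays the role of $x_{ij}$ (with 0-based rather than 1-based indices). The data values and the helper `sample_mean` are hypothetical, chosen only to give round numbers.

```python
import numpy as np

# Hypothetical data (not from the notes): n = 4 observations (rows) on
# p = 3 variables (columns). X[i, j] plays the role of x_ij, and the row
# X[i] plays the role of the observation vector x_i (0-based indices).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 11.0, 1.0],
])
n, p = X.shape


def sample_mean(X, j):
    """Hypothetical helper: (1/n) times the sum over i of x_ij."""
    return X[:, j].sum() / X.shape[0]


# Sample mean vector [xbar_1, ..., xbar_p]': one mean per column.
xbar = X.mean(axis=0)
print(xbar)  # [4. 7. 1.]
```

Taking the mean along `axis=0` averages down each column, i.e. over observations, which matches the per-variable definition.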
If the observations all have the same distribution, the sample mean vector, $\bar{x}$, is an unbiased
estimate of the mean vector, $\mu$, of the distribution from which these observations came.
The sample variance of variable $j$ is $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$.
If the observations all have the same distribution, the sample variance, $s_j^2$, is an estimate
of the variance, $\sigma_j^2$, of the distribution for $X_j$, and will be an unbiased estimate if the
observations are independent.
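A minimal sketch of the sample variance with the $n-1$ divisor, again assuming a hypothetical $n \times p$ NumPy array with rows as observations; `sample_variance` is an illustrative helper, not a library function. It is worth noting that NumPy's `var` divides by $n$ by default (`ddof=0`), so `ddof=1` is needed to match $s_j^2$.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (rows are observations, columns variables).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 11.0, 1.0],
])


def sample_variance(X, j):
    """Hypothetical helper: sum of squared deviations of variable j, over n - 1."""
    n = X.shape[0]
    d = X[:, j] - X[:, j].mean()  # deviations x_ij - xbar_j
    return (d @ d) / (n - 1)


s2 = np.array([sample_variance(X, j) for j in range(X.shape[1])])

# np.var divides by n unless ddof=1 is given; with ddof=1 it matches s_j^2.
assert np.allclose(s2, X.var(axis=0, ddof=1))
print(s2)  # approximately 10/3, 26/3, 2/3
```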
Sample covariance and correlation:
The sample covariance of variable $j$ with variable $k$ is $\frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$.
The sample covariance is denoted by $s_{jk}$. Note that $s_{jj}$ equals $s_j^2$, the sample variance of
variable $j$.
The sample correlation of variable $j$ with variable $k$ is $s_{jk}/(s_j s_k)$, often denoted by $r_{jk}$.
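The two definitions can be sketched directly in Python/NumPy, assuming the same hypothetical data layout (rows are observations). The helpers `sample_covariance` and `sample_correlation` are illustrative names; the check against `np.corrcoef` uses NumPy's real function. Since $s_{jj} = s_j^2$, the standard deviations are taken as square roots of the diagonal covariances.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (rows are observations, columns variables).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 11.0, 1.0],
])


def sample_covariance(X, j, k):
    """Hypothetical helper: s_jk = sum of products of deviations, over n - 1."""
    n = X.shape[0]
    dj = X[:, j] - X[:, j].mean()
    dk = X[:, k] - X[:, k].mean()
    return (dj @ dk) / (n - 1)


def sample_correlation(X, j, k):
    """Hypothetical helper: r_jk = s_jk / (s_j s_k), using s_jj = s_j^2."""
    s_j = np.sqrt(sample_covariance(X, j, j))
    s_k = np.sqrt(sample_covariance(X, k, k))
    return sample_covariance(X, j, k) / (s_j * s_k)


s_01 = sample_covariance(X, 0, 1)   # 15 / 3 = 5.0 for this data
r_01 = sample_correlation(X, 0, 1)

# Cross-check against NumPy's correlation of the two columns.
assert np.isclose(r_01, np.corrcoef(X[:, 0], X[:, 1])[0, 1])
```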
Sample covariance and correlation matrices:
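The sample covariances may be arranged as the sample covariance matrix:
$$
S = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{21} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \cdots & s_{pp}
\end{bmatrix}
$$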
The sample covariance matrix can also be computed as $S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$.
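A minimal sketch showing that the sum-of-outer-products formula for $S$ agrees with the entry-by-entry definition, assuming hypothetical data in a NumPy array with rows as observations. It also checks the equivalent matrix form $D'D/(n-1)$, where $D$ is the centred data matrix (an extra name introduced here), and NumPy's `np.cov`, which treats rows as variables unless `rowvar=False` and divides by $n-1$ by default.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (rows are observations, columns variables).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 11.0, 1.0],
])
n = X.shape[0]
xbar = X.mean(axis=0)

# S = (1/(n-1)) * sum over i of the outer product (x_i - xbar)(x_i - xbar)'.
# Iterating over X yields its rows, i.e. the observation vectors x_i.
S = sum(np.outer(x_i - xbar, x_i - xbar) for x_i in X) / (n - 1)

# Equivalent matrix form: D has rows x_i - xbar, and S = D'D / (n - 1).
D = X - xbar
assert np.allclose(S, D.T @ D / (n - 1))

# np.cov with rowvar=False treats columns as variables; default divisor is n - 1.
assert np.allclose(S, np.cov(X, rowvar=False))
```

The outer-product form mirrors the formula term by term; the $D'D$ form is the one usually preferred in practice because it is a single matrix product.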
Similarly, the sample correlations may be arranged as the sample correlation matrix, sometimes
denoted $R$ (though the textbook also uses $R$ for the population correlation matrix).
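Because $r_{jk} = s_{jk}/(s_j s_k)$, the sample correlation matrix can be obtained from $S$ by dividing each entry by the product of the corresponding standard deviations. A minimal NumPy sketch, assuming the same hypothetical data layout; the cross-check uses NumPy's real `np.corrcoef`.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (rows are observations, columns variables).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 0.0],
    [5.0, 7.0, 2.0],
    [6.0, 11.0, 1.0],
])
S = np.cov(X, rowvar=False)  # sample covariance matrix, n - 1 divisor

# Standard deviations s_j are the square roots of the diagonal entries s_jj.
s = np.sqrt(np.diag(S))

# r_jk = s_jk / (s_j s_k): divide S elementwise by the outer product of s with itself.
R = S / np.outer(s, s)

assert np.allclose(R, np.corrcoef(X, rowvar=False))
```

The diagonal of $R$ is all ones, since $r_{jj} = s_{jj}/s_j^2 = 1$.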