8.3 Covariance Matrix Estimation

We now turn our attention to estimating the asymptotic covariance ("sandwich") matrix from a finite sample.

8.3.1 Heteroskedastic Variance

Theorem 8.1 showed that the asymptotic covariance matrix of \(\sqrt{n}(\widehat{{\boldsymbol{\beta}}}-{\boldsymbol{\beta}})\) is \[{\mathbb{V}}_{{\boldsymbol{\beta}}} ={\mathbb{Q}}_{{\boldsymbol{XX}}}^{-1}{\mathbb{A}}{\mathbb{Q}}_{{\boldsymbol{XX}}}^{-1}.\] Without imposing any homoskedasticity condition, we estimate \({\mathbb{V}}_{{\boldsymbol{\beta}}}\) using a plug-in estimator.

We have already seen that \(\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}=\frac{1}{n}\sum\limits_{i=1}^n{\boldsymbol{X}}_i^T{\boldsymbol{X}}_i\) is a natural estimator for \({\mathbb{Q}}_{{\boldsymbol{XX}}}\). For \({\mathbb{A}}\), we use the moment estimator \[ \widehat{{\mathbb{A}}}=\frac{1}{n}\sum\limits_{i=1}^n{\boldsymbol{X}}_i^T{\boldsymbol{X}}_ie_i^2, \] where \(e_i=(Y_i-{\boldsymbol{X}}_i\widehat{{\boldsymbol{\beta}}})\) is the \(i\)-th residual. As it turns out, \(\widehat{{\mathbb{A}}}\) is a consistent estimator for \({\mathbb{A}}\).
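The two plug-in ingredients can be computed directly from the design matrix and the OLS residuals. The following sketch (using NumPy, with simulated heteroskedastic data as an illustrative assumption) forms \(\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}\) and \(\widehat{{\mathbb{A}}}\); note that because the rows of `X` are the row vectors \({\boldsymbol{X}}_i\), the sum \(\sum_i {\boldsymbol{X}}_i^T{\boldsymbol{X}}_i e_i^2\) can be written as the matrix product `(X.T * e**2) @ X` without any explicit loop.

```python
import numpy as np

# Simulated data (illustrative): errors whose variance depends on the regressors,
# so the homoskedasticity assumption fails by construction.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
u = rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))
Y = X @ beta_true + u

# OLS coefficients and residuals
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

# Plug-in estimators: Q_hat = (1/n) sum x_i x_i',  A_hat = (1/n) sum e_i^2 x_i x_i'
Q_hat = (X.T @ X) / n
A_hat = (X.T * e**2) @ X / n
```

Both matrices are symmetric \(k \times k\), and `A_hat` weights each outer product \({\boldsymbol{X}}_i^T{\boldsymbol{X}}_i\) by the squared residual \(e_i^2\), which is what makes the resulting variance estimator robust to heteroskedasticity.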

As a result, we get the following plug-in estimator for \({\mathbb{V}}_{{\boldsymbol{\beta}}}\): \[ \widehat{{\mathbb{V}}}_{{\boldsymbol{\beta}}}= \widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1}\widehat{{\mathbb{A}}}\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1}. \] This estimator is also consistent. For a proof, see Hansen (2013).

As a consequence, we obtain the following estimator for the variance, \({\mathbb{V}}_{\widehat{{\boldsymbol{\beta}}}}\), of \(\widehat{{\boldsymbol{\beta}}}\) in the heteroskedastic case: \[ \begin{aligned} \widehat{{\mathbb{V}}}\left[\widehat{{\boldsymbol{\beta}}}\right] &=\frac{1}{n}\widehat{{\mathbb{V}}}_{{\boldsymbol{\beta}}} \\ &=\frac{1}{n}\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1}\widehat{{\mathbb{A}}}\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1} \\ &=\frac{1}{n}\left(\frac{1}{n}\sum\limits_{i=1}^n{\boldsymbol{X}}_i^T{\boldsymbol{X}}_i\right)^{-1} \left(\frac{1}{n}\sum\limits_{i=1}^ne_i^2{\boldsymbol{X}}_i^T{\boldsymbol{X}}_i\right) \left(\frac{1}{n}\sum\limits_{i=1}^n{\boldsymbol{X}}_i^T{\boldsymbol{X}}_i\right)^{-1} \\ &=\left({\mathbb{X}}^T{\mathbb{X}}\right)^{-1} {\mathbb{X}}^T{\mathbb{D}}{\mathbb{X}} \left({\mathbb{X}}^T{\mathbb{X}}\right)^{-1} \end{aligned} \] where \({\mathbb{D}}\) is an \(n\times n\) diagonal matrix with diagonal entries \(e_1^2,e_2^2,\ldots,e_n^2\). The estimator \(\widehat{{\mathbb{V}}}\left[\widehat{{\boldsymbol{\beta}}}\right]\) is referred to as the robust (or heteroskedasticity-consistent, HC0) variance estimator for the OLS coefficients \(\widehat{{\boldsymbol{\beta}}}\).
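The algebra above says the scaled plug-in form \(\frac{1}{n}\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1}\widehat{{\mathbb{A}}}\widehat{{\mathbb{Q}}}_{{\boldsymbol{XX}}}^{-1}\) and the matrix form \(({\mathbb{X}}^T{\mathbb{X}})^{-1}{\mathbb{X}}^T{\mathbb{D}}{\mathbb{X}}({\mathbb{X}}^T{\mathbb{X}})^{-1}\) are identical. A minimal NumPy sketch (simulated data is an illustrative assumption) computes both and checks the equality; in practice one avoids forming the \(n \times n\) matrix \({\mathbb{D}}\) explicitly by scaling the rows of \({\mathbb{X}}\) instead.

```python
import numpy as np

# Simulated heteroskedastic data (illustrative)
rng = np.random.default_rng(1)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([0.5, 1.5]) + rng.normal(size=n) * (1.0 + X[:, 1] ** 2)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat

# Matrix form: (X'X)^{-1} X'DX (X'X)^{-1}, with D = diag(e_1^2, ..., e_n^2).
# (X.T * e**2) @ X equals X' D X without building the n x n matrix D.
XtX_inv = np.linalg.inv(X.T @ X)
meat = (X.T * e**2) @ X
V_hat = XtX_inv @ meat @ XtX_inv

# Equivalent plug-in form: (1/n) Qhat^{-1} Ahat Qhat^{-1}
Q_hat = X.T @ X / n
A_hat = meat / n
Q_inv = np.linalg.inv(Q_hat)
V_hat_plugin = Q_inv @ A_hat @ Q_inv / n

# Robust (HC0) standard errors are the square roots of the diagonal
se_robust = np.sqrt(np.diag(V_hat))
```

Checking `np.allclose(V_hat, V_hat_plugin)` confirms the two expressions agree up to floating-point error, mirroring the derivation line by line.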

8.3.2 Homoskedastic Variance