4.3 Best Linear Predictor
Let \(Y\) be a random variable and \({\boldsymbol{X}}\) be a random (row) vector of \(k\) variables. The best linear predictor of \(Y\) given \({\boldsymbol{X}}\), denoted \(\mathscr{P}[Y|{\boldsymbol{X}}]\), is the linear function \({\boldsymbol{X}}{\boldsymbol{b}}\) that minimizes the mean squared error \({\mathbb{E}\left[ (Y-{\boldsymbol{X}}{\boldsymbol{b}})^2 \right]}\) over \({\boldsymbol{b}}\in\mathbb{R}^k\). It is also called the linear projection of \(Y\) on \({\boldsymbol{X}}\).
Theorem 4.2 (Best Linear Predictor) Under the following assumptions
- \({\mathbb{E}\left[ Y^2 \right]}<\infty\)
- \({\mathbb{E}\left[ ||{\boldsymbol{X}}||^2 \right]}<\infty\)
- \({\mathbb{Q}}_{{\boldsymbol{X}}{\boldsymbol{X}}}\stackrel{\text{def}}{=}{\mathbb{E}\left[ {\boldsymbol{X}}^T{\boldsymbol{X}} \right]}\) is positive-definite
the best linear predictor exists uniquely, and has the form \[ \mathscr{P}[Y|{\boldsymbol{X}}]={\boldsymbol{X}}{\boldsymbol{\beta}}, \] where \({\boldsymbol{\beta}}=\left({\mathbb{E}\left[ {\boldsymbol{X}}^T{\boldsymbol{X}} \right]}\right)^{-1}{\mathbb{E}}[{\boldsymbol{X}}^TY]\) is a column vector.
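The formula \({\boldsymbol{\beta}}=\left({\mathbb{E}[{\boldsymbol{X}}^T{\boldsymbol{X}}]}\right)^{-1}{\mathbb{E}}[{\boldsymbol{X}}^TY]\) can be illustrated numerically by replacing the population moments with sample averages. The following is a minimal sketch (not from the text) using NumPy, with a simulated design and a hypothetical coefficient vector `beta_true`:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100_000, 3

# Stack n draws of the random row vector X into an (n x k) matrix.
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])  # hypothetical, for illustration only
Y = X @ beta_true + rng.normal(size=n)

# Sample analogues of Q_XX = E[X^T X] and E[X^T Y], then solve for beta.
Q_XX = X.T @ X / n
beta_hat = np.linalg.solve(Q_XX, X.T @ Y / n)
print(beta_hat)  # close to beta_true for large n
```

Since \(Y\) here is linear in \({\boldsymbol{X}}\) plus independent noise, the sample estimate converges to `beta_true`; in general the BLP is defined even when the true relationship is nonlinear.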
In the following theorem, we show that the BLP error is uncorrelated with the explanatory variables.
Theorem 4.3 (Best Linear Predictor Error) If the BLP exists, the linear projection error \(\varepsilon=Y-\mathscr{P}[Y|{\boldsymbol{X}}]\) satisfies the following properties:
- \({\mathbb{E}}[{\boldsymbol{X}}^T\varepsilon]={\boldsymbol{0}}\)
- moreover, \({\mathbb{E}}[\varepsilon]=0\) if \({\boldsymbol{X}}=\begin{bmatrix}1 & X_{[2]} & \ldots & X_{[k]} \end{bmatrix}\) contains a constant.