6.1 Solution of OLS
We rewrite the cost function as \[ \widehat{S}(\v{\beta})=\frac{1}{n}SSE(\v{\beta}), \] where \(SSE(\v{\beta})\stackrel{\text{def}}{=}\sum\limits_{i=1}^n(Y_i-\v{X_i}\v{\beta})^2\).
We now express \(SSE(\v{\beta})\) as a quadratic function of \(\v{\beta}\). \[ \begin{aligned} SSE &=\sum\limits_{i=1}^n(Y_i-\v{X_i}\v{\beta})^2 \\ &=\sum\limits_{i=1}^n Y_i^2 - 2\sum\limits_{i=1}^n Y_i(\v{X_i}\v{\beta}) + \sum\limits_{i=1}^n (\v{X_i}\v{\beta})^2 \\ &=\sum\limits_{i=1}^n Y_i^2 - 2\sum\limits_{i=1}^n Y_i(\v{\beta}^T\v{X_i}^T) + \sum\limits_{i=1}^n (\v{X_i}\v{\beta})(\v{X_i}\v{\beta}) \\ &=\sum\limits_{i=1}^n Y_i^2 - 2\sum\limits_{i=1}^n \v{\beta}^T(Y_i\v{X_i}^T) + \sum\limits_{i=1}^n (\v{\beta}^T\v{X_i}^T)(\v{X_i}\v{\beta}) \\ &=\left(\sum\limits_{i=1}^n Y_i^2\right) - 2\v{\beta}^T\left(\sum\limits_{i=1}^n\v{X_i}^TY_i\right) + \v{\beta}^T\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right)\v{\beta} \end{aligned} \] Taking partial derivative w.r.t. \(\beta_j\), we get \[ \frac{\partial}{\partial\beta_j}SSE(\v{\beta})=-2\left[\sum\limits_{i=1}^n\v{X_i}^TY_i\right]_j + 2\left[\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right)\v{\beta}\right]_j. \]
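The quadratic expansion above can be checked numerically. The sketch below (illustrative, not part of the notes; all variable names are my own) stacks the row vectors \(\v{X_i}\) into a matrix `X` and confirms that the expanded quadratic form agrees with the direct sum of squared residuals.

```python
# Check that the expanded quadratic form of SSE equals the direct
# sum of squared residuals, for an arbitrary beta.
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
# Rows of X play the role of the row vectors X_i; first column is an intercept.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = rng.normal(size=n)
beta = rng.normal(size=k + 1)

# Direct computation: sum_i (Y_i - X_i beta)^2
sse_direct = np.sum((Y - X @ beta) ** 2)

# Expanded form: sum Y_i^2 - 2 beta'(sum X_i' Y_i) + beta'(sum X_i' X_i) beta
sse_quadratic = Y @ Y - 2 * beta @ (X.T @ Y) + beta @ (X.T @ X) @ beta

assert np.isclose(sse_direct, sse_quadratic)
```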
Therefore, \[ \frac{\partial}{\partial\v{\beta}}SSE(\v{\beta}) =-2\left(\sum\limits_{i=1}^n\v{X_i}^TY_i\right) + 2\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right)\v{\beta}. \]
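As a sanity check on the gradient formula, one can compare it against central finite differences; since \(SSE\) is quadratic in \(\v{\beta}\), the two should agree to numerical precision. This is a minimal sketch (not from the notes), with illustrative names.

```python
# Finite-difference check of the gradient
#   dSSE/dbeta = -2 sum_i X_i' Y_i + 2 (sum_i X_i' X_i) beta.
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = rng.normal(size=n)
beta = rng.normal(size=k + 1)

def sse(b):
    r = Y - X @ b
    return r @ r

# Analytic gradient from the formula above
grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central differences, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([
    (sse(beta + eps * e) - sse(beta - eps * e)) / (2 * eps)
    for e in np.eye(k + 1)
])

assert np.allclose(grad_analytic, grad_numeric, atol=1e-4)
```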
To minimize \(SSE(\v{\beta})\), a necessary condition for \(\widehat{\v{\beta}}\) is \[ \frac{\partial}{\partial\v{\beta}}SSE(\v{\beta})\bigg|_{\v{\beta} =\widehat{\v{\beta}}}=\v{0}, \] i.e., \[ -2\left(\sum\limits_{i=1}^n\v{X_i}^TY_i\right) + 2\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right)\widehat{\v{\beta}} =\v{0}. \] Hence, \[\begin{equation} \left(\sum\limits_{i=1}^n\v{X_i}^TY_i\right) =\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right)\widehat{\v{\beta}}. \tag{6.1} \end{equation}\]
Both the left- and right-hand sides of the above equation are vectors of length \(k+1\). So (6.1) is a system of \((k+1)\) linear equations in \((k+1)\) unknowns, namely the elements of \(\widehat{\v{\beta}}\).
Let us define
\[ \widehat{\mathbb{Q}}_{\v{XX}} =\frac{1}{n}\left(\sum\limits_{i=1}^n\v{X_i}^T\v{X_i}\right) \mbox{ and } \widehat{\mathbb{Q}}_{\v{X}Y} =\frac{1}{n}\left(\sum\limits_{i=1}^n\v{X_i}^TY_i\right). \]
Rewriting (6.1), we get \[\begin{equation} \widehat{\mathbb{Q}}_{\v{X}Y}=\widehat{\mathbb{Q}}_{\v{XX}} \widehat{\v{\beta}}. \tag{6.2} \end{equation}\]
Equation (6.2) is sometimes referred to as the first-order moment condition; the equivalent system (6.1) is also known as the normal equations. For uniqueness of the solution, we require that \(\widehat{\mathbb{Q}}_{\v{XX}}\) be non-singular. In that case, we can solve for \(\widehat{\v{\beta}}\) to get \[ \widehat{\v{\beta}}=\left[\widehat{\mathbb{Q}}_{\v{XX}}\right]^{-1} \widehat{\mathbb{Q}}_{\v{X}Y}. \] To verify that this choice minimizes \(SSE(\v{\beta})\), consider the second-order condition. The Hessian is \[ \frac{\partial^2}{\partial\v{\beta}\partial\v{\beta}^T}SSE(\v{\beta}) =2\left(\sum\limits_{i=1}^n \v{X_i}^T\v{X_i}\right) =2n\,\widehat{\mathbb{Q}}_{\v{XX}}. \] Since \(\widehat{\mathbb{Q}}_{\v{XX}}\) is a (scaled) sum of the outer products \(\v{X_i}^T\v{X_i}\), it is always positive semi-definite; if it is also non-singular, it is positive definite, and so is the Hessian. So we have actually proved the following theorem.
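In code, the closed-form solution amounts to forming the moment matrices and solving one linear system. The sketch below (my own illustration, using NumPy; `Qxx`, `QxY`, and the data-generating coefficients are assumptions for the example) computes \(\widehat{\v{\beta}}=[\widehat{\mathbb{Q}}_{\v{XX}}]^{-1}\widehat{\mathbb{Q}}_{\v{X}Y}\), compares it with a standard least-squares solver, and checks the second-order condition.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
# Simulated response: true coefficients chosen arbitrarily for illustration
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

Qxx = (X.T @ X) / n        # \hat{Q}_{XX} = (1/n) sum_i X_i' X_i
QxY = (X.T @ Y) / n        # \hat{Q}_{XY} = (1/n) sum_i X_i' Y_i

# Solve Qxx beta_hat = QxY rather than forming the inverse explicitly
beta_hat = np.linalg.solve(Qxx, QxY)

# Agrees with a standard least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

# Second-order condition: all eigenvalues of Qxx are positive,
# so the Hessian 2n*Qxx is positive definite
assert np.all(np.linalg.eigvalsh(Qxx) > 0)
```

Note that `np.linalg.solve` is preferred over explicitly inverting \(\widehat{\mathbb{Q}}_{\v{XX}}\) for numerical stability; the two coincide mathematically when the matrix is non-singular.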
Theorem 6.1 If \(\widehat{\mathbb{Q}}_{\v{XX}}\) is non-singular, then the least squares estimator is unique, and is given by \[ \widehat{\v{\beta}}=\left[\widehat{\mathbb{Q}}_{\v{XX}}\right]^{-1} \widehat{\mathbb{Q}}_{\v{X}Y}. \]