Unit 5: Learning from Random Samples
This week, we come to the big turn in the class: from probability theory to sampling theory.
In the probability theory section of the course, we developed the theoretically best possible set of models. Namely, we said that if our goal is to produce a model that minimizes the Mean Squared Error (MSE), then the expectation and the conditional expectation are as good as it gets. That is, if we only have the outcome series, \(Y\), we cannot possibly improve upon \(E[Y]\), the expectation of the random variable \(Y\). If we have additional data on hand, say \(X\) and \(Y\), then the best model of \(Y\) given \(X\) is the conditional expectation, \(E[Y|X]\).
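To make this concrete, here is a minimal simulation sketch. The particular data-generating process, \(Y = X^2 + \varepsilon\), and all of the names in the code are chosen only for illustration; they are not part of the unit. The simulation checks that, among constant predictors, the mean of \(Y\) attains the smallest MSE, and that a predictor that uses \(X\) (the conditional expectation) does better still.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a joint distribution: X uniform, Y nonlinear in X plus noise,
# so the true conditional expectation is E[Y|X] = X^2.
n = 200_000
x = rng.uniform(-2, 2, size=n)
y = x**2 + rng.normal(0, 1, size=n)

def mse(pred):
    """Mean squared error of a vector of predictions against y."""
    return np.mean((y - pred) ** 2)

# Among constant predictors, the sample mean of Y attains the smallest MSE.
for c in [y.mean() - 1, y.mean() - 0.5, y.mean(), y.mean() + 0.5, y.mean() + 1]:
    print(f"constant {c: .3f}   MSE {mse(np.full(n, c)):.3f}")

# A predictor that uses X -- here the true CEF, X^2 -- does better still.
print(f"E[Y|X] = X^2   MSE {mse(x**2):.3f}")
```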
We also said that because this conditional expectation function might be complex, and hard to inform with data, we might be interested in a principled simplification of the conditional expectation function – the simplification that requires our model to be a line.
With this simplification in mind, we derived the slope of the line that produces the minimum MSE – the Best Linear Predictor (BLP) – as the ratio of the covariance between the variables to the variance of the predictor:
\[ \beta_{BLP} = \frac{Cov[Y,X]}{V[X]}. \]
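As a quick numerical check – a sketch on simulated data, with an intercept and slope invented purely for illustration – the sample analogue of this ratio matches the slope recovered by an ordinary least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a linear relationship with slope 0.8 (an arbitrary choice).
n = 100_000
x = rng.normal(0, 2, size=n)
y = 1.5 + 0.8 * x + rng.normal(0, 1, size=n)

# Sample analogue of the BLP slope: Cov[Y, X] / V[X].
beta_blp = np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Slope from an ordinary least-squares fit of y on x.
beta_ols, intercept = np.polyfit(x, y, deg=1)

print(f"Cov[Y,X] / V[X]: {beta_blp:.4f}")
print(f"OLS slope:       {beta_ols:.4f}")
```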
We noted, quickly, that the simple case of only two variables – an outcome and a single predictor – generalizes nicely to the (potentially very) many-dimensional case. If the many-dimensional \(BLP\) is denoted as \(g(\mathbf{X}) = b_{0} + b_{1}X_{1} + \dots + b_{k}X_{k}\), then we can arrive at the slope between one particular predictor, \(X_{k}\), and the outcome, \(Y\), as:
\[ b_{k} = \frac{\partial g(\mathbf{X})}{\partial X_{k}}. \]
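Here is a small sketch of this fact, again on simulated data (the three-predictor design and its coefficients are invented purely for illustration). It uses the standard result that the many-dimensional BLP coefficients satisfy \(b = V[\mathbf{X}]^{-1}Cov[\mathbf{X},Y]\), and then checks that nudging a single predictor \(X_{k}\) by a small amount changes \(g(\mathbf{X})\) by \(b_{k}\) times that amount – exactly the partial derivative above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate three correlated predictors and an outcome that is linear in them.
n = 100_000
cov_X = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.2],
                  [0.1, 0.2, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3), cov=cov_X, size=n)
y = 0.7 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(0, 1, size=n)

# BLP coefficients: b = V[X]^{-1} Cov[X, Y], plus the matching intercept.
V_X = np.cov(X, rowvar=False)
cov_Xy = np.array([np.cov(X[:, k], y)[0, 1] for k in range(3)])
b = np.linalg.solve(V_X, cov_Xy)
b0 = y.mean() - X.mean(axis=0) @ b

def g(x_row):
    """The fitted many-dimensional BLP: g(X) = b0 + b1*X1 + ... + bk*Xk."""
    return b0 + x_row @ b

# The partial derivative of g with respect to X_k is the coefficient b_k:
# nudging one predictor by h changes g by (approximately) b_k * h.
x0 = np.zeros(3)
h = 1e-6
for k in range(3):
    step = np.zeros(3)
    step[k] = h
    print(f"b_{k + 1}: coefficient {b[k]: .4f}, "
          f"numerical dg/dX_{k + 1}: {(g(x0 + step) - g(x0)) / h: .4f}")
```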