13.8 Course Goals

13.8.1 Course Section III: Purpose-Driven Models

  • Statistical models are unknowing transformations of data

    • Because they’re built on the foundation of probability, we have certain guarantees about what a model “says”
    • Because they’re unknowing, the models themselves do not know what they say
  • As data scientists, we bring them alive to achieve our modeling goals

  • In Lab 2, we expanded our ability to parse the world using regression, built a model that accomplishes our goals, and did so in a way that lets us test under a “null” scenario

    • Key insight: regression is little more than conditional averages
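
The key insight above can be sketched numerically. This is a minimal illustration, assuming NumPy and simulated data (the coefficients 2.0 and 3.0 are arbitrary choices): when the regressor is binary, an OLS fit with an intercept reproduces exactly the conditional averages of the outcome within each group.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1_000)          # binary regressor: group 0 or 1
y = 2.0 + 3.0 * x + rng.normal(size=1_000)  # outcome with noise

# OLS via least squares on the design matrix [1, x]
X = np.column_stack([np.ones_like(x, dtype=float), x.astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Conditional averages of y within each group
mean_0 = y[x == 0].mean()
mean_1 = y[x == 1].mean()

# The fitted line passes through the two group means:
# intercept = average of y when x = 0; intercept + slope = average when x = 1
print(np.isclose(beta[0], mean_0))            # True
print(np.isclose(beta[0] + beta[1], mean_1))  # True
```

With more groups or continuous regressors the correspondence is no longer exact, but the fitted values remain best linear approximations to the conditional averages.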

13.8.2 Course Section II: Sampling Theory and Testing

  • Under very general assumptions, sample averages follow a predictable, known distribution – the Gaussian distribution
  • This is true even when the underlying probability distribution is very complex, or entirely unknown!
  • Due to this common distribution, we can produce reliable, general tests!
  • In Lab 1, we computed simple statistics and used guarantees from sampling theory to test whether observed differences were likely to arise under a “null” scenario
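
The claim about sample averages can be checked by simulation. This is a small sketch, assuming NumPy and an arbitrarily chosen Exponential(1) distribution, which is strongly skewed and looks nothing like a Gaussian; the averages of repeated samples nonetheless behave as the Gaussian approximation predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 independent samples, each of n = 100 draws from Exponential(1)
# (mean 1, standard deviation 1, heavily right-skewed)
samples = rng.exponential(scale=1.0, size=(10_000, 100))

# Average within each sample
sample_means = samples.mean(axis=1)

# Sampling-theory prediction: means are approximately
# Normal(mean = 1, standard deviation = 1 / sqrt(100) = 0.1)
print(sample_means.mean())  # close to 1.0
print(sample_means.std())   # close to 0.1
```

Histogramming `sample_means` would show the familiar bell shape, even though each individual draw comes from a sharply skewed distribution.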

13.8.3 Course Section I: Probability Theory

  • Probability theory:
    • Underlies modeling and regression (Part III);
    • Underlies sampling, inference, and testing (Part II);
    • Underlies every model built in every corner of data science

We can:

  • Model the complex world that we live in using probability theory;
  • Move from a probability density function defined in terms of a single variable to one defined in terms of many variables;
  • Compute useful summaries – e.g. the expected value, covariance, and BLP – even for highly complex probability density functions.
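
These summaries can be computed directly from a joint distribution. Below is a minimal sketch, assuming NumPy and a small hypothetical joint pmf over four equally likely (x, y) pairs (the outcomes and probabilities are made up for illustration): the expected values, variance, and covariance come from probability-weighted sums, and the BLP of y given x follows from the familiar formulas slope = Cov(X, Y) / Var(X) and intercept = E[Y] − slope · E[X].

```python
import numpy as np

# Hypothetical joint pmf: each row is an (x, y) outcome with its probability
outcomes = np.array([[0.0, 1.0],
                     [1.0, 1.0],
                     [1.0, 3.0],
                     [2.0, 3.0]])
probs = np.array([0.25, 0.25, 0.25, 0.25])

x, y = outcomes[:, 0], outcomes[:, 1]

# Expected values: probability-weighted sums over the outcomes
E_x = np.sum(probs * x)
E_y = np.sum(probs * y)

# Variance of X and covariance of (X, Y), again as weighted sums
var_x = np.sum(probs * (x - E_x) ** 2)
cov_xy = np.sum(probs * (x - E_x) * (y - E_y))

# Best Linear Predictor of Y given X
slope = cov_xy / var_x
intercept = E_y - slope * E_x
print(E_x, E_y, var_x, cov_xy)  # 1.0 2.0 0.5 0.5
print(slope, intercept)         # 1.0 1.0
```

The same weighted-sum recipe applies (with integrals in place of sums) to continuous densities, however complex.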

13.8.4 Statistics as a Foundation for MIDS

  • In w203, we hope to have laid a foundation in probability that can be used not only in statistical applications, but also in every other machine learning application that you are likely to encounter