13.8 Course Goals
13.8.1 Course Section III: Purpose-Driven Models
Statistical models are unknowing transformations of data
- Because they’re built on the foundation of probability, we have certain guarantees about what a model “says”.
- Because they’re unknowing, the models themselves do not know what they say.
As data scientists, we bring them alive to achieve our modeling goals.
In Lab 2 we expanded our ability to parse the world using regression, built a model that accomplishes our goals, and did so in a way that lets us test against a “null” scenario.
- Key insight: regression is little more than conditional averages
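The key insight above can be checked with a minimal sketch. It assumes simulated data with a hypothetical three-level `group` variable (none of these names come from the course materials) and uses only NumPy. When the regressors are one indicator per group, the least-squares coefficients coincide with the within-group averages of the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a three-level group label and an outcome whose mean differs by group.
group = rng.integers(0, 3, size=300)
y = np.array([1.0, 2.5, 4.0])[group] + rng.normal(0.0, 1.0, size=300)

# Saturated design matrix: one indicator column per group, no intercept.
X = np.eye(3)[group]

# Ordinary least squares coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Plain conditional (within-group) averages of the outcome.
group_means = np.array([y[group == g].mean() for g in range(3)])

print("OLS coefficients:", beta)
print("Group averages:  ", group_means)
```

With a continuous regressor the fitted line is no longer exactly a set of group means, but it can still be read as a linear approximation to the conditional average of the outcome.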
13.8.2 Course Section II: Sampling Theory and Testing
- Under very general assumptions, averages from large samples approximately follow a predictable, known distribution – the Gaussian distribution
- This is true, even when the underlying probability distribution is very complex, or unknown!
- Due to this common distribution, we can produce reliable, general tests!
- In Lab 1 we computed simple statistics, and used guarantees from sampling theory to test whether the differences we observed were likely to arise under a “null” scenario
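The points above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Lab 1 analysis: the Exponential(1) population, the sample size `n`, and the number of replications `reps` are all chosen here for illustration. It draws many samples from a skewed distribution, standardizes each sample average, and checks how often a two-sided test at the 5% level would reject a null hypothesis that is actually true.

```python
import numpy as np

rng = np.random.default_rng(1)

n, reps = 200, 5000

# Exponential(1) is strongly skewed; its mean and standard deviation are both 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)

# Standardize each sample average using the true mean and standard error.
z = (means - 1.0) / (1.0 / np.sqrt(n))

# If z is close to standard normal, roughly 95% of values fall within +/-1.96.
coverage = np.mean(np.abs(z) < 1.96)

# A two-sided z-test at the 5% level then rejects a true null about 5% of the time.
rejection_rate = 1.0 - coverage

print("Share of |z| < 1.96:", coverage)
print("False rejection rate:", rejection_rate)
```

Even though the individual observations are far from Gaussian, the standardized averages behave close to a standard normal, which is what makes a single, general testing recipe possible.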
13.8.3 Course Section I: Probability Theory
Probability theory underlies:
- Modeling and regression (Section III);
- Sampling, inference, and testing (Section II);
- Every model built in every corner of data science.
We can:
- Model the complex world that we live in using probability theory;
- Move from a probability density function that is defined in terms of a single variable to a joint density function that is defined in terms of many variables
- Compute useful summaries – such as the BLP (best linear predictor), expected value, and covariance – even with highly complex probability density functions.
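The summaries listed above can be computed numerically from a joint density. The sketch below assumes an illustrative density f(x, y) = x + y on the unit square (chosen here, not taken from the course) and approximates each integral with a midpoint grid. For this density the closed-form values are E[X] = 7/12, Cov(X, Y) = -1/144, and a BLP slope of Cov(X, Y) / Var(X) = -1/11, so the printed numbers should be close to those.

```python
import numpy as np

m = 400
grid = (np.arange(m) + 0.5) / m             # midpoints of m equal cells on [0, 1]
x, y = np.meshgrid(grid, grid, indexing="ij")

# Assumed joint density f(x, y) = x + y, multiplied by the area of each grid cell.
w = (x + y) / m**2

total = w.sum()                              # should integrate to 1
ex = (x * w).sum()                           # E[X]
ey = (y * w).sum()                           # E[Y]
var_x = ((x - ex) ** 2 * w).sum()            # Var(X)
cov_xy = ((x - ex) * (y - ey) * w).sum()     # Cov(X, Y)

# Best linear predictor of Y given X: intercept + slope * X.
slope = cov_xy / var_x
intercept = ey - slope * ex

print("Total probability:", total)
print("E[X], E[Y]:", ex, ey)
print("Cov(X, Y):", cov_xy)
print("BLP: Y ~", intercept, "+", slope, "* X")
```

The same recipe applies to any joint density that can be evaluated on a grid; only the line defining `w` changes.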