13.8 Course Goals

13.8.1 Course Section III: Purpose-Driven Models

  • Statistical models are unknowing transformations of data

    • Because they’re built on the foundation of probability, we have certain guarantees about what a model “says”
    • Because they’re unknowing, the models themselves do not know what they say
  • As data scientists, we bring them alive to achieve our modeling goals

  • In Lab 2, we expanded our ability to parse the world using regression, built a model that accomplishes our goals, and did so in a way that lets us test under a “null” scenario

    • Key insight: regression is little more than conditional averages
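
The key insight above can be sketched numerically. This is a minimal illustration, assuming NumPy and simulated data (the coefficients 2.0 and 3.0 are arbitrary choices): when the regressor is binary, an OLS fit with an intercept reproduces exactly the conditional averages of the outcome within each group.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1_000)          # binary regressor: group 0 or 1
y = 2.0 + 3.0 * x + rng.normal(size=1_000)  # outcome with noise

# OLS via least squares on the design matrix [1, x]
X = np.column_stack([np.ones_like(x, dtype=float), x.astype(float)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Conditional averages of y within each group
mean_0 = y[x == 0].mean()
mean_1 = y[x == 1].mean()

# The fitted line passes through the two group means:
# intercept = average of y when x = 0; intercept + slope = average when x = 1
print(np.isclose(beta[0], mean_0))            # True
print(np.isclose(beta[0] + beta[1], mean_1))  # True
```

With more groups or continuous regressors the correspondence is no longer exact, but the fitted values remain best linear approximations to the conditional averages.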

13.8.2 Course Section II: Sampling Theory and Testing

  • Under very general assumptions, sample averages follow a predictable, known distribution – the Gaussian distribution
  • This is true even when the underlying probability distribution is very complex, or entirely unknown!
  • Due to this common distribution, we can produce reliable, general tests!
  • In Lab 1, we computed simple statistics and used guarantees from sampling theory to test whether observed differences were likely to arise under a “null” scenario
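
The claim about sample averages can be checked by simulation. This is a small sketch, assuming NumPy and an arbitrarily chosen Exponential(1) distribution, which is strongly skewed and looks nothing like a Gaussian; the averages of repeated samples nonetheless behave as the Gaussian approximation predicts.

```python
import numpy as np

rng = np.random.default_rng(1)

# 10,000 independent samples, each of n = 100 draws from Exponential(1)
# (mean 1, standard deviation 1, heavily right-skewed)
samples = rng.exponential(scale=1.0, size=(10_000, 100))

# Average within each sample
sample_means = samples.mean(axis=1)

# Sampling-theory prediction: means are approximately
# Normal(mean = 1, standard deviation = 1 / sqrt(100) = 0.1)
print(sample_means.mean())  # close to 1.0
print(sample_means.std())   # close to 0.1
```

Histogramming `sample_means` would show the familiar bell shape, even though each individual draw comes from a sharply skewed distribution.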

13.8.3 Course Section I: Probability Theory

  • Probability theory:
    • Underlies modeling and regression (Part III);
    • Underlies sampling, inference, and testing (Part II);
    • Underlies every model built in every corner of data science

We can:

  • Model the complex world that we live in using probability theory;
  • Move from a probability density function defined in terms of a single variable to one defined in terms of many variables;
  • Compute useful summaries – e.g. the expected value, covariance, and BLP – even for highly complex probability density functions.
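
These summaries can be computed directly from a joint distribution. Below is a minimal sketch, assuming NumPy and a small hypothetical joint pmf over four equally likely (x, y) pairs (the outcomes and probabilities are made up for illustration): the expected values, variance, and covariance come from probability-weighted sums, and the BLP of y given x follows from the familiar formulas slope = Cov(X, Y) / Var(X) and intercept = E[Y] − slope · E[X].

```python
import numpy as np

# Hypothetical joint pmf: each row is an (x, y) outcome with its probability
outcomes = np.array([[0.0, 1.0],
                     [1.0, 1.0],
                     [1.0, 3.0],
                     [2.0, 3.0]])
probs = np.array([0.25, 0.25, 0.25, 0.25])

x, y = outcomes[:, 0], outcomes[:, 1]

# Expected values: probability-weighted sums over the outcomes
E_x = np.sum(probs * x)
E_y = np.sum(probs * y)

# Variance of X and covariance of (X, Y), again as weighted sums
var_x = np.sum(probs * (x - E_x) ** 2)
cov_xy = np.sum(probs * (x - E_x) * (y - E_y))

# Best Linear Predictor of Y given X
slope = cov_xy / var_x
intercept = E_y - slope * E_x
print(E_x, E_y, var_x, cov_xy)  # 1.0 2.0 0.5 0.5
print(slope, intercept)         # 1.0 1.0
```

The same weighted-sum recipe applies (with integrals in place of sums) to continuous densities, however complex.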

13.8.4 Statistics as a Foundation for MIDS

  • In w203, we hope to have laid a foundation in probability that can be used not only in statistical applications, but also in every other machine learning application that you are likely to encounter