3.2 Class Announcements
Where have we come from, and where are we going?
3.2.2 Today’s Lesson
What might seem frustrating about this system of reasoning built on probability theory is that we are building a castle in the sky: a fiction. We suppose that there is some function that describes the probability with which values are generated. In reality, there is no such generative function; it is extremely unlikely (though we’ll acknowledge it is possible) that the physical reality we believe we exist within is just a complex simulation programmed with functions by some unknown designer.
Especially frustrating is that we suppose this function exists, and then we go further, saying,
“If only we had this impossible function; and if only we could also take an impossible derivative of this impossible function, then we could…”
3.2.2.1 Single number summaries of a single random variable
But here’s the upshot!
What we are doing today is laying the groundwork for the models that we will introduce next week. Here, we are going to show that there are radical simplifications available to us that carry specific guarantees, no matter how complex the function we’re reasoning about.
In particular, in one specific sense of the term best, we will prove that the expectation is the best one-number summary of any distribution. To do so, we will define a term, variance: the expected squared deviation of a random variable from its expectation, which describes how “spread out” the variable is. Then, we will define the mean squared error (MSE): the expected squared distance between a model’s prediction and a random variable’s realization. The key realization is that when the model predicts the expectation, the MSE equals the variance of the random variable, and this is the smallest MSE that any one-number prediction can achieve.
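To preview the argument in symbols (a sketch of the claim we will prove, writing $E[\cdot]$ for expectation and $V[\cdot]$ for variance): for any one-number prediction $c$,

$$
E\big[(X - c)^2\big] = V[X] + \big(E[X] - c\big)^2,
$$

which is smallest exactly when $c = E[X]$, at which point the MSE equals $V[X]$.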
3.2.2.2 Single number summaries of relationships between random variables
Although single number summaries are incredibly powerful, that’s not enough for today’s lesson! We’re also going to show that we can create a measure of linear dependence between two random variables, called the “covariance”, and a re-scaled version of this relationship, called the correlation.
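In symbols (again as a preview of the definitions to come):

$$
\text{Cov}[X, Y] = E\big[(X - E[X])(Y - E[Y])\big], \qquad
\text{Corr}[X, Y] = \frac{\text{Cov}[X, Y]}{\sqrt{V[X]\, V[Y]}},
$$

where re-scaling by the standard deviations guarantees that the correlation always lies in $[-1, 1]$.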