Unit 3 Summarizing Distributions


In the last live session, we introduced random variables, probability density functions, and cumulative distribution functions, and we made the connection between joint, marginal, and conditional distributions. All of these concepts work with the entire distribution.

Take, for example, the idea of conditional probability. We noted that the conditional density is defined to be:

\[ f_{Y|X}(y|x) = \frac{f_{Y,X}(y,x)}{f_{X}(x)} \]

This is a powerful concept that shows the range of the reasoning system we've built to this point! The probability distribution of \(Y\) might change as a result of changes in \(X\). If you unpack that just a little bit more, we might say that \(f_{Y|X}(y|x)\), the probability density of \(Y\), is itself a function, and it is also a function of \(X\). To say it again, to be very explicit: the function is a function of another input. That might sound wild, but it is all perfectly consistent with the world that we've built to this point.
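To make this concrete, here is a minimal sketch in Python, assuming a small made-up discrete joint pmf (all of the numbers below are for illustration only): dividing the joint by the marginal of \(X\) produces a different distribution over \(Y\) for each value of \(x\).

```python
import numpy as np

# A hypothetical joint pmf f_{Y,X}(y, x) over Y in {0, 1, 2} (rows)
# and X in {0, 1} (columns); the numbers are made up for illustration.
joint = np.array([
    [0.10, 0.05],
    [0.20, 0.15],
    [0.10, 0.40],
])

# Marginal of X: sum the joint over all values of Y.
f_x = joint.sum(axis=0)            # shape (2,)

# Conditional pmf: f_{Y|X}(y|x) = f_{Y,X}(y, x) / f_X(x).
f_y_given_x = joint / f_x          # divides each column by its f_X(x)

# Each column is a full distribution over Y, and the columns differ:
# the distribution of Y changes as X changes.
print(f_y_given_x[:, 0])           # f_{Y|X}(y | X=0) -> [0.25, 0.50, 0.25]
print(f_y_given_x[:, 1])           # f_{Y|X}(y | X=1) -> [0.083, 0.25, 0.667]
print(f_y_given_x.sum(axis=0))     # each column sums to 1
```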

This concept is very expressive. Knowing \(f_{Y}(y)\) gives a full-information representation of a variable; knowing \(f_{Y|X}(y|x)\) lets you update that information to make an even more informative statement about \(Y\). In Foundations and at this point in the class, we deal only with conditional probability that conditions on a single variable, but the process generalizes.

For example, if there were four random variables, \(A, B, C, D\), we could make a statement about \(A\) that conditions on \(B, C, D\):

\[ f_{A|\{B,C,D\}}(a|\{b,c,d\}) = \frac{f_{A,B,C,D}(a,b,c,d)}{f_{B,C,D}(b,c,d)} \]
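The mechanics are the same with more conditioning variables. Here is a minimal sketch, assuming a made-up four-dimensional joint pmf stored as a numpy array: the conditional is again just the joint divided by the relevant marginal.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical joint pmf f_{A,B,C,D} over four discrete variables,
# stored as a 4-D array indexed by (a, b, c, d); the values are made up.
joint = rng.random((2, 3, 3, 2))
joint /= joint.sum()               # normalize so the pmf sums to 1

# Marginal f_{B,C,D}: sum the joint over all values of A (axis 0).
f_bcd = joint.sum(axis=0)          # shape (3, 3, 2)

# Conditional: f_{A|B,C,D}(a|b,c,d) = f_{A,B,C,D}(a,b,c,d) / f_{B,C,D}(b,c,d).
f_a_given_bcd = joint / f_bcd      # broadcasts over the A axis

# For every fixed (b, c, d), the result is a distribution over A.
print(f_a_given_bcd.sum(axis=0))   # all entries are 1 (up to rounding)
```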

In this week’s materials we are going to go in the opposite direction: rather than producing a very expressive system of probabilities, we’re going to attempt to summarize all of the information contained in a pdf with lower-dimensional representations. Our first task will be summarizing a single random variable in two ways:

  1. Where is the “center” of the random variable; and,
  2. How dispersed, “on average,” is the random variable from this center.

After developing the concepts of expectation and variance (which are 1 & 2 above, respectively), we will develop a summary of a joint distribution: the covariance. The particular definitions that we choose to call expectation, variance, and covariance require justification. Why should we use these particular formulae as measures of the “center” and “dispersion”?
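As a preview of those definitions, here is a minimal sketch that computes all three summaries from the same kind of small, made-up discrete joint pmf used above; the formulas are the standard discrete versions of expectation, variance, and covariance.

```python
import numpy as np

# Hypothetical joint pmf f_{Y,X} over Y in {0, 1, 2} (rows) and
# X in {0, 1} (columns); the numbers are made up for illustration.
joint = np.array([
    [0.10, 0.05],
    [0.20, 0.15],
    [0.10, 0.40],
])
y_vals = np.array([0.0, 1.0, 2.0])
x_vals = np.array([0.0, 1.0])

# Marginals.
f_y = joint.sum(axis=1)
f_x = joint.sum(axis=0)

# Expectation: a probability-weighted average, E[Y] = sum_y y * f_Y(y).
e_y = np.sum(y_vals * f_y)         # 1.35
e_x = np.sum(x_vals * f_x)         # 0.60

# Variance: expected squared distance from the center, V[Y] = E[(Y - E[Y])^2].
var_y = np.sum((y_vals - e_y) ** 2 * f_y)

# Covariance: E[(Y - E[Y])(X - E[X])], summed over the joint pmf.
dev_y = y_vals[:, None] - e_y      # shape (3, 1)
dev_x = x_vals[None, :] - e_x      # shape (1, 2)
cov_yx = np.sum(dev_y * dev_x * joint)

print(e_y, var_y, cov_yx)
```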

We will ground these summaries in the Mean Squared Error (MSE) evaluative metric, and we will also justify that metric itself.
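As a small numerical preview of that grounding, the sketch below checks that, among all constant guesses \(c\), the expected squared error \(E[(Y-c)^2]\) is minimized at \(c = E[Y]\); the pmf values are the same made-up ones as above.

```python
import numpy as np

# Same hypothetical marginal pmf for Y as above (values made up).
y_vals = np.array([0.0, 1.0, 2.0])
f_y = np.array([0.15, 0.35, 0.50])
e_y = np.sum(y_vals * f_y)         # E[Y] = 1.35

# Mean Squared Error of a constant guess c: MSE(c) = E[(Y - c)^2].
def mse(c):
    return np.sum((y_vals - c) ** 2 * f_y)

# Evaluate MSE on a fine grid of candidate guesses.
grid = np.linspace(0.0, 2.0, 2001)
best = grid[np.argmin([mse(c) for c in grid])]

print(best, e_y)                   # both 1.35: the mean minimizes MSE
print(mse(e_y))                    # the minimum value is exactly V[Y]
```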