2.2 Introduction to the Materirals

From the axioms of probability, it is possible to build a whole, expressive modeling system (that need not be grounded at all in the minutia of the world). With this probability model in place, we can describe how frequently events in the random variable will occur. When variable are dependent upon each other, we can utilize information that is encoded in this dependence in order to make predictions that are closer to the truth than predictions made without this information.

There is both a beauty and a tragedy when reasoning about random variables: we describe random variables using their joint density function.

The beauty is that by reasoning with such general objects – the definitions that we create, and the theorems that we derive in this section of the course – produce guarantees that hold in every case, no matter the function that stands in for the joint density function. We will compute several examples of specific functions to provide a chance to reason about these objects and how they “work”.

The tragedy is that in the “real world”, the world where we are going to eventually going to train and deploy our models, we are never provided with this joint density function. Perhaps this is the creation myth for probability theory: in a perfect world, we can produce a perfect result. But, in the “fallen” world of data, we will only be able to produce approximations.