Unit 6 Hypothesis Testing

more lake

Frequentist Hypothesis testing is a very well established framework in the applied practice, and scientific literature. Sometimes (often, currently) referred to as Null Hypothesis Significance Testing (NHST), this framework essentially makes an absurd assertion and asks the data to overturn that assertion.

Like a petulant child, NHST essentially proclaims,

“If you really loved me, you would let me watch this screen one-hundred hours every day.”

Here the absurdity is that a parent might not love their child, and the criteria to overturn that assertion is noted to be “buy me an iPad”’.

What is Frequentist testing doing?

This testing framework works on samples of data, and applies estimators to produce estimates of population parameters that are fundamentally unknown and unknowable. Despite this unknown and unknowable population target, with some carefully written down estimators we can rely on the convergence characteristics of some estimators to produce useful, reliable results.

We begin with the one-sample t-test. The one-sample t-test relies on the sample average as an estimator of a population expectation. In doing so, it relies on the effectiveness of the Weak Law of Large Numbers and the Central Limit Theorem to guarantee that the estimator that converges in probability to the population expectation, while also converging in distribution to a Gaussian distribution.

These two convergence concepts permit a data scientist to make several inferences based on data:

The probability of generating data that “looks like what is observed”, if the null-hypothesis were true. This is often referred to as the p-value of a test, and is the petulant statement identified above.
An interval of values that, with some stated probability (e.g. 95%), contains the true population parameter.

This framework begins a exceedingly important task that we must understand, and undertake when we are working as data scientists: Producing our best-estimate, communicating how we arrived at that estimate, what (if any) guarantees that estimate provides, and crucially all limitations of our estimate.