5.2 Key Terms and Assumptions

5.2.1 IID

We use an abbreviation for the sampling process that undergirds our frequentist statistics. That abbreviation, IID, is short, but it contains two powerful requirements of our data sampling process.

Definition 5.1 IID sampling is:

Independent. The first I in the abbreviation, this independence requirement is similar to the independence concept that we’ve used in the probability theory section of the course. When samples are independent, the result of any one sample is not informative about the value of any of the other samples.
Identically Distributed. The ID in the abbreviation, this requirement means that all samples are drawn from the same distribution.
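
To make the two requirements concrete, here is a minimal simulation sketch. It is an illustration under stated assumptions, not part of the definition: the normal distributions, the 0.9 dependence coefficient, the drifting mean, and the helper names (`lag1_correlation`, `half_mean_gap`) are all choices made for this example. It builds one sequence that is IID, one that is identically distributed but not independent, and one that is independent but not identically distributed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def lag1_correlation(draws):
    """Correlation between each draw and the draw that follows it."""
    return pearson(draws[:-1], draws[1:])

def half_mean_gap(draws):
    """Mean of the second half of the draws minus the mean of the first half."""
    mid = len(draws) // 2
    first, second = draws[:mid], draws[mid:]
    return sum(second) / len(second) - sum(first) / len(first)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# IID: every draw comes from the same N(0, 1), ignoring all other draws.
iid = [rng.gauss(0, 1) for _ in range(n)]

# Identically distributed but NOT independent (assumed example):
# each draw leans on the previous one, with the noise scaled so that
# every draw is still marginally N(0, 1).
rho = 0.9
dependent = [rng.gauss(0, 1)]
for _ in range(n - 1):
    dependent.append(rho * dependent[-1] + rng.gauss(0, (1 - rho ** 2) ** 0.5))

# Independent but NOT identically distributed (assumed example):
# the mean of the distribution drifts from 0 to 5 across the draws.
drifting = [rng.gauss(5 * i / n, 1) for i in range(n)]

print("lag-1 correlation, IID:      ", round(lag1_correlation(iid), 3))
print("lag-1 correlation, dependent:", round(lag1_correlation(dependent), 3))
print("half-mean gap, IID:          ", round(half_mean_gap(iid), 3))
print("half-mean gap, drifting:     ", round(half_mean_gap(drifting), 3))
```

In the dependent sequence, one draw is informative about the next, so the lag-1 correlation sits far from zero. In the drifting sequence, early and late draws come from different distributions, so the two halves have visibly different means.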

It might be tempting to imagine that IID samples are just “random samples”, but IID sampling has the two specific requirements noted above, and these requirements are more stringent than a “randomness” criterion.

When we are thinking about IID samples, and evaluating whether the samples do in fact meet both of the requirements, it is crucial to make an explicit statement about the reference population that is under consideration.

For example, suppose that you were interested in learning about life-satisfaction and your reference population is the people who live in the United States. Further, suppose that you decide to produce an estimate of this using a sample drawn from UC Berkeley undergraduate students during RRR week. There are two flaws in this design:

There is a key research design issue: a sample drawn from Berkeley undergraduates is going to be essentially uninformative about a US resident reference population!
There is a key statistical issue: Berkeley undergraduates are not an independent sample from the entire US resident reference population. Once you learn the age of one person from the Berkeley student population, you can make a conditional guess about the age of the next person sampled that will be closer than was possible before the first sample. The same goes for life-satisfaction: when you learn about the life-satisfaction of the first undergrad (who is miserable because they are studying for their Stat 140 final), you can make a conditionally better guess about the satisfaction of the next undergrad.
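
The age argument above can be sketched with a toy simulation. Everything in it is a hypothetical stand-in rather than real data: the reference population is assumed to be 100,000 adults with ages uniform on 18 to 90, the "campus" subgroup is assumed to be everyone aged 18 to 22, and `mean_abs_error` is a helper invented for this illustration. The sketch compares two ways of guessing the second person's age: reusing the first person's age, or ignoring the first person and guessing the reference population mean.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical reference population: ages uniform on 18..90 (toy assumption).
population = [rng.randint(18, 90) for _ in range(100_000)]
population_mean = sum(population) / len(population)

# Hypothetical campus subgroup: members of the population aged 18 to 22.
campus = [age for age in population if age <= 22]

def mean_abs_error(pool, guess_from_first, trials=2000):
    """Average absolute error when guessing the second draw's age.

    guess_from_first=True  -> guess that the second age equals the first age.
    guess_from_first=False -> ignore the first draw; guess the population mean.
    """
    total = 0.0
    for _ in range(trials):
        first, second = rng.choice(pool), rng.choice(pool)
        guess = first if guess_from_first else population_mean
        total += abs(second - guess)
    return total / trials

campus_with_first = mean_abs_error(campus, True)
campus_without_first = mean_abs_error(campus, False)
population_with_first = mean_abs_error(population, True)
population_without_first = mean_abs_error(population, False)

print("campus sample, guess from first draw:     ", round(campus_with_first, 2))
print("campus sample, guess population mean:     ", round(campus_without_first, 2))
print("population sample, guess from first draw: ", round(population_with_first, 2))
print("population sample, guess population mean: ", round(population_without_first, 2))
```

Under these assumptions, the first campus draw sharply improves the guess about the second one, which is the sense in which the campus sample is not independent with respect to the US reference population. When the draws really do come from the whole reference population, the first draw does not help: guessing the population mean does at least as well.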

Notice that these violations of the IID requirements only arise because our reference population is the US resident population. If, instead, the reference population were “Berkeley undergrads” then the sampling process would satisfy the requirements of an IID process.

How, or why, can a change in the reference population make an identical sampling process move from one that we can consider IID to one that we cannot consider IID?

5.2.1.1 Is this IID?

For each of the following scenarios, is the IID assumption plausible?

Call a random phone number. If someone answers, interview all persons in the household. Repeat until you have data on 100 people.
Call a random phone number, interview the person if they are over 30. Repeat until you have data on 100 people.
Record the year-to-date price change for the 20 largest car manufacturers.
Measure net exports per GDP for all 195 countries recognized by the UN.