14.9 Confidence Intervals
This exercise is meant to demonstrate what the confidence level in a confidence interval represents. We will assume a standard normal population distribution and simulate what happens when we draw a sample and compute a confidence interval.
Your task is to complete the following function so that it,
- simulates and stores
n draws from a standard normal distribution
- based on those draws, computes a valid confidence interval with confidence level \(\alpha\), a parameter that you pass to the function.
Your function should return a vector of length 2, containing the lower bound and upper bound of the confidence interval.
\[ CI_{\alpha} = \overline{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]
where:
- \(CI_\alpha\) is the confidence interval that you’re seeking to produce
- \(\overline{X}\) is the sample average,
- \(t_{\alpha/2}\) is your critical value (accessible through
qt
), - and \(s\) is your sample standard deviation. Notice that you’ll need each of these pieces in the code that you’re about to write.
Code
sim_conf_int <- function(n, alpha) {
# Fill in your code to:
# 1. simulate n draws from a standard normal dist.
# 2. compute a confidence interval with confidence level alpha
sample_draws <- 'fill this in'
sample_mean <- 'fill this in'
sample_sd <- 'fill this in'
critical_t <- 'fill this in'
ci_95 <- 'fill this in'
return(ci_95)
}
sim_conf_int(n = 100, alpha = 0.25)
## [1] "fill this in"
When your function is complete, you can use the following code to run your function 100 times and plot the results.
Code
many_confidence_intervals <- function(num_simulations, n, alpha) {
## args:
## - num_simulations: the number of simulated confidence intervals
## - n: the number of observations in each simulation that will pass
## into your `sim_conf_int` function
## - alpha: the confidence interval that you will pass into
## your `sim_conf_int` function
results <- NULL
for(i in 1:num_simulations) {
interval = sim_conf_int(n, alpha)
results = rbind(results, c(interval[1], interval[2], interval[1]<0 & interval[2]>0))
}
resultsdf = data.frame(results)
names(resultsdf) = c("low", "high", "captured")
return(resultsdf)
}
n = 20
confidence_intervals = many_confidence_intervals(100, n, .05)
Code
plot_many_confidence_intervals <- function(c) {
plot(NULL, type = "n",
xlim = c(1,100), xlab = 'Trial',
ylim = c(min(c$low), max(c$high)), ylab=expression(mu),pch=19)
abline(h = 0, col = 'gray')
abline(h = qt(0.975, n-1)/sqrt(n), lty = 2, col = 'gray')
abline(h = qt(0.025, n-1)/sqrt(n), lty = 2, col = 'gray')
points(c$high, col = 2+c$captured, pch = 20)
points(c$low, col = 2+c$captured, pch = 20)
for(i in 1:nrow(c)) {
lines(c(i,i), c(c$low[i],c$high[i]), col = 2+c$captured[i], pch = 19)
}
title(expression(paste("Simulation of t-Confidence Intervals for ", mu,
" with Sample Size 20")))
legend(0,-.65, legend = c(expression(paste(mu," Captured")),
expression(paste(mu," Not Captured"))), fill = c(3,2))
}
- How many of the simulated confidence intervals contain the true mean, zero?
- Suppose you run a single study. Based on what you’ve just written above, why is it incorrect to say that, “There is a 95% probability that the true mean is inside this (single) confidence interval”?