14.9 Confidence Intervals

This exercise is meant to demonstrate what the confidence level in a confidence interval represents. We will assume a standard normal population distribution and simulate what happens when we draw a sample and compute a confidence interval.

Your task is to complete the following function so that it,

  1. simulates and storesn draws from a standard normal distribution
  2. based on those draws, computes a valid confidence interval with confidence level \(\alpha\), a parameter that you pass to the function.

Your function should return a vector of length 2, containing the lower bound and upper bound of the confidence interval.

\[ CI_{\alpha} = \overline{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]

where:

  • \(CI_\alpha\) is the confidence interval that you’re seeking to produce
  • \(\overline{X}\) is the sample average,
  • \(t_{\alpha/2}\) is your critical value (accessible through qt),
  • and \(s\) is your sample standard deviation. Notice that you’ll need each of these pieces in the code that you’re about to write.
Code
sim_conf_int <- function(n, alpha) {
  # Fill in your code to: 
  # 1. simulate n draws from a standard normal dist.
  # 2. compute a confidence interval with confidence level alpha
  
  sample_draws <- 'fill this in'
  sample_mean  <- 'fill this in'
  sample_sd    <- 'fill this in'
  
  critical_t <- 'fill this in'
  
  ci_95 <- 'fill this in'
  
  return(ci_95)  
}

sim_conf_int(n = 100, alpha = 0.25)
## [1] "fill this in"

When your function is complete, you can use the following code to run your function 100 times and plot the results.

Code
many_confidence_intervals <- function(num_simulations, n, alpha) {
  ## args: 
  ##  - num_simulations: the number of simulated confidence intervals 
  ##  - n: the number of observations in each simulation that will pass 
  ##           into your `sim_conf_int` function
  ##  - alpha: the confidence interval that you will pass into 
  ##           your `sim_conf_int` function
  
  results <- NULL
  for(i in 1:num_simulations) {
    interval = sim_conf_int(n, alpha)
    results = rbind(results, c(interval[1], interval[2], interval[1]<0 & interval[2]>0))
  }
  resultsdf = data.frame(results)
  names(resultsdf) = c("low", "high", "captured")
  return(resultsdf)
}

n = 20
confidence_intervals = many_confidence_intervals(100, n, .05)
Code
plot_many_confidence_intervals <- function(c) { 
  plot(NULL, type = "n",
       xlim = c(1,100), xlab = 'Trial',
       ylim = c(min(c$low), max(c$high)), ylab=expression(mu),pch=19)

  abline(h = 0, col = 'gray')
  abline(h = qt(0.975, n-1)/sqrt(n), lty = 2, col = 'gray')
  abline(h = qt(0.025, n-1)/sqrt(n), lty = 2, col = 'gray')
  
  points(c$high, col = 2+c$captured, pch = 20)
  points(c$low,  col = 2+c$captured, pch = 20)
  for(i in 1:nrow(c)) {
    lines(c(i,i), c(c$low[i],c$high[i]), col = 2+c$captured[i], pch = 19)
  }
  
  title(expression(paste("Simulation of t-Confidence Intervals for ", mu,
                          " with Sample Size 20")))

  legend(0,-.65, legend = c(expression(paste(mu," Captured")),
                             expression(paste(mu," Not Captured"))), fill = c(3,2))
  }
Code
# plot_many_confidence_intervals(confidence_intervals)
  1. How many of the simulated confidence intervals contain the true mean, zero?
  2. Suppose you run a single study. Based on what you’ve just written above, why is it incorrect to say that, “There is a 95% probability that the true mean is inside this (single) confidence interval”?