14.9 Confidence Intervals

This exercise is meant to demonstrate what the confidence level in a confidence interval represents. We will assume a standard normal population distribution and simulate what happens when we draw a sample and compute a confidence interval.

Your task is to complete the following function so that it,

simulates and storesn draws from a standard normal distribution
based on those draws, computes a valid confidence interval with confidence level \(\alpha\), a parameter that you pass to the function.

Your function should return a vector of length 2, containing the lower bound and upper bound of the confidence interval.

\[ CI_{\alpha} = \overline{X} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \]

where:

\(CI_\alpha\) is the confidence interval that you’re seeking to produce
\(\overline{X}\) is the sample average,
\(t_{\alpha/2}\) is your critical value (accessible through qt),
and \(s\) is your sample standard deviation. Notice that you’ll need each of these pieces in the code that you’re about to write.

Code

sim_conf_int <- function(n, alpha) {
  # Fill in your code to: 
  # 1. simulate n draws from a standard normal dist.
  # 2. compute a confidence interval with confidence level alpha
  
  sample_draws <- 'fill this in'
  sample_mean  <- 'fill this in'
  sample_sd    <- 'fill this in'
  
  critical_t <- 'fill this in'
  
  ci_95 <- 'fill this in'
  
  return(ci_95)  
}

sim_conf_int(n = 100, alpha = 0.25)

## [1] "fill this in"

When your function is complete, you can use the following code to run your function 100 times and plot the results.

Code

many_confidence_intervals <- function(num_simulations, n, alpha) {
  ## args: 
  ##  - num_simulations: the number of simulated confidence intervals 
  ##  - n: the number of observations in each simulation that will pass 
  ##           into your `sim_conf_int` function
  ##  - alpha: the confidence interval that you will pass into 
  ##           your `sim_conf_int` function
  
  results <- NULL
  for(i in 1:num_simulations) {
    interval = sim_conf_int(n, alpha)
    results = rbind(results, c(interval[1], interval[2], interval[1]<0 & interval[2]>0))
  }
  resultsdf = data.frame(results)
  names(resultsdf) = c("low", "high", "captured")
  return(resultsdf)
}

n = 20
confidence_intervals = many_confidence_intervals(100, n, .05)

Code

plot_many_confidence_intervals <- function(c) { 
  plot(NULL, type = "n",
       xlim = c(1,100), xlab = 'Trial',
       ylim = c(min(c$low), max(c$high)), ylab=expression(mu),pch=19)

  abline(h = 0, col = 'gray')
  abline(h = qt(0.975, n-1)/sqrt(n), lty = 2, col = 'gray')
  abline(h = qt(0.025, n-1)/sqrt(n), lty = 2, col = 'gray')
  
  points(c$high, col = 2+c$captured, pch = 20)
  points(c$low,  col = 2+c$captured, pch = 20)
  for(i in 1:nrow(c)) {
    lines(c(i,i), c(c$low[i],c$high[i]), col = 2+c$captured[i], pch = 19)
  }
  
  title(expression(paste("Simulation of t-Confidence Intervals for ", mu,
                          " with Sample Size 20")))

  legend(0,-.65, legend = c(expression(paste(mu," Captured")),
                             expression(paste(mu," Not Captured"))), fill = c(3,2))
  }

Code

# plot_many_confidence_intervals(confidence_intervals)

How many of the simulated confidence intervals contain the true mean, zero?
Suppose you run a single study. Based on what you’ve just written above, why is it incorrect to say that, “There is a 95% probability that the true mean is inside this (single) confidence interval”?