5.4 Estimator Property: Biased or Unbiased?

  1. First, for a general case: Suppose that you have chosen some particular estimator, \(\hat{\theta}\), to estimate some characteristic, \(\theta\), of a random variable. How do you know whether this estimator is unbiased?
  2. Second, for a specific case: Define the “sample average” to be \(\frac{1}{N}\sum_{i=1}^{N} x_{i}\), where \(x_1, \dots, x_N\) are draws from the distribution of \(X\). Prove that this sample average estimator is an unbiased estimator of \(E[X]\).
  3. Third (easier), for a different specific case: Define the “smample smaverage” to be \(\frac{1}{N^2}\sum_{i=1}^{N} x_{i}\). Prove that, for \(N > 1\) and \(E[X] \neq 0\), the smample smaverage is a biased estimator of \(E[X]\).
  4. Fourth (harder): Define the geometric mean to be \(\left(\prod_{i=1}^{N}x_{i}\right)^{1/N}\). Prove that, for \(N > 1\) and a strictly positive, non-constant \(X\), the geometric mean is a biased estimator of \(E[X]\).
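One way to build intuition alongside the proofs is a Monte Carlo check: simulate many samples, apply each estimator to every sample, and compare the average estimate with the known \(E[X]\). The sketch below assumes \(X \sim \text{Uniform}(1, 2)\), so \(E[X] = 1.5\), and \(N = 10\); these choices and the helper names (sample_average, smaverage, geometric_mean, estimate_bias) are illustrative only. A simulation like this is numerical evidence, not a proof.

```r
# Monte Carlo sketch: approximate bias = E[theta_hat] - theta for three
# estimators of E[X]. ASSUMPTION: X ~ Uniform(1, 2), so the true E[X] is 1.5.
set.seed(42)                  # arbitrary seed, for reproducibility only

true_mean   <- 1.5            # E[X] for Uniform(1, 2)
sample_size <- 10             # N, the number of draws per sample
reps        <- 20000          # number of simulated samples

# Hypothetical helper names for the three estimators in the exercises
sample_average <- function(x) sum(x) / length(x)
smaverage      <- function(x) sum(x) / length(x)^2
geometric_mean <- function(x) exp(mean(log(x)))   # requires every x > 0

# Average the estimator over many fresh samples, then subtract the truth
estimate_bias <- function(estimator) {
  estimates <- replicate(reps, estimator(runif(sample_size, min = 1, max = 2)))
  mean(estimates) - true_mean
}

biases <- c(
  sample_average = estimate_bias(sample_average),
  smaverage      = estimate_bias(smaverage),
  geometric_mean = estimate_bias(geometric_mean)
)
round(biases, 4)
```

Under these assumptions, the sample average's estimated bias should sit near 0, the smample smaverage's near \(1.5/10 - 1.5 = -1.35\), and the geometric mean's should come out small but consistently negative.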

5.4.1 Is it unbiased, with data?

Suppose that you’re getting data from the following process:

random_distribution <- function(number_samples) {
  # Bounds (min, max) of three candidate uniform distributions
  d1 <- c(1.0, 2.0)
  d2 <- c(1.1, 2.1)
  d3 <- c(1.5, 2.5)

  # Pick ONE of the three distributions at random, with equal probability
  distribution_chooser <- sample(x = 1:3, size = 1)

  # Draw every sample in this call from the chosen uniform distribution
  if (distribution_chooser == 1) {
    x_ <- runif(n = number_samples, min = d1[1], max = d1[2])
  } else if (distribution_chooser == 2) {
    x_ <- runif(n = number_samples, min = d2[1], max = d2[2])
  } else {
    x_ <- runif(n = number_samples, min = d3[1], max = d3[2])
  }
  return(x_)
}

random_distribution(number_samples = 10)
##  [1] 2.467197 2.366916 1.937715 1.691938 1.582294 2.083452 1.570361 2.027663
##  [9] 1.972288 1.548191
mean(random_distribution(number_samples=10000))
## [1] 1.501582

Notice that there are two forms of inherent uncertainty in this function:

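  1. There is uncertainty about which distribution we are getting draws from: each call picks one of d1, d2, or d3 at random, and all of that call's draws come from the same distribution; and,
  2. Within the chosen distribution, each value is a random draw from that population distribution.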
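To see the two layers separately, it can help to record which distribution was chosen on each call. The sketch below uses a hypothetical variant, random_distribution_labeled(), with the same three sets of bounds; it returns the chosen component alongside the draws. With 10,000 draws per call, each call's sample average should sit close to the mean of whichever component was picked (1.5, 1.6, or 2.0), which is one way to read the average of about 1.50 printed above.

```r
# Hypothetical variant of random_distribution() that also reports which
# component was chosen, so the two layers of randomness can be inspected.
random_distribution_labeled <- function(number_samples) {
  bounds    <- list(c(1.0, 2.0), c(1.1, 2.1), c(1.5, 2.5))  # d1, d2, d3
  component <- sample(x = 1:3, size = 1)                    # layer 1: pick a distribution
  draws     <- runif(n = number_samples,                    # layer 2: draws within it
                     min = bounds[[component]][1],
                     max = bounds[[component]][2])
  list(component = component, draws = draws)
}

set.seed(1)  # arbitrary seed, for reproducibility only
calls <- replicate(6, random_distribution_labeled(number_samples = 10000),
                   simplify = FALSE)

# One row per call: which distribution was used, and the average of its draws
summary_table <- data.frame(
  component   = sapply(calls, function(call) call$component),
  sample_mean = sapply(calls, function(call) mean(call$draws))
)
summary_table
```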

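The runif() function used inside random_distribution() belongs to R's family of r* functions (runif(), rnorm(), rbinom(), and so on), which implement random generative processes in the R language. Look into ?distributions to see the full family, along with the matching d*, p*, and q* functions.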
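As a quick illustration of that naming convention, the sketch below calls the four uniform-distribution functions from base R's stats package and shows that set.seed() makes r* draws reproducible. Because sample() and runif() share R's random number generator, calling set.seed() before random_distribution() would likewise make its output repeatable. The seed value is arbitrary.

```r
set.seed(123)                     # makes the r* draws below reproducible
runif(n = 3, min = 1, max = 2)    # r*: three random draws from Uniform(1, 2)
dunif(1.5, min = 1, max = 2)      # d*: density at 1.5, which is 1
punif(1.5, min = 1, max = 2)      # p*: P(X <= 1.5), which is 0.5
qunif(0.5, min = 1, max = 2)      # q*: the 0.5 quantile (median), which is 1.5

# Resetting the seed reproduces exactly the same draws
set.seed(123); first  <- runif(n = 3, min = 1, max = 2)
set.seed(123); second <- runif(n = 3, min = 1, max = 2)
identical(first, second)          # TRUE
```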

Suppose that you use the sample average estimator defined above to estimate the population expected value, \(E[X]\), and that you get the following draws:

draws <- random_distribution(number_samples=10)
draws
##  [1] 1.946123 1.913084 1.711113 1.768129 1.573630 1.916366 1.149307 1.556135
##  [9] 1.358160 1.889717
mean(draws)
## [1] 1.678176

Is this sample average an unbiased estimator for the population expected value? How do you know?
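One way to investigate this numerically is to call the process many times, compute the sample average on each call, and compare the average of those sample averages with a candidate population value. The answer depends on what you take "the population expected value" to be. The sketch below assumes the population is the equal-weight mixture of the three uniforms, whose mean is \((1.5 + 1.6 + 2.0)/3 = 1.7\); conditional on a single chosen distribution, the target would instead be 1.5, 1.6, or 2.0. The block redefines a compact equivalent of random_distribution() so it runs on its own. As with any simulation, this is evidence rather than a proof.

```r
# Compact equivalent of random_distribution() from above, repeated so this
# block is self-contained: pick one of three uniforms, then draw from it.
random_distribution <- function(number_samples) {
  bounds <- list(c(1.0, 2.0), c(1.1, 2.1), c(1.5, 2.5))   # d1, d2, d3
  chosen <- bounds[[sample(x = 1:3, size = 1)]]
  runif(n = number_samples, min = chosen[1], max = chosen[2])
}

set.seed(2024)   # arbitrary seed, for reproducibility only

# Each replication calls random_distribution() afresh, so BOTH layers of
# randomness (which distribution, and which draws) vary across replications.
sample_means <- replicate(50000, mean(random_distribution(number_samples = 10)))

# ASSUMPTION: the "population" is the equal-weight mixture of the three
# uniforms. A Uniform(a, b) has mean (a + b) / 2.
component_means <- c((1.0 + 2.0) / 2, (1.1 + 2.1) / 2, (1.5 + 2.5) / 2)
mixture_mean    <- mean(component_means)   # 1.7

mean(sample_means)                   # Monte Carlo estimate of E[sample average]
mean(sample_means) - mixture_mean    # estimated bias under the mixture assumption
sd(sample_means)                     # spread of the estimator across calls
```

If you rerun this with a larger number_samples, the within-distribution part of the spread shrinks, but sd(sample_means) should level off near \(\sqrt{0.14/3} \approx 0.216\), because a single call never mixes draws from different distributions.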