8.6 The Regression Anatomy Formula

We make the claim in the live session that we can re-represent a coefficient that we’re interested in as a function of all the other variables in a regression. That is, suppose that we were interested, initially, in estimating the model:

\[ Y = \hat\beta_{0} + \hat\beta_{1} X_{1} + \hat\beta_{2} X_{2} + \hat\beta_{3}X_{3} + e \] The claim is that we can produce an estimate of \(\hat\beta_{1}\) by first fitting this auxiliary regression,

\[ X_{1} = \hat\delta_{0} + \hat\delta_2X_2 + \hat\delta_3X_3 + r_{1} \]

And then using the residuals, denoted \(r_{1}\) above, in a second regression,

\[ Y = \hat\gamma_0 + \hat\gamma_1 r_1 \]

The claim that we make in the live session is that there is a guarantee that \(\hat\beta_1 = \hat\gamma_1\). Here, we are first going to show that this is true, and then we’re going to reason about what this means, and why this feature is interesting (or at least useful) when we are estimating a regression.
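
As a quick sketch of the algebra (before the simulation below): by construction, the residual \(r_{1}\) is orthogonal to \(X_{2}\), \(X_{3}\), and the constant, and also to the residual \(e\) from the main regression. Since the slope in a bivariate regression is the sample covariance over the sample variance,

\[ \hat\gamma_1 = \frac{\widehat{Cov}(Y, r_1)}{\widehat{Var}(r_1)} = \frac{\widehat{Cov}(\hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + e, \; r_1)}{\widehat{Var}(r_1)} = \frac{\hat\beta_1 \, \widehat{Cov}(X_1, r_1)}{\widehat{Var}(r_1)} = \hat\beta_1, \]

where the last equality uses \(\widehat{Cov}(X_1, r_1) = \widehat{Var}(r_1)\), because \(X_{1}\) is just \(r_{1}\) plus a linear combination of \(X_{2}\), \(X_{3}\), and the constant.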

Suppose that the population model is the following:

\[ f_{X_{1}}(x) = f_{X_{2}}(x) = f_{X_{3}}(x) = \begin{cases} \frac{1}{10}, & 0 \leq x \leq 10, \\ 0, & \text{otherwise} \end{cases} \] That is, each of \(X_{1}\), \(X_{2}\), and \(X_{3}\) is uniformly distributed on \([0, 10]\). And, furthermore, suppose that \(Y = g(X_{1}, X_{2}, X_{3})\); specifically, that:

\[ Y = -3 + (1\cdot X_1) + (2\cdot X_2) + (3\cdot X_3) \] Then, because we know the population model, we can produce a single sample from it using the following code:

Code
# packages used below: dplyr (for %>%, mutate, n), ggplot2, broom, testthat
library(dplyr); library(ggplot2); library(broom); library(testthat)

# draw one sample of 100 observations from the population model above,
# adding standard-normal noise for everything else that affects y
d <- data.frame(
  x1 = runif(n = 100, min = 0, max = 10), 
  x2 = runif(n = 100, min = 0, max = 10), 
  x3 = runif(n = 100, min = 0, max = 10)) %>% 
  mutate(y = -3 + 1*x1 + 2*x2 + 3*x3 + rnorm(n = n(), mean = 0, sd = 1))

head(d)
##         x1        x2       x3        y
## 1 2.574453 2.0164855 5.864455 22.22699
## 2 3.533997 0.6302335 6.119277 19.27601
## 3 1.730922 1.9090746 3.126370 10.88401
## 4 5.977906 1.6324287 7.661009 29.98035
## 5 2.935329 3.9556559 8.179117 32.75561
## 6 4.530804 8.9397239 3.170691 25.86899

Notice that when we made this data, we included a random noise term at the end. The idea here is that there are other “things” in this universe that also affect \(Y\), but that we don’t have access to them. By assumption, the variables that we have measured in this world, \(X_1, X_2, X_3\), are uncorrelated with these other features.
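
If you want to see that assumption at work in the data itself, here is a small sketch: re-draw a dataset that keeps the noise term as its own column (the column name epsilon is just for illustration) and check that its sample correlations with the measured features are close to zero.

Code
# re-generate a draw, keeping the noise term so that we can inspect it
# (in the real world we never observe this column; this is only a check)
d_check <- data.frame(
  x1 = runif(n = 100, min = 0, max = 10), 
  x2 = runif(n = 100, min = 0, max = 10), 
  x3 = runif(n = 100, min = 0, max = 10), 
  epsilon = rnorm(n = 100, mean = 0, sd = 1))

# these sample correlations should all be close to (though not exactly) zero
cor(d_check[, c('x1', 'x2', 'x3')], d_check$epsilon)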

8.6.1 Estimate an OLS Regression

Let’s begin by estimating the OLS regression of \(Y\) on these \(X\) variables. Notice the way that we’re talking about this:

We are going to regress \(Y\) on \(X_{1}\), \(X_{2}\), and \(X_{3}\).

Code
model_main <- lm(y ~ x1 + x2 + x3, data = d)
coef(model_main)
## (Intercept)          x1          x2          x3 
##   -3.081193    1.010980    2.014148    2.986416
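
Before moving on, notice that these estimates sit close to the population coefficients that we built into the data, \(-3\), \(1\), \(2\), and \(3\); the gaps are just sampling noise. A quick way to see them side by side:

Code
# stack the population coefficients above the estimates for comparison
rbind(truth = c(-3, 1, 2, 3), estimate = coef(model_main))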

8.6.2 Regression Anatomy and Frisch-Waugh-Lovell

The claim is that we can produce an estimate of \(\hat{\beta}_1\) by fitting an auxiliary regression, and then using the residuals from that auxiliary regression.

Code
model_aux <- lm(x1 ~ x2 + x3, data = d)

If we look into the structure of model_aux, we can see that there are a ton of pieces in here.

Code
str(model_aux)
## List of 12
##  $ coefficients : Named num [1:3] 4.9843 -0.1222 0.0528
##   ..- attr(*, "names")= chr [1:3] "(Intercept)" "x2" "x3"
##  $ residuals    : Named num [1:100] -2.473 -1.696 -3.185 0.788 -1.998 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:100] -46.738 -3.391 1.375 0.934 -2.047 ...
##   ..- attr(*, "names")= chr [1:100] "(Intercept)" "x2" "x3" "" ...
##  $ rank         : int 3
##  $ fitted.values: Named num [1:100] 5.05 5.23 4.92 5.19 4.93 ...
##   ..- attr(*, "names")= chr [1:100] "1" "2" "3" "4" ...
##  $ assign       : int [1:3] 0 1 2
##  $ qr           :List of 5
##   ..$ qr   : num [1:100, 1:3] -10 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:100] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:3] "(Intercept)" "x2" "x3"
##   .. ..- attr(*, "assign")= int [1:3] 0 1 2
##   ..$ qraux: num [1:3] 1.1 1.15 1.09
##   ..$ pivot: int [1:3] 1 2 3
##   ..$ tol  : num 1e-07
##   ..$ rank : int 3
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 97
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = x1 ~ x2 + x3, data = d)
##  $ terms        :Classes 'terms', 'formula'  language x1 ~ x2 + x3
##   .. ..- attr(*, "variables")= language list(x1, x2, x3)
##   .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:3] "x1" "x2" "x3"
##   .. .. .. ..$ : chr [1:2] "x2" "x3"
##   .. ..- attr(*, "term.labels")= chr [1:2] "x2" "x3"
##   .. ..- attr(*, "order")= int [1:2] 1 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(x1, x2, x3)
##   .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:3] "x1" "x2" "x3"
##  $ model        :'data.frame':   100 obs. of  3 variables:
##   ..$ x1: num [1:100] 2.57 3.53 1.73 5.98 2.94 ...
##   ..$ x2: num [1:100] 2.02 0.63 1.91 1.63 3.96 ...
##   ..$ x3: num [1:100] 5.86 6.12 3.13 7.66 8.18 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language x1 ~ x2 + x3
##   .. .. ..- attr(*, "variables")= language list(x1, x2, x3)
##   .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:3] "x1" "x2" "x3"
##   .. .. .. .. ..$ : chr [1:2] "x2" "x3"
##   .. .. ..- attr(*, "term.labels")= chr [1:2] "x2" "x3"
##   .. .. ..- attr(*, "order")= int [1:2] 1 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(x1, x2, x3)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:3] "x1" "x2" "x3"
##  - attr(*, "class")= chr "lm"

To evaluate our claim, we need to find the residuals from this regression. As a knowledge check, what is it that we mean when we say, “residual” in this sense?
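
One concrete way to answer that question: each residual is the observed value of x1 minus the value that the auxiliary regression predicts for it from x2 and x3; in other words, the part of x1 that the other features cannot explain. Here is a quick sketch of that identity:

Code
# each residual is the observed x1 minus the auxiliary model's fitted value
all.equal(
  unname(resid(model_aux)), 
  d$x1 - unname(fitted(model_aux)))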

To make talking about these easier, here is a plot that might be useful.

Code
d %>% 
  ggplot() + 
  aes(x = x1, y = y) + 
  # scatter of the outcome against x1
  geom_point() + 
  # a straight reference line drawn from (0, 0) to (10, 50)
  geom_segment(aes(x = 0, xend = 10, y = 0, yend = 50), color = 'steelblue')

In order to access these residuals, we can “augment” the dataframe that we used in the model, using the broom::augment function call.

Code
model_aux_augmented <- augment(model_aux)

Because the \(Y\) variable wasn’t included in the regression for model_aux, we have to bring it over from the main dataset, which is a little bit… um, hacky. Forgive these sins.

Code
model_aux_augmented$y <- d$y
model_aux_augmented
## # A tibble: 100 × 10
##       x1    x2    x3 .fitted .resid   .hat .sigma  .cooksd .std.resid     y
##    <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>    <dbl>      <dbl> <dbl>
##  1 2.57  2.02   5.86    5.05 -2.47  0.0208   2.76 0.00582      -0.906  22.2
##  2 3.53  0.630  6.12    5.23 -1.70  0.0344   2.77 0.00466      -0.626  19.3
##  3 1.73  1.91   3.13    4.92 -3.19  0.0286   2.75 0.0135       -1.17   10.9
##  4 5.98  1.63   7.66    5.19  0.788 0.0312   2.77 0.000906      0.290  30.0
##  5 2.94  3.96   8.18    4.93 -2.00  0.0246   2.77 0.00451      -0.733  32.8
##  6 4.53  8.94   3.17    4.06  0.471 0.0383   2.77 0.000402      0.174  25.9
##  7 0.811 8.74   2.40    4.04 -3.23  0.0401   2.75 0.0199       -1.20   20.1
##  8 0.517 8.67   4.13    4.14 -3.63  0.0326   2.75 0.0200       -1.34   29.2
##  9 4.34  1.34   8.09    5.25 -0.908 0.0369   2.77 0.00143      -0.335  28.8
## 10 8.17  7.79   6.39    4.37  3.80  0.0276   2.75 0.0184        1.40   39.7
## # ℹ 90 more rows
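
As an aside, if that manual column assignment feels like too much of a sin, one alternative is to hand the full dataframe to augment() through its data argument, so that y is carried along automatically. (This is a sketch; it assumes the version of broom you have installed supports the data argument for lm objects.)

Code
# an alternative to the manual assignment above: pass the original data,
# so that all of its columns (including y) appear in the augmented tibble
aux_augmented_alt <- augment(model_aux, data = d)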

Finally, with this augmented data, which carries the residuals from the auxiliary model, we can estimate a model that includes only those residuals as a predictor of \(Y\).

Code
model_two <- lm(y ~ .resid, data = model_aux_augmented)
coef(model_two)
## (Intercept)      .resid 
##    26.21733     1.01098
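
One small thing to notice in passing: because the auxiliary regression includes an intercept, its residuals average to zero, and so the intercept of model_two is just the sample mean of \(Y\). You can check this against the intercept printed above:

Code
# the auxiliary residuals have mean zero, so the intercept of model_two
# should match the sample mean of y
mean(d$y)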

Our claim was that the coefficient on x1 in model_main and the coefficient on .resid in model_two should be the same.

Code
test_that(
  'the model coefficients are equal', 
  expect_equal(
    as.numeric(coef(model_main)['x1']), 
    as.numeric(coef(model_two)['.resid']))
)
## Test passed 😀

But, why is this an interesting, or at least useful, feature to appreciate?

This is actually a really famous result, the Frisch-Waugh-Lovell theorem, which has relatively recently been “rediscovered” as the regression anatomy formula. If we have a number of variables, one called an outcome and the rest called features, then we can estimate the relationship between the outcome and one feature in the following way:

  1. Regress the feature that we’re examining on all of the other features; save, in a vector, the part of that feature which cannot be explained by the other features (that is, its residuals).
  2. Regress the outcome on this leftover information. (A small code sketch of this two-step recipe follows just after this list.)
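
Here is that sketch: a small helper function (the name fwl_coef and its arguments are purely illustrative, not part of any package) that carries out both steps for a single feature and returns the coefficient on its leftover information.

Code
# a sketch of the two-step recipe above; names here are illustrative only
fwl_coef <- function(data, outcome, feature, others) {
  # step 1: regress the feature of interest on all of the other features,
  # and keep the part of it that they cannot explain (the residuals)
  aux_formula <- reformulate(others, response = feature)
  leftover    <- resid(lm(aux_formula, data = data))
  # step 2: regress the outcome on this leftover information
  coef(lm(data[[outcome]] ~ leftover))[['leftover']]
}

# this should reproduce coef(model_main)['x1']
fwl_coef(d, outcome = 'y', feature = 'x1', others = c('x2', 'x3'))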

To say slightly more about what is happening: in the first (auxiliary) regression, the leftover information is orthogonal to the information possessed by the other regression features. In the second regression, we can use this orthogonal information to estimate the effect of that one variable.
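
That orthogonality is easy to verify in the augmented data: the sample correlations between the auxiliary residuals and the other features are zero, up to floating-point noise.

Code
# the auxiliary residuals carry no linear information about x2 or x3,
# which is exactly what lets the second regression isolate x1's contribution
cor(model_aux_augmented$.resid, model_aux_augmented$x2)
cor(model_aux_augmented$.resid, model_aux_augmented$x3)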