8.9 Regression Plots and Discussion
In this next set of notes, we present some data, displayed in plots, and apply what we have learned in the async and reading for this week to answer questions about each of the scatter plots.
8.9.1 Plot 1
Consider data that is generated according to the following function:
\[ Y = 1 + 2x_1 + 3x_2 + e, \]
where \(x_1 \sim N(0,2)\), \(x_2 \sim N(0,2)\) and \(e\) is a constant equal to zero.
From this population, you might consider taking a sample of 100 observations and representing this data in the following 3d scatter plot. In this plot, there are three dimensions: an \(x_1\), an \(x_2\), and a \(y\) dimension.
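A sample like the one described above could be drawn with the following minimal sketch. It rests on two assumptions that are not in the notes: the 2 in \(N(0,2)\) is read as a variance (so the standard deviation is \(\sqrt{2}\)), and the seed is arbitrary. The variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
n = 100                          # sample size from the notes

# Assumption: N(0, 2) is read as mean 0 and VARIANCE 2, so sd = sqrt(2).
sd = np.sqrt(2.0)
x1 = rng.normal(loc=0.0, scale=sd, size=n)
x2 = rng.normal(loc=0.0, scale=sd, size=n)

# e is a constant equal to zero, so y is an exact function of x1 and x2.
e = np.zeros(n)
y = 1 + 2 * x1 + 3 * x2 + e

# One row per observation: the three coordinates of each point in the cube.
sample = np.column_stack([x1, x2, y])
print(sample[:5])
```

Because \(e\) is zero, every sampled point lies exactly on the plane \(y = 1 + 2x_1 + 3x_2\); any scatter you see in a 2d view comes from the variable that view ignores.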
- Rotate the cube and explore the data, looking at each face of the cube, including from the top down.
- One of the lessons that we learned during the random variables section of the course is that every random variable that has been measured can also be marginalized out. You might think of this as “casting down” data from three dimensions to only two.
- Sketch the following 2d scatter plots, taking care to label your axes. You need not represent all 100 points, but rather create the gestalt of what you see.
    1. \(Y = f(x_1)\) (but not \(x_2\))
    2. \(Y = f(x_2)\) (but not \(x_1\))
    3. \(x_2 = f(x_1)\)
- Once you have sketched the scatter plots, what line would you fit to minimize the sum of squared residuals in the vertical direction? Define a residual, \(\epsilon\), to be the vertical distance between the line you draw and the corresponding point in the input data.
- What is the average of the residuals for each of the lines that you have fitted? How does this correspond to the moment conditions discussed in the async? What would happen if you translated this line vertically?
- Rotate the cube so that the points “fall into line”. When you see this line, how does it help you describe the function that governs this data?
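After working through the questions by hand, you might check your sketches numerically. The sketch below is one possible way to do so, not the course's own solution: it redraws a sample under the same assumptions as before (variance 2, arbitrary seed), fits a least-squares line to each 2d projection with `np.polyfit`, and reports the mean residual and the mean of \(x \cdot \epsilon\). It then shifts one fitted line vertically by a hypothetical amount `c`, and finally fits all three dimensions at once. The helper `fit_line` and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)           # arbitrary seed
n = 100
sd = np.sqrt(2.0)                        # assumption: N(0, 2) means variance 2
x1 = rng.normal(0.0, sd, size=n)
x2 = rng.normal(0.0, sd, size=n)
y = 1 + 2 * x1 + 3 * x2                  # e is identically zero


def fit_line(x, target):
    """Least-squares line target ~ intercept + slope * x, plus vertical residuals."""
    slope, intercept = np.polyfit(x, target, deg=1)
    residuals = target - (intercept + slope * x)
    return intercept, slope, residuals


# The three 2d projections from the exercise: (horizontal axis, vertical axis).
projections = {
    "Y on x1": (x1, y),
    "Y on x2": (x2, y),
    "x2 on x1": (x1, x2),
}

results = {}
for label, (x, target) in projections.items():
    intercept, slope, residuals = fit_line(x, target)
    results[label] = (intercept, slope, residuals)
    print(label,
          "| intercept:", round(intercept, 3),
          "| slope:", round(slope, 3),
          "| mean residual:", residuals.mean(),
          "| mean(x * residual):", (x * residuals).mean())

# Translate the first fitted line vertically by c (a hypothetical shift).
c = 0.5
residuals = results["Y on x1"][2]
ssr = np.sum(residuals ** 2)
ssr_shifted = np.sum((residuals - c) ** 2)   # residuals of the shifted line
print("SSR:", ssr, "| SSR after shift:", ssr_shifted)

# Using all three dimensions at once: regress y on a constant, x1, and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("coefficients (constant, x1, x2):", beta)
```

For each fitted line, the mean residual and the mean of \(x \cdot \epsilon\) come out as zero up to floating-point error, which is the sample version of the moment conditions. Shifting the line by `c` makes the mean residual equal to \(-c\) and raises the sum of squared residuals by \(n c^2\), so the shifted line can no longer be the minimizer. The last fit recovers the coefficients 1, 2, and 3 with no residual at all, which is the numerical counterpart of rotating the cube until the points “fall into line”.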