This page was written so I could practise using LaTeX/TeX on GitHub Pages. Thanks to Vincent Tam's page for showing me how to do it!

Question

Suppose we have some data $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ ($x_i, y_i \in \mathbb{R}$). Then we can calculate the (ordinary least squares) regression line of $y$ on $x$. Let's say that we get $y = mx + c$, and (assuming $m \neq 0$) rearrange to get $x = \frac{1}{m}y - \frac{c}{m}$. Is this the regression line of $x$ on $y$?
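To make the setup concrete, here is a minimal sketch of the calculation (NumPy and the three data points are my own choices for illustration, not part of the question):

```python
import numpy as np

# Made-up sample data, purely for illustration.
x = np.array([-1.0, 0.0, 1.0])
y = np.array([-1.0, 1.0, 0.0])

m, c = np.polyfit(x, y, 1)  # least-squares fit: y = m*x + c
print(f"y on x: y = {m:.3f}x + {c:.3f}")

# Rearranged candidate for the x-on-y line (assumes m != 0):
print(f"rearranged: x = {1/m:.3f}y + {-c/m:.3f}")
```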

Answer

In general, no. For simplicity, assume that $\sum x_i = \sum y_i = 0$; this just translates the data and does not change either slope. One can show that an ordinary least squares regression line always passes through the mean point $(\frac{\sum x_i}{n}, \frac{\sum y_i}{n}) = (0,0)$, so the regression line of $y$ on $x$ must be of the form $y = mx$ for some $m \in \mathbb{R}$. Minimising the sum of squared residuals gives \[ m = \text{argmin}_a\Big(\sum_i (y_i - a x_i)^2\Big) = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2} \sqrt{\sum y_i^2}} \sqrt{\frac{\sum y_i^2}{\sum x_i^2}} = r(x,y) \sqrt{\frac{\sum y_i^2}{\sum x_i^2}}, \] where $r(x,y)$ is the sample correlation. Swapping the roles of $x$ and $y$, the regression line of $x$ on $y$ has the form $x = m'y$, where \[ m' = r(y,x) \sqrt{\frac{\sum x_i^2}{\sum y_i^2}}. \] Since the correlation is symmetric, $r(y,x) = r(x,y)$, and multiplying the two slopes gives $m m' = r(x,y)^2$. So $m'$ is the reciprocal of $m$ if and only if $r(x,y)^2 = 1$, i.e. $r(x,y) = \pm 1$, which holds only when the data lie exactly on a straight line.
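To see this on a concrete example (the data points are made up for illustration, the same three as in the sketch above): take $(x_i, y_i)$ to be $(-1,-1), (0,1), (1,0)$. Then $\sum x_i = \sum y_i = 0$, $\sum x_i y_i = 1$ and $\sum x_i^2 = \sum y_i^2 = 2$, so $m = m' = \frac{1}{2}$ and $m m' = \frac{1}{4} = r(x,y)^2$, whereas reciprocality would require $m m' = 1$. A sketch of the check:

```python
import numpy as np

# Same made-up points as above.
x = np.array([-1.0, 0.0, 1.0])
y = np.array([-1.0, 1.0, 0.0])

# Centre the data (already centred here, but shown for the general case).
x = x - x.mean()
y = y - y.mean()

m = np.sum(x * y) / np.sum(x * x)        # slope of y on x
m_prime = np.sum(x * y) / np.sum(y * y)  # slope of x on y
r = np.sum(x * y) / np.sqrt(np.sum(x * x) * np.sum(y * y))

print(m * m_prime, r ** 2)  # both 0.25: m * m' = r^2
print(m_prime, 1 / m)       # 0.5 vs 2.0: not reciprocal
```

Here the two regression lines $y = \frac{1}{2}x$ and $x = \frac{1}{2}y$ are visibly different lines through the origin.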

To Think About

What is wrong with the following argument for why $m'$ and $m$ should always be reciprocal? "Comparing the scatterplots of $x$ vs $y$ and $y$ vs $x$, we see that they are related by reflecting in the line $y=x$. This is an isometry, and so preserves all lengths. Since regression is about minimising a sum of squares of lengths, the best line for regressing $y$ on $x$ will be transformed to the best line for regressing $x$ on $y$. Reflection transforms $y=mx$ to $x=my$, or $y=\frac{1}{m}x$. This appears to prove that $m$ and $m'$ are always reciprocal."

Hint: Are the line segments whose squared lengths we are minimising horizontal or vertical?
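To sharpen the hint (this just restates the objective functions already used in the answer): with centred data, \[ m = \text{argmin}_a \sum_i (y_i - a x_i)^2, \qquad m' = \text{argmin}_b \sum_i (x_i - b y_i)^2 . \] The first sum measures vertical segments in the $(x, y)$ scatterplot; the second measures horizontal ones.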