Question: The Ordinary Least Squares (Univariate OLS) regression minimizes the sum of the squared vertical distances from all the observations in the sample to the regression line. Find the $\hat\beta_0$ and $\hat\beta_1$ that minimize the following equation – and justify each step:

$$\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

What are Our Assumptions? Simple Regression (SR) Assumptions

- SR1: Our data is derived from a random sample from the population.
- SR2: $x$ and $y$ are measured without error.
- SR3: The true relationship between $x$ and $y$ is linear, that is, $y_i = \beta_0 + \beta_1 x_i + u_i$.
- $u_i$ picks up all the other stuff that describes $y_i$ other than $x_i$.
- $y$ is the dependent variable, $x$ is the independent variable, $u$ is the error or disturbance term (unobserved), $\beta_0$ is the intercept, and $\beta_1$ is the slope.
- SR4: $x$ varies in the sample we have drawn. (Obviously, you need some variation in $x$; your independent observations can't all be identical.)
- SR5: $E[u \mid x] = 0$. Zero Conditional Mean Assumption. That is, the average effect of the error term does not vary with the independent variable. This assumption usually fails, and its failure is often the most damaging when people try to do empirical studies. It assumes that the unobserved determinants of the outcome variable don't vary systematically with $x$.
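As an aside (not part of the original derivation), here's a minimal Python sketch of what a dataset satisfying these assumptions looks like. All the specific values (the true $\beta_0 = 2$, $\beta_1 = 0.5$, the sample size, the range of $x$) are made up purely for illustration:

```python
import numpy as np

# A simulated dataset satisfying the SR assumptions.
# All parameter values here are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
n = 100_000

beta_0, beta_1 = 2.0, 0.5            # "true" intercept and slope (assumed)
x = rng.uniform(0.0, 10.0, size=n)   # SR4: x varies in the sample
u = rng.normal(0.0, 1.0, size=n)     # drawn independently of x, so E[u | x] = 0 (SR5)
y = beta_0 + beta_1 * x + u          # SR3: the true relationship is linear

# Informal check of SR5: the average error should be near zero
# within any slice of x, not just overall.
for lo in range(0, 10, 2):
    in_slice = (x >= lo) & (x < lo + 2)
    print(f"x in [{lo}, {lo + 2}): mean(u) = {u[in_slice].mean():+.4f}")
```

Each printed slice mean should hover near zero. If the error instead depended on $x$ (say, larger errors at larger $x$), those slice means would drift systematically, which is exactly the kind of SR5 failure described above.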
Set Up: We want to minimize this – the sum of squared distances between the true dependent variable and our (as-yet) hypothetical regression line:

$$\min_{\hat\beta_0, \hat\beta_1} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$$

Note that assumption SR3 states that $y_i = \beta_0 + \beta_1 x_i + u_i$, so we could plug this in for $y_i$.

Solving for $\hat\beta_0$ – The derivative with respect to $\hat\beta_0$:

Below are the first order conditions w.r.t. $\hat\beta_0$ (taking the derivative of our minimization problem above with respect to $\hat\beta_0$ – it's just calculus):

$$-2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

Going forward, our goal is to rearrange the above equation so that $\hat\beta_0$ is equal to something simple and easy to deal with. Note that all the little tricks and oddities to come help us achieve that goal.

Note the "equals zero" on the right-hand side, so the 2 and the negative one can get divided out without issue. And we can multiply it all by $\frac{1}{n}$ without a problem, since it's just zero on the other side:

$$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

Also, let's break the summation into three separate terms:

$$\frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \sum_{i=1}^{n} \hat\beta_0 - \frac{1}{n} \sum_{i=1}^{n} \hat\beta_1 x_i = 0$$

Note that $\frac{1}{n} \sum_{i=1}^{n} y_i = \bar{y}$, the mean of the dependent variable. Note that $\frac{1}{n} \sum_{i=1}^{n} \hat\beta_0 = \hat\beta_0$, which is simple. And note that $\frac{1}{n} \sum_{i=1}^{n} \hat\beta_1 x_i = \hat\beta_1 \bar{x}$. So:

$$\bar{y} - \hat\beta_0 - \hat\beta_1 \bar{x} = 0$$

Now, solving for $\hat\beta_0$:

$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

We know how to find the mean of $y$ and the mean of $x$, but we'll need some more work to find beta-hat-one.

Solving for $\hat\beta_1$ – The derivative with respect to $\hat\beta_1$:

Now, let's find what our estimate of beta-one is. The first order condition with respect to $\hat\beta_1$:

$$-2 \sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

As a reminder, going forward our goal is to get the above equation solved for $\hat\beta_1$ in terms of things we know – something simple and easy to deal with. All the little tricks and oddities to come help us achieve that goal.

We can get rid of the 2 and the negative one since it's all equal to zero:

$$\sum_{i=1}^{n} x_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

Note that we know that $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, so we can plug that in too:

$$\sum_{i=1}^{n} x_i \left( y_i - (\bar{y} - \hat\beta_1 \bar{x}) - \hat\beta_1 x_i \right) = 0$$

Let's open up one of the parentheses in the summation, distributing the negative sign:

$$\sum_{i=1}^{n} x_i \left( y_i - \bar{y} + \hat\beta_1 \bar{x} - \hat\beta_1 x_i \right) = 0$$

Let's break the summation into two different terms, grouping the $y$ pieces and the $x$ pieces:

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \sum_{i=1}^{n} \hat\beta_1 x_i (x_i - \bar{x}) = 0$$

Note that the scalar $\hat\beta_1$ in the second term can be brought to the front of the summation:

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \hat\beta_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0$$

From the first term above, we'll subtract "$\sum_{i=1}^{n} \bar{x} (y_i - \bar{y})$", and from the second we'll subtract "$\sum_{i=1}^{n} \bar{x} (x_i - \bar{x})$". Each of these is equal to zero, so it's chill:

$$\sum_{i=1}^{n} \bar{x} (y_i - \bar{y}) = \bar{x} \sum_{i=1}^{n} (y_i - \bar{y}) = \bar{x} (n\bar{y} - n\bar{y}) = 0$$
This shows that all we're doing is subtracting zero. The same works for $\sum_{i=1}^{n} \bar{x} (x_i - \bar{x})$. Let's put those two "zero" terms in:

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \sum_{i=1}^{n} \bar{x} (y_i - \bar{y}) - \hat\beta_1 \left[ \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x} (x_i - \bar{x}) \right] = 0$$

Now let's combine each pair of summations, factoring $(x_i - \bar{x})$ out front:

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) - \hat\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x}) = 0$$

Recall our goal is to solve this equation in terms of $\hat\beta_1$. Moving the second term above to the other side of the equation, we get:

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat\beta_1 \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$$

Notice that "$\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$" equals "$\sum_{i=1}^{n} (x_i - \bar{x})^2$". Now solve in terms of $\hat\beta_1$:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Note that $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \widehat{\mathrm{Cov}}(x, y)$, and that $\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \widehat{\mathrm{Var}}(x)$, so:

$$\hat\beta_1 = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}$$

To recap, our goal was to solve that minimization problem for $\hat\beta_0$ and $\hat\beta_1$ in terms of simple statistics that we are quite familiar with.

Beta-hat-sub-nought, $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$: we've solved for the intercept of our linear regression in terms of beta-hat-sub-one and the means of the independent and dependent variables.

Beta-hat-sub-one, $\hat\beta_1 = \widehat{\mathrm{Cov}}(x, y) / \widehat{\mathrm{Var}}(x)$: we've solved for the slope of our regression line in terms of the covariance of $x$ and $y$ and the variance of $x$. And you can use this answer to find a number for $\hat\beta_0$.

So, if you want to estimate the line that best fits a sample of two variables, you now have two relatively simple equations to do so.
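To tie the derivation together, here's a short numerical sketch in Python (not from the original notes; the simulated data and "true" parameter values are made up for illustration). It computes $\hat\beta_1$ as the sample covariance over the sample variance and $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$, then cross-checks against NumPy's own least-squares fit:

```python
import numpy as np

# Simulated sample; the "true" values beta_0 = 2 and beta_1 = 0.5
# are hypothetical, chosen only for illustration.
rng = np.random.default_rng(42)
n = 10_000
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

# Slope: sample Cov(x, y) / sample Var(x).  (The 1/n factors cancel,
# so we can work with the raw sums from the derivation.)
beta_1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: y-bar minus beta-one-hat times x-bar.
beta_0_hat = y.mean() - beta_1_hat * x.mean()

print(f"beta_0_hat = {beta_0_hat:.4f}, beta_1_hat = {beta_1_hat:.4f}")

# Cross-check against NumPy's least-squares polynomial fit of degree 1.
slope, intercept = np.polyfit(x, y, 1)
```

With a reasonably large sample, both estimates should land close to the made-up true values, and they should agree with `np.polyfit` essentially to machine precision, since both are solving the same least-squares problem.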