OLS Coefficient Estimates


Question: The Ordinary Least Squares (Univariate OLS) regression minimizes the sum of the squared vertical distances from all the observations in the sample to the regression line. Find the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the following equation, and justify each step:

$$\min_{\hat{\beta}_0,\,\hat{\beta}_1}\ \sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2$$



What Are Our Assumptions? The Simple Regression (SR) Assumptions

  1. SR1: Our data come from a random sample drawn from the population.
  2. SR2: $x$ and $y$ are measured without error.
  3. SR3: The true relationship between $x$ and $y$ is linear, that is, $y_i = \beta_0 + \beta_1 x_i + u_i$.
     1. $u_i$ picks up all the other stuff that determines $y_i$ other than $x_i$.
     2. $y$ is the dependent variable, $x$ is the independent variable, $u$ is the error or disturbance term (unobserved), $\beta_0$ is the intercept, and $\beta_1$ is the slope.
  4. SR4: $x$ varies in the sample we have drawn. (Obviously, you need some variation in $x$; your independent observations can't all be identical.)
  5. SR5: $E[u \mid x] = 0$, the Zero Conditional Mean Assumption. That is, the average of the error term does not vary with the independent variable. This is the assumption that most often fails, and its failure is usually the most damaging when people try to do empirical studies. It assumes that the unobserved determinants of the outcome variable don't vary systematically with $x$. (A simulated sketch of a process satisfying these assumptions follows this list.)
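The SR assumptions describe a data-generating process. As a rough sketch (not from the original notes; the parameter values, sample size, and distributions chosen here are arbitrary and purely illustrative), here is a small simulation of a process that satisfies SR1 through SR5:

```python
import numpy as np

# Hypothetical data-generating process satisfying the SR assumptions.
# beta0, beta1, n, and the distributions below are illustrative choices only.
rng = np.random.default_rng(42)

n = 500
beta0, beta1 = 2.0, 3.0

# SR1 / SR4: a random sample, with variation in x
x = rng.uniform(0, 10, size=n)

# SR5: the error term has mean zero and is drawn independently of x
u = rng.normal(loc=0.0, scale=1.0, size=n)

# SR3: the true relationship between x and y is linear
y = beta0 + beta1 * x + u

# Quick check: the sample correlation between x and u should be close to zero
print(np.corrcoef(x, u)[0, 1])
```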


Set Up: We want to minimize this, the sum of squared vertical distances between the observed dependent variable and our (as-yet) hypothetical regression line:

$$\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^2$$


Note that assumption SR3 says the relationship is linear, so our regression line takes the form $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. Plugging this in, the problem becomes:

$$\min_{\hat{\beta}_0,\,\hat{\beta}_1}\ \sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2$$




Solving for $\hat{\beta}_0$: the derivative with respect to $\hat{\beta}_0$. Below is the first-order condition w.r.t. $\hat{\beta}_0$ (taking the derivative of our minimization problem above with respect to $\hat{\beta}_0$; it's just calculus):

$$\frac{\partial}{\partial \hat{\beta}_0}\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2 = -2\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr) = 0$$


Going forward, our goal is to rearrange the above equation so that $\hat{\beta}_0$ is equal to something simple and easy to deal with. Note that all the little tricks and oddities to come help us achieve that goal.


Note the "equals zero" on the right-hand side, so the 2 and the negative one can get divided out without issue:

$$\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr) = 0$$


And we can multiply it all by $\frac{1}{n}$ without a problem, since it's just zero on the other side:

$$\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr) = 0$$


Also, let's break the summation into three separate terms:

$$\frac{1}{n}\sum_{i=1}^{n} y_i \;-\; \frac{1}{n}\sum_{i=1}^{n} \hat{\beta}_0 \;-\; \frac{1}{n}\sum_{i=1}^{n} \hat{\beta}_1 x_i = 0$$




Note that $\frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}$, the mean of the dependent variable.

Note that $\frac{1}{n}\sum_{i=1}^{n} \hat{\beta}_0 = \frac{n\hat{\beta}_0}{n} = \hat{\beta}_0$, which is simple.




Note that $\frac{1}{n}\sum_{i=1}^{n} \hat{\beta}_1 x_i = \hat{\beta}_1\,\frac{1}{n}\sum_{i=1}^{n} x_i = \hat{\beta}_1\bar{x}$.




Putting those three pieces together gives $\bar{y} - \hat{\beta}_0 - \hat{\beta}_1\bar{x} = 0$. Now, solving for $\hat{\beta}_0$:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$


We know how to find the mean of y and the mean of x, but we'll need some more work to find beta-hat-one.
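As a quick numerical sanity check (a sketch using made-up data, not something from the original notes), the result above says that for any fixed slope, the intercept that minimizes the sum of squared residuals is the mean of $y$ minus that slope times the mean of $x$. A brute-force grid search over candidate intercepts agrees with the formula:

```python
import numpy as np

# Made-up sample for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 + 0.8 * x + rng.normal(size=100)

b1 = 0.6  # an arbitrary fixed slope for this check

# Closed-form intercept from the first-order condition: b0 = ybar - b1 * xbar
b0_formula = y.mean() - b1 * x.mean()

# Brute force: evaluate the sum of squared residuals over a grid of intercepts
grid = np.linspace(b0_formula - 2, b0_formula + 2, 4001)
ssr = np.array([np.sum((y - b0 - b1 * x) ** 2) for b0 in grid])

print(b0_formula)          # intercept from the formula
print(grid[ssr.argmin()])  # intercept found by grid search (matches up to grid spacing)
```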



Solving for $\hat{\beta}_1$: the derivative with respect to $\hat{\beta}_1$. Now, let's find what our estimate of $\beta_1$ is. Taking the derivative of the minimization problem with respect to $\hat{\beta}_1$:

$$\frac{\partial}{\partial \hat{\beta}_1}\sum_{i=1}^{n}\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr)^2 = -2\sum_{i=1}^{n} x_i\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr) = 0$$



As a reminder, going forward our goal is to rearrange the above equation so that $\hat{\beta}_1$ is written in terms of things we know, and to get it into a form that is simple and easy to deal with. All the little tricks and oddities to come help us achieve that goal.



We can get rid of the 2 and the negative one since it's all equal to zero:

$$\sum_{i=1}^{n} x_i\bigl(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\bigr) = 0$$

Note that we already know that $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$, so we can plug that in too:

$$\sum_{i=1}^{n} x_i\bigl(y_i - (\bar{y} - \hat{\beta}_1\bar{x}) - \hat{\beta}_1 x_i\bigr) = 0$$




Let's open up one of the parentheses in the summation, distributing the negative sign:

$$\sum_{i=1}^{n} x_i\bigl(y_i - \bar{y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i\bigr) = 0$$




Let's break the summation into two different terms:

$$\sum_{i=1}^{n} x_i\bigl(y_i - \bar{y}\bigr) - \sum_{i=1}^{n} \hat{\beta}_1 x_i\bigl(x_i - \bar{x}\bigr) = 0$$




Note that the $\hat{\beta}_1$ scalar in the second term can be brought out to the front of the summation:

$$\sum_{i=1}^{n} x_i\bigl(y_i - \bar{y}\bigr) - \hat{\beta}_1\sum_{i=1}^{n} x_i\bigl(x_i - \bar{x}\bigr) = 0$$




To the equation above, we'll subtract "$\sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y})$" and add "$\hat{\beta}_1\sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x})$".

Each of these terms is equal to zero, so it's chill.


On the first one:

$$\sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y}) = \bar{x}\sum_{i=1}^{n} (y_i - \bar{y}) = \bar{x}\left(\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y}\right) = \bar{x}\,(n\bar{y} - n\bar{y}) = 0$$

This shows that all we're doing is adding zero.


The same works for "$\hat{\beta}_1\sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x})$":

$$\hat{\beta}_1\sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x}) = \hat{\beta}_1\,\bar{x}\sum_{i=1}^{n} (x_i - \bar{x}) = \hat{\beta}_1\,\bar{x}\,(n\bar{x} - n\bar{x}) = 0$$




Let's put those two "zero" terms in:

$$\sum_{i=1}^{n} x_i(y_i - \bar{y}) - \sum_{i=1}^{n} \bar{x}(y_i - \bar{y}) - \hat{\beta}_1\sum_{i=1}^{n} x_i(x_i - \bar{x}) + \hat{\beta}_1\sum_{i=1}^{n} \bar{x}(x_i - \bar{x}) = 0$$




Now let's group them, factoring $-\hat{\beta}_1$ out of the last two terms:

$$\left(\sum_{i=1}^{n} x_i(y_i - \bar{y}) - \sum_{i=1}^{n} \bar{x}(y_i - \bar{y})\right) - \hat{\beta}_1\left(\sum_{i=1}^{n} x_i(x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x}(x_i - \bar{x})\right) = 0$$




Recall that our goal is to solve this equation in terms of $\hat{\beta}_1$.

Moving the second term above to the other side of the equation, we get:

$$\sum_{i=1}^{n} x_i(y_i - \bar{y}) - \sum_{i=1}^{n} \bar{x}(y_i - \bar{y}) = \hat{\beta}_1\left(\sum_{i=1}^{n} x_i(x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x}(x_i - \bar{x})\right)$$




Now, distribute that negative sign in front of "$\sum_{i=1}^{n} \bar{x}(y_i - \bar{y})$" into the summation, which lets us combine the left-hand side into a single summation:

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1\left(\sum_{i=1}^{n} x_i(x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x}(x_i - \bar{x})\right)$$




Notice that "$\sum_{i=1}^{n} x_i(x_i - \bar{x}) - \sum_{i=1}^{n} \bar{x}(x_i - \bar{x})$" equals "$\sum_{i=1}^{n} (x_i - \bar{x})^2$":

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1\sum_{i=1}^{n} (x_i - \bar{x})^2$$




Now solve in terms of $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$




Note that $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \widehat{\mathrm{Cov}}(x, y)$, the sample covariance of $x$ and $y$.


And that $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \widehat{\mathrm{Var}}(x)$, the sample variance of $x$. Dividing the numerator and the denominator by $n$ leaves the ratio unchanged, so we can also write:

$$\hat{\beta}_1 = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}$$





To recap, our goal was to solve that minimization problem for $\hat{\beta}_0$ and $\hat{\beta}_1$ in terms of simple statistics that we are quite familiar with.


Beta-hat-sub-nought, $\hat{\beta}_0$: we've solved for the intercept of our linear regression in terms of $\hat{\beta}_1$ and the means of the independent and dependent variables:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$


Beta-hat-sub-one, $\hat{\beta}_1$: and we've solved for the slope of our regression line in terms of the sample covariance of $x$ and $y$ and the sample variance of $x$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\widehat{\mathrm{Cov}}(x, y)}{\widehat{\mathrm{Var}}(x)}$$




And you can use your answer here to find a number for $\hat{\beta}_0$.

So, if you want to estimate the line that best fits a sample of $(x, y)$ observations, you now have two relatively simple equations to do so.
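To make those two equations concrete, here is a short sketch (simulated data; np.polyfit is used only as an independent cross-check) that computes $\hat{\beta}_1$ as the ratio of the sample covariance to the sample variance and then recovers $\hat{\beta}_0$ from the means:

```python
import numpy as np

# Simulated (x, y) sample for illustration; any sample with variation in x would do
rng = np.random.default_rng(7)
x = rng.uniform(-5, 5, size=200)
y = -1.0 + 2.5 * x + rng.normal(scale=2.0, size=200)

xbar, ybar = x.mean(), y.mean()

# Slope: sum of (x_i - xbar)(y_i - ybar) divided by sum of (x_i - xbar)^2
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept: ybar - beta1_hat * xbar
beta0_hat = ybar - beta1_hat * xbar

print(beta0_hat, beta1_hat)

# Cross-check against numpy's degree-1 least-squares fit
slope, intercept = np.polyfit(x, y, deg=1)
print(intercept, slope)  # should match beta0_hat, beta1_hat
```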
