Sunday, March 05, 2017

multiple regression, class, lecture, 2017

1. theoretical model → hypothesis (relationship between variables)
2. use evidence (data) to test hypothesis → test theoretical model
3. use econometrics in testing hypotheses

  • regression analysis, no matter how statistically significant the result, can't prove causality. It can only test whether a significant quantitative relationship exists, and measure the strength and direction of that relationship
  • Y = a + b X; a is the constant (intercept), b is the slope coefficient
  • Linear regressions need to be linear in the coefficients; they do not necessarily need to be linear in the variables. Linear regression analysis can be applied to an equation that is nonlinear in the variables if the equation can be formulated in a way that is linear in the coefficients. When econometricians use the phrase "linear regression", they usually mean "regression that is linear in the coefficients"
  1. linear in variables -- an equation is linear in the variables if plotting the function in terms of X and Y generates a straight line --- Y = a + b X is linear in the variables; however, Y = a + b X^2 is not linear in the variables
  2. linear in coefficients -- if linear regression techniques are to be applied to an equation, that equation must be linear in the coefficients. An equation is linear in the coefficients only if the coefficients (a, b) appear in their simplest form: they are not raised to any power (other than one), are not multiplied or divided by other coefficients, and don't themselves include functions (e.g., logs or exponents) --- Y = a + b X is linear in the coefficients; however, Y = a + X^b is not linear in the coefficients a, b. Of all possible equations for a single explanatory variable (X), only functions of the general form f(Y) = a + b f(X) are linear in the coefficients a and b
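A minimal sketch of the point above, assuming made-up data generated exactly from Y = 3 + 2 X^2. The equation is not linear in the variable X, but it is linear in the coefficients, so regressing Y on the transformed variable Z = X^2 with ordinary linear regression recovers a and b. The data and the helper name fit_line are illustrative assumptions, not part of the lecture.

```python
# Hypothetical data generated exactly from Y = 3 + 2 * X**2 (no error term).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 + 2.0 * x ** 2 for x in xs]


def fit_line(z, y):
    """Simple OLS of y on z; returns (intercept, slope)."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = s_zy / s_zz
    intercept = y_bar - slope * z_bar
    return intercept, slope


# Transform the variable: Z = X**2. The model Y = a + b*Z is a straight
# line in Z, so linear regression applies even though Y is curved in X.
zs = [x ** 2 for x in xs]
a_hat, b_hat = fit_line(zs, ys)
print(a_hat, b_hat)  # close to 3.0 and 2.0
```

By contrast, Y = a + X^b cannot be handled this way, because no transformation of X alone makes b enter the equation linearly.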
Stochastic Error Term ε
  • some variation in Y can't be explained by the model. This variation probably comes from sources such as omitted influences, measurement errors, incorrect functional form, random and unpredictable occurrences
  • unexplained variation (error) --- expressed through a stochastic (or random) error term
  • a stochastic error term is a term that is added to a regression equation to introduce all the variation in Y that can't be explained by the included Xs
  • error term is the difference between the observed Y and the true regression equation (the expected value of Y)
  • error term is a theoretical concept that can never be observed
  • the vertical distance between a point A on the graph and the true regression line (which can't be observed) is called the error term (which can't be observed either)
  • see below for picture
Y= a + b X1+ c X2 +d X3 + ε
  • b (regression coefficient): the impact of a one-unit increase in X1 on the dependent variable Y, holding constant the (influence of) other independent variables (X2, X3) -- this isolates the impact on Y of a change in one variable from the impact on Y of changes in the other variables
  • if a variable is not included in an equation, then its impact is NOT held constant in the estimation of the regression coefficients
  • Time series --- data consists of a series of years or months --- Yt = a + b X1t + c X2t + d X3t + εt (t = 1, 2, 3, ..., n), where t is used to denote time
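The "holding constant" idea above can be checked with a small sketch. It assumes made-up data built exactly from Y = 1 + 2 X1 + 3 X2, with X2 chosen to be correlated with X1; the data and the helper name centered_cross are illustrative assumptions. When both variables are included, OLS recovers b = 2. When X2 is left out, its impact is not held constant and the estimated slope on X1 absorbs part of it.

```python
# Hypothetical data built exactly from Y = 1 + 2*X1 + 3*X2 (no error term),
# with X2 deliberately correlated with X1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]


def centered_cross(p, q):
    """Sum over observations of (p - mean(p)) * (q - mean(q))."""
    p_bar = sum(p) / len(p)
    q_bar = sum(q) / len(q)
    return sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))


s11 = centered_cross(x1, x1)
s22 = centered_cross(x2, x2)
s12 = centered_cross(x1, x2)
s1y = centered_cross(x1, y)
s2y = centered_cross(x2, y)

# Both variables included: closed-form OLS slopes for two regressors.
det = s11 * s22 - s12 ** 2
b_multi = (s22 * s1y - s12 * s2y) / det
c_multi = (s11 * s2y - s12 * s1y) / det

# X2 omitted: simple regression of Y on X1 only.
b_simple = s1y / s11

print(b_multi, c_multi)  # close to 2.0 and 3.0
print(b_simple)          # close to 5.0, not 2.0
```

In this constructed example the slope on X1 jumps from 2 to 5 once X2 is dropped, because X1 then also picks up the effect of the correlated X2.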
Residual (e)
  • Theoretical regression equation (purely abstract): Y = a + b X1 + ε. We can't actually observe the values of the true regression coefficients.
  • Estimated regression equation: Y' = 105 + 12 X1. We calculate estimates of these coefficients from the observed data. 105 and 12 are estimated regression coefficients; they are obtained from sample data and are empirical best guesses for the true regression coefficients (a, b)
  • the closer Y' is to Y, the better the fit of the equation
  • the difference between the actual value of Y and the estimated value Y' is defined as the "residual (e)": e = Y - Y'
  • residual is a real-world value that is calculated for each observation every time a regression is run
  • the smaller the residuals, the better the fit, and the closer Y' will be to Y
  • the vertical distance between a point A on the graph and the estimated regression line (which can be observed; the estimated regression line is not the same as the true regression line) is called the residual (which can be observed and calculated/measured)
  • in a sense, the residual can be thought of as an estimate of the error term (which can't be observed)
  • difference between error term and residual (picture)
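The distinction can be made concrete with a simulation, offered here as a sketch under stated assumptions. In real data the error term is never observed; it is visible below only because the "true" line Y = 10 + 2 X and the normally distributed errors are invented for the example. The residuals, by contrast, are computed from the estimated line and the observed data alone.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed "true" model, known here only because we simulate it:
# Y = 10 + 2*X + error, with error drawn from Normal(0, 1).
TRUE_A, TRUE_B = 10.0, 2.0
xs = [float(i) for i in range(1, 21)]
errors = [random.gauss(0.0, 1.0) for _ in xs]
ys = [TRUE_A + TRUE_B * x + e for x, e in zip(xs, errors)]

# Estimate the line by OLS using only the observed (X, Y) pairs.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar

# Residual = observed Y minus estimated Y'.
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

sum_sq_errors = sum(e ** 2 for e in errors)
sum_sq_residuals = sum(r ** 2 for r in residuals)

print("estimated line:", a_hat, b_hat)
print("first error vs first residual:", errors[0], residuals[0])
print("sum of squared errors:   ", sum_sq_errors)
print("sum of squared residuals:", sum_sq_residuals)
```

The residuals are close to the errors but not equal to them, since the estimated line differs from the true one. Their sum of squares is never larger than that of the errors, because OLS picks the line that minimizes squared deviations in the sample.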






Ordinary Least Square
  • the purpose of regression analysis is to take a purely theoretical equation, Y= a + bX+ ε, and use a set of data to create an estimated equation, Y'= a'+b'X
  • OLS is a regression estimation technique that calculates a' and b' so as to minimize the sum of the squared residuals --- OLS minimizes Σ (Y - Y')^2
  • OLS is the simplest of all econometric estimation techniques
  • Most other techniques involve complicated nonlinear formulas or iterative procedures, many of which are extensions of OLS itself
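For a single explanatory variable, the minimization above has a standard closed-form solution: b' = Σ (X - mean X)(Y - mean Y) / Σ (X - mean X)^2 and a' = mean Y - b' × mean X. The sketch below applies it to a small made-up data set (the numbers and the function name sum_sq_resid are illustrative assumptions) and checks that nudging the coefficients away from the OLS values increases the sum of squared residuals.

```python
# Hypothetical sample of five observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form OLS estimates for Y' = a' + b' X.
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar


def sum_sq_resid(a, b):
    """Sum of squared residuals for the candidate line Y' = a + b X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


ssr_ols = sum_sq_resid(a_hat, b_hat)

# Any other line fits this sample worse in the least-squares sense.
ssr_shifted_intercept = sum_sq_resid(a_hat + 0.1, b_hat)
ssr_shifted_slope = sum_sq_resid(a_hat, b_hat + 0.1)

print(a_hat, b_hat)  # close to 2.2 and 0.6
print(ssr_ols, ssr_shifted_intercept, ssr_shifted_slope)
```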
Decomposition of variance -- decomposition of the variance in Y
picture
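In standard textbook notation (the names are not spelled out in these notes), the decomposition is TSS = ESS + RSS: total sum of squares Σ (Y - mean Y)^2, explained sum of squares Σ (Y' - mean Y)^2, and residual sum of squares Σ (Y - Y')^2. A minimal numerical check on made-up data, assuming an OLS fit that includes an intercept:

```python
# Hypothetical sample of five observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS fit of Y' = a' + b' X (intercept included).
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar
fitted = [a_hat + b_hat * x for x in xs]

# Total variation in Y around its mean.
tss = sum((y - y_bar) ** 2 for y in ys)
# Variation explained by the estimated regression line.
ess = sum((f - y_bar) ** 2 for f in fitted)
# Variation left unexplained (sum of squared residuals).
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r_squared = ess / tss  # share of the variation in Y that is explained

print(tss, ess, rss)  # close to 6.0, 3.6 and 2.4
print(r_squared)      # close to 0.6
```

The equality TSS = ESS + RSS relies on the intercept being estimated by OLS; it does not hold in general for an arbitrary line.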
