Tuesday, March 14, 2017

logit vs probit; stata output; log odds vs odds ratio

For a binary outcome:
• Success (1) / Failure (0)
• Heart attack (1) / No heart attack (0)
• In (1) / Out of the labor force (0)

 y* is the underlying latent propensity that y=1 (yes, success, heart attack, in the labor force)
• Example: For the binary variable (yes/no; heart attack/no heart attack), y* is the propensity for a yes (1), or for a heart attack (1).
• Example 2: For the binary variable (in/out of the labor force), y* is the propensity to be in the labor force (1).
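The latent-variable setup can be sketched in a few lines. This is an illustration only: the coefficients and error values below are made up, and in real data the error e (and hence y*) is never observed — we only see the resulting 0/1 outcome.

```python
# Sketch of the latent-variable model behind logit/probit:
# y* = b0 + b1*x + e, and we observe y = 1 exactly when y* > 0.
# b0, b1, and e here are illustrative numbers, not estimates.

def observed_outcome(x, e, b0=-1.0, b1=0.5):
    """Return the binary outcome y implied by the latent propensity y*."""
    y_star = b0 + b1 * x + e  # latent propensity (unobserved in real data)
    return 1 if y_star > 0 else 0

# The same x can yield y=1 or y=0 depending on the unobserved error e:
print(observed_outcome(x=1.0, e=1.0))   # y* = 0.5  -> prints 1
print(observed_outcome(x=1.0, e=-1.0))  # y* = -1.5 -> prints 0
```

Because e is unobserved, the model can only describe the probability that y* crosses the threshold, which is where the distributional assumption on the errors comes in.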

Since y* is unobserved, we do not know the distribution of the errors, ε.
• However, in order to use maximum likelihood (ML) estimation, we need to make an assumption about the distribution of the errors.

Thus, the difference between Logistic and Probit models lies in this assumption about the distribution of the errors.

Which one to choose: binomial logit or binomial probit?
  • Results tend to be very similar.
  • Preference for one over the other tends to vary by discipline.
  • Binomial logit is the most frequently used estimation technique for equations with dummy dependent variables.
  • Logistic regression does not rely on an assumption of multivariate normality, so it is often considered better suited to smaller samples than a probit model.
  • Binomial probit:
  • Probit models are typically estimated by applying maximum likelihood techniques.
  • Probit is based on the cumulative normal distribution.
  • The probit estimation procedure uses more computer time than logit.
  • Since probit is based on the normal distribution, it is theoretically appealing (many economic variables are approximately normally distributed); however, with extremely large samples this advantage falls away.
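The "results tend to be very similar" point can be seen directly from the two link functions. A quick sketch with only the standard library: the logistic CDF and the standard normal CDF (via `math.erf`) nearly coincide once the logit argument is rescaled by a factor of roughly 1.6, which is why logit and probit coefficients differ in scale but the fitted probabilities rarely do.

```python
import math

def logistic_cdf(z):
    """CDF of the standard logistic distribution (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """CDF of the standard normal distribution (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

# After rescaling z by ~1.6, the two curves track each other closely:
for z in (-2, -1, 0, 1, 2):
    print(z, round(logistic_cdf(1.6 * z), 3), round(normal_cdf(z), 3))
```

The rescaling factor is also a handy rule of thumb: a logit coefficient is typically about 1.6 to 1.8 times the corresponding probit coefficient from the same data.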
Stata output
How to interpret logit?

Sample: BA degree earners
• Dependent variable: entry into a STEM occupation (yes = 1, no = 0)
• Independent variable: parent education, a categorical variable of highest degree: 2-year degree or lower (0) vs. BA or advanced degree (1)

Log odds (b coefficient in Stata)
  • When used in logistic regression, the log odds tell us how much the odds of an outcome occurring increase (or decrease) with a unit change in the associated explanatory variable.
  • In logistic regression, the b coefficient indicates the increase in the log odds of the outcome for a one-unit increase in X.
  • Interpretation: Among BA earners, having a parent whose highest degree is a BA degree (code 1) versus a 2-year degree or less (code 0) increases the log odds of entering a STEM job by 0.477.

Odds ratio 
  • However, log odds do not provide an intuitively meaningful scale for interpreting the change in the outcome variable. Taking the exponent of the log odds allows interpretation of the coefficients in terms of odds ratios (OR), which are easier to interpret substantively.
  • We can easily transform log odds into odds ratios by exponentiating the coefficients (b coefficient = 0.477): exp(0.477) = 1.61.
  • Interpretation: For BA degree earners, the odds of entering a STEM occupation are 1.61 times higher for those with a parent whose highest degree is a BA degree (code 1) than for those with a parent who has a 2-year degree or less (code 0).
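The conversion is a one-liner, shown here with the 0.477 coefficient from the example above:

```python
import math

b = 0.477                        # log-odds (b) coefficient from the example
odds_ratio = math.exp(b)         # exponentiate to get the odds ratio
pct_change = (odds_ratio - 1) * 100  # equivalently, % change in the odds

print(round(odds_ratio, 2))   # -> 1.61
print(round(pct_change))      # -> 61 (odds about 61% higher)
```

The second line is an equivalent reading of the same number: an odds ratio of 1.61 means the odds are about 61% higher, not that the outcome is 61% (or 1.61 times) more probable.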

SPSS labels this odds ratio for the explanatory variable as Exp(B).
