## Saturday, April 01, 2017

### multiple regression, R square, adjusted R square

R^2

• R-squared is the “percent of variance explained” by the model.
• More precisely, it is the percent of variance in the dependent variable explained collectively by all of the independent variables.
• R-squared was not of any use in guiding us through this particular analysis toward better and better models.
• Be very careful when evaluating a model with a low value of R-squared.
• R-squared = Explained variation / Total variation
• R-squared is always between 0 and 100%:
• 0% indicates that the model explains none of the variability of the response data around its mean.
• 100% indicates that the model explains all the variability of the response data around its mean.
• R-squared does not indicate whether a regression model is adequate. You can have a low R-squared value for a good model, or a high R-squared value for a model that does not fit the data!
• There are two major reasons why it can be just fine to have low R-squared values. (1) In some fields, it is entirely expected that your R-squared values will be low. For example, any field that attempts to predict human behavior, such as psychology, typically has R-squared values lower than 50%. Humans are simply harder to predict than, say, physical processes. (2) If your R-squared value is low but you have statistically significant predictors, you can still draw important conclusions about how changes in the predictor values are associated with changes in the response value. Regardless of the R-squared, the significant coefficients still represent the mean change in the response for one unit of change in the predictor while holding other predictors in the model constant. This type of information can be extremely valuable.
• A low R-squared is most problematic when you want to produce predictions that are reasonably precise (have a small enough prediction interval).
• You need to understand that R-square is a measure of explanatory power, not fit. You can generate lots of data with low R-square, because we don't expect models (especially in the social or behavioral sciences) to include all the relevant predictors that explain an outcome variable. You can cite works by Neter, Wasserman, or many other authors about R-square. Note that R-square, even when small, can be significantly different from 0, indicating that your regression model has statistically significant explanatory power. However, you should always report the value of R-square as an effect size, because people might question the practical significance of the value. In some fields R-square is typically higher, because it is easier to specify complete, well-specified models. In the social sciences, where it is hard to specify such models, low R-square values are often expected. Read about the difference between statistical significance and effect sizes if you want to know more.
• A low R-squared with statistically significant parameters is more useful than a high R-squared accompanied by statistically insignificant parameters.
• If one's purpose is to build very efficient predictive models, then maximizing R^2 or adjusted R^2 is key. In the social sciences, where most often we're interested in testing hypotheses about certain variables while adjusting for the effects of others, the significance levels of key variables are much more important.
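
The notes above give R-squared as explained variation over total variation but never spell out adjusted R-squared. A minimal sketch of both follows, assuming an ordinary least-squares fit with an intercept, in which case explained / total equals 1 - SS_res / SS_tot. It uses the common textbook formula for adjusted R-squared, 1 - (1 - R^2)(n - 1) / (n - p - 1), with n observations and p predictors. The function names and the tiny data set are made up for illustration and do not come from any real analysis.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot (share of variation around the mean explained)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2: penalizes R^2 for the number of predictors p (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical illustrative data (assumption, not from the post).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_ols(x, y)
y_hat = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 3), round(adj_r2, 3))
```

With one predictor and only five observations the adjustment is large; as n grows relative to p, adjusted R-squared moves toward plain R-squared. Adding a predictor never lowers R-squared on the fitting data, which is why the adjusted version is the one usually compared across models with different numbers of predictors.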