Introduction to Logistic Regression

We use logistic regression when we want to model a binary dependent variable. (Recall that ordinary least squares regression requires a continuous numeric dependent variable.)

Let's consider an example where we are trying to account for why some US counties experience persistent poverty. Note that the regression function is glm() rather than lm(). With glm() -- the generalized linear model -- we must specify which distribution family to use; since our dependent variable is binary, we specify family="binomial". The regression coefficients are expressed in logits -- that is, log-odds. It is often easier to interpret the coefficients as odds ratios, which we obtain by exponentiating them with exp(coef(model)).
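A minimal sketch of this step, using simulated data (the persistent-poverty dataset itself is not shown in this section, so the variable names here are hypothetical):

```r
# Simulated stand-in for the county data; 'unemployment' is a
# hypothetical predictor, 'persistent_poverty' a 0/1 outcome.
set.seed(42)
n <- 500
unemployment <- rnorm(n, mean = 6, sd = 2)
# Generate the binary outcome so its log-odds depend on unemployment
p <- plogis(-3 + 0.4 * unemployment)
persistent_poverty <- rbinom(n, size = 1, prob = p)

# glm() with family = "binomial" fits the logistic regression
fit <- glm(persistent_poverty ~ unemployment, family = "binomial")

coef(fit)        # coefficients in logits (log-odds)
exp(coef(fit))   # exponentiated coefficients: odds ratios
```

Here exp(coef(fit)) gives, for each predictor, the multiplicative change in the odds of the outcome per one-unit increase in that predictor.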

The goodness of fit is expressed by the deviance (i.e., -2LL, minus twice the log-likelihood). We test the significance of the model with an analysis of deviance via anova(): we fit the empty model, which uses only the intercept and no predictors, as the baseline, and then compare the full model against it to see whether the predictors account for significantly more of the variation in the dependent variable.
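Continuing the hypothetical simulated example, the empty-versus-full comparison can be sketched as:

```r
# Same simulated data as before (hypothetical variable names)
set.seed(42)
n <- 500
unemployment <- rnorm(n, mean = 6, sd = 2)
p <- plogis(-3 + 0.4 * unemployment)
persistent_poverty <- rbinom(n, size = 1, prob = p)

# Empty (intercept-only) model vs. full model
null_model <- glm(persistent_poverty ~ 1, family = "binomial")
full_model <- glm(persistent_poverty ~ unemployment, family = "binomial")

# Chi-squared test on the drop in deviance (-2LL)
anova(null_model, full_model, test = "Chisq")
```

A significant chi-squared statistic on the deviance difference indicates that the full model fits better than the intercept alone.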

When it makes sense to do so, we can compare a partial model (instead of the empty model) against the full model. In this case, we are testing for the increase in explanatory power when we add the region factor.
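The nested comparison can be sketched the same way; the region levels and effect sizes below are hypothetical, invented only to make the simulation runnable:

```r
# Simulated data with a hypothetical four-level region factor
set.seed(42)
n <- 500
unemployment <- rnorm(n, mean = 6, sd = 2)
region <- factor(sample(c("Northeast", "South", "Midwest", "West"),
                        n, replace = TRUE))
# Hypothetical region effects on the log-odds scale
region_effect <- c(Northeast = 0, South = 1,
                   Midwest = 0.2, West = -0.3)[as.character(region)]
p <- plogis(-3 + 0.4 * unemployment + region_effect)
persistent_poverty <- rbinom(n, size = 1, prob = p)

# Partial model vs. full model that adds the region factor
partial <- glm(persistent_poverty ~ unemployment, family = "binomial")
full    <- glm(persistent_poverty ~ unemployment + region,
               family = "binomial")

# A significant chi-squared here means region adds explanatory power
anova(partial, full, test = "Chisq")
```

The test has three degrees of freedom (one per non-reference region level), and the deviance drop tells us how much the region factor improves the fit beyond unemployment alone.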