How to Use the predict function with glm in R


R, a programming language and environment for statistical computing, comes with many functions that make modeling data easier. One such function, glm, is short for “generalized linear model” and is used to fit a variety of regression models. Once we’ve fit a model, we often want to make predictions on new data, and this is where the predict function comes in. This article walks through how to use the predict function with glm in R.

The Basics of glm

Before diving into predictions, let’s review the basics of glm. The glm function fits generalized linear models, a class of models that includes, among others:

  • Linear regression
  • Logistic regression
  • Poisson regression

A typical usage of glm might look like this:

model <- glm(y ~ x1 + x2, data = my_data, family = gaussian())

Here, y is the dependent variable, and x1 and x2 are independent variables. Because the gaussian family uses the identity link by default, this model is equivalent to an ordinary linear regression fit with lm.
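To make the call above concrete, here is a minimal sketch on simulated data. The data frame my_data, its columns, and the coefficients used to generate them are all made up for illustration; the binary outcome y_bin and the count outcome y_count are assumptions added only to show the logistic and Poisson families from the list above.

```r
# Simulated data; every name and coefficient here is an illustrative assumption.
set.seed(42)
n <- 200
my_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
my_data$y       <- 1 + 2 * my_data$x1 - 0.5 * my_data$x2 + rnorm(n)
my_data$y_bin   <- rbinom(n, size = 1, prob = plogis(0.5 + my_data$x1))
my_data$y_count <- rpois(n, lambda = exp(0.3 + 0.4 * my_data$x2))

# Linear regression: gaussian family with its default identity link
model <- glm(y ~ x1 + x2, data = my_data, family = gaussian())

# Logistic regression: binomial family with its default logit link
logit_model <- glm(y_bin ~ x1 + x2, data = my_data, family = binomial())

# Poisson regression: poisson family with its default log link
pois_model <- glm(y_count ~ x1 + x2, data = my_data, family = poisson())

# The gaussian glm should reproduce the coefficients from lm()
same_as_lm <- all.equal(coef(model), coef(lm(y ~ x1 + x2, data = my_data)))
same_as_lm
```

Only the family argument changes between the three fits; the formula interface and the returned object class stay the same, which is why predict works the same way on all of them.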

Making Predictions with predict

Once our model is trained, we might want to predict values of the dependent variable based on new values of the independent variables. This is done using the predict function.

Basic Usage

The basic usage is:

predicted_values <- predict(model, newdata = new_data)

Here, model is the model object returned by glm, and new_data is a data frame containing the new values of the independent variables.
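As a small self-contained sketch of this call, the example below fits a model on simulated data and predicts for three new rows. The data and the values in new_data are invented for illustration. The main point is that new_data needs a column for every predictor in the formula, under the same name.

```r
# Simulated training data (illustrative assumption, not from the article)
set.seed(1)
my_data <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
my_data$y <- 1 + 2 * my_data$x1 - 0.5 * my_data$x2 + rnorm(50)
model <- glm(y ~ x1 + x2, data = my_data, family = gaussian())

# new_data must contain every predictor used in the formula, with matching names
new_data <- data.frame(x1 = c(-1, 0, 1), x2 = c(0.5, 0.5, 0.5))
predicted_values <- predict(model, newdata = new_data)

# The same numbers computed by hand: design matrix times coefficients
by_hand <- as.vector(cbind(1, new_data$x1, new_data$x2) %*% coef(model))

# One prediction per row of new_data, matching the manual calculation
length(predicted_values) == nrow(new_data)
all.equal(unname(predicted_values), by_hand)
```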

Specifying Type of Prediction

The predict function allows you to specify the type of prediction you want:

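  • type = "link": This is the default for glm. It returns predictions on the scale of the linear predictor, before the inverse link function is applied. For a logistic regression model, these are log-odds.
  • type = "response": This returns predictions on the scale of the response variable, after the inverse link function is applied. For a logistic regression model, these are probabilities.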


predicted_probs <- predict(model, newdata = new_data, type = "response")
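The difference between the two types is easiest to see with a logistic model, where the link is not the identity. The sketch below uses simulated data (the names logit_model and y_bin are assumptions for illustration) and checks that applying the inverse logit, plogis, to the link-scale predictions reproduces the response-scale probabilities.

```r
# Simulated binary outcome (illustrative assumption)
set.seed(2)
my_data <- data.frame(x1 = rnorm(100))
my_data$y_bin <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.5 * my_data$x1))
logit_model <- glm(y_bin ~ x1, data = my_data, family = binomial())

new_data <- data.frame(x1 = c(-2, 0, 2))

# Default scale: log-odds, which can be any real number
log_odds <- predict(logit_model, newdata = new_data, type = "link")

# Response scale: probabilities between 0 and 1
probs <- predict(logit_model, newdata = new_data, type = "response")

# The two are related through the inverse link (plogis is the inverse logit)
all.equal(plogis(log_odds), probs)
```

Forgetting type = "response" on a logistic or Poisson model is a common source of confusing output, since the default link-scale values can look like implausible probabilities or counts.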

Predicting with No New Data

If you don’t provide newdata, the predict function will use the data originally used to fit the model:

predicted_values_orig_data <- predict(model)

This can be useful for generating predicted values to calculate residuals or for model validation.
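A brief sketch of that use, on a simulated Poisson model (data and names are assumptions for illustration): note that the default type is still "link" when newdata is omitted, so type = "response" is needed before comparing predictions with the observed outcome.

```r
# Simulated count outcome (illustrative assumption)
set.seed(3)
my_data <- data.frame(x1 = rnorm(60))
my_data$y_count <- rpois(60, lambda = exp(0.2 + 0.5 * my_data$x1))
pois_model <- glm(y_count ~ x1, data = my_data, family = poisson())

# Without newdata, predictions are for the rows used in fitting.
# Ask for the response scale so they are comparable with y_count.
fitted_counts <- predict(pois_model, type = "response")

# These are the same values that fitted() returns
all.equal(unname(fitted_counts), unname(fitted(pois_model)))

# Response residuals: observed minus fitted
response_resid <- my_data$y_count - fitted_counts
all.equal(unname(response_resid),
          unname(residuals(pois_model, type = "response")))
```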

Dealing with Factor Variables

One challenge you may encounter when using the predict function with new data is when your model includes factor variables. If the new data contains levels not seen during model training, the predict function will throw an error.

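To avoid the error, make sure the factor levels in your new data match those in your training data. You can do this by re-factoring the variable in the new data using the levels from the training data. Note that any value not among the training levels becomes NA, so predict returns NA for those rows instead of failing: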

new_data$factor_variable <- factor(new_data$factor_variable, levels = levels(my_data$factor_variable))
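The following sketch shows both behaviours on simulated data (the levels "a" to "d" and all values are invented for illustration): first the error caused by an unseen level, then the effect of re-factoring with the training levels.

```r
# Simulated training data with a three-level factor (illustrative assumption)
set.seed(4)
my_data <- data.frame(
  x1 = rnorm(90),
  factor_variable = factor(rep(c("a", "b", "c"), each = 30))
)
my_data$y <- 1 + my_data$x1 + as.integer(my_data$factor_variable) + rnorm(90)
model <- glm(y ~ x1 + factor_variable, data = my_data, family = gaussian())

# "d" never appeared in the training data
new_data <- data.frame(x1 = c(0, 0), factor_variable = c("b", "d"))

# Predicting directly fails; capture the error message instead of stopping
result <- tryCatch(predict(model, newdata = new_data),
                   error = function(e) conditionMessage(e))
result

# Re-factoring with the training levels turns "d" into NA
new_data$factor_variable <- factor(new_data$factor_variable,
                                   levels = levels(my_data$factor_variable))

# predict() now runs, but the row with the unseen level gets an NA prediction
preds <- predict(model, newdata = new_data)
is.na(preds)
```

Whether an NA is acceptable depends on the application; the re-factoring step only removes the error, it does not let the model say anything about a level it has never seen.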

Confidence Intervals and Predictions

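You may want to generate confidence intervals around your predictions. For glm models, this is a bit more involved than for standard linear models. One approach is to use the predict function with the se.fit = TRUE option:

preds_with_se <- predict(model, newdata = new_data, se.fit = TRUE)

This returns a list with three components:

  1. fit: The predicted values.
  2. se.fit: The standard errors of the predicted values.
  3. residual.scale: The square root of the dispersion used in computing the standard errors.

With these, you can construct approximate 95% confidence intervals. Because the default type is "link", the fit and standard errors are on the scale of the linear predictor; for a model with a non-identity link, apply the inverse link function to the bounds to express the interval on the response scale:

ci_upper <- preds_with_se$fit + (1.96 * preds_with_se$se.fit)
ci_lower <- preds_with_se$fit - (1.96 * preds_with_se$se.fit)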
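For a model whose link is not the identity, one common approach is to build the interval on the link scale and then back-transform the bounds. The sketch below does this for a logistic model on simulated data; the data, the names logit_model and inv_link, and the 95% level are assumptions for illustration, and the result is an approximate (Wald-type) interval for the mean response, not a prediction interval for individual outcomes.

```r
# Simulated binary outcome (illustrative assumption)
set.seed(5)
my_data <- data.frame(x1 = rnorm(150))
my_data$y_bin <- rbinom(150, size = 1, prob = plogis(0.3 + 1.2 * my_data$x1))
logit_model <- glm(y_bin ~ x1, data = my_data, family = binomial())

new_data <- data.frame(x1 = c(-1, 0, 1))

# Fitted values and standard errors on the link (log-odds) scale
link_preds <- predict(logit_model, newdata = new_data,
                      type = "link", se.fit = TRUE)

# The inverse link function is stored in the model's family object
inv_link <- family(logit_model)$linkinv

# Normal quantile for a 95% interval (approximately 1.96)
z <- qnorm(0.975)

# Build the interval on the link scale, then back-transform to probabilities
ci <- data.frame(
  fit      = inv_link(link_preds$fit),
  ci_lower = inv_link(link_preds$fit - z * link_preds$se.fit),
  ci_upper = inv_link(link_preds$fit + z * link_preds$se.fit)
)
ci
```

Back-transforming the bounds keeps the interval inside the valid range of the response (here, between 0 and 1), which adding and subtracting standard errors directly on the response scale does not guarantee.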


The predict function in R works directly with models fitted by glm. Understanding its options, in particular the difference between the link and response scales, makes it much easier to generate and interpret predictions from your models, whether you’re dealing with linear, logistic, or any other type of generalized linear model. As with all modeling work, always validate the accuracy and appropriateness of your predictions using external data or cross-validation.
