Quantile regression extends linear regression by letting us examine how predictors affect not just the mean but a whole range of quantiles of the response variable. It is especially useful when the residuals are not normally distributed, or when you want to model the effect of predictors on specific points of the outcome distribution (such as the median or the 90th percentile).
In this comprehensive guide, we will delve deep into quantile regression, explaining its significance and offering a step-by-step guide on how to perform it using the R programming language.
Understanding Quantile Regression
Unlike linear regression, which estimates the conditional mean of the response variable given the predictor variables, quantile regression estimates the conditional median (or any other quantile) of the response variable.
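Concretely, where ordinary least squares minimizes squared residuals, the tau-th quantile fit minimizes an asymmetric "check" (or pinball) loss that weights positive and negative residuals differently. Here is a minimal sketch in R; the helper name check_loss is our own, purely for illustration:
# Check (pinball) loss: positive residuals are weighted by tau,
# negative residuals by (tau - 1); tau = 0.5 recovers the median.
check_loss <- function(residual, tau) {
  residual * (tau - (residual < 0))
}

# The value q minimizing the summed loss is (approximately) the tau-th sample quantile of y:
set.seed(1)
y <- rnorm(100)
grid <- seq(min(y), max(y), length.out = 500)
loss <- sapply(grid, function(q) sum(check_loss(y - q, tau = 0.9)))
grid[which.min(loss)]   # close to quantile(y, 0.9)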
Why Use Quantile Regression?
- Non-Constant Variance: When the variability of the dependent variable is unequal across the range of values of the independent variable.
- Outliers: When the data contains extreme values that can pull the mean strongly but affect the median far less (see the brief sketch after this list).
- Interest in Impact Beyond the Mean: For exploring how the predictors impact not just the average, but other quantiles (e.g., what influences the top 10% of outcomes?).
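As a quick illustration of the outlier point above, a toy example (the numbers are made up purely for illustration) shows how a single extreme value moves the mean but not the median:
x <- c(10, 12, 11, 13, 12)
c(mean = mean(x), median = median(x))           # mean = 11.6, median = 12

x_out <- c(x, 100)                              # add one extreme value
c(mean = mean(x_out), median = median(x_out))   # mean jumps to ~26.3, median stays at 12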
Performing Quantile Regression in R
To conduct quantile regression in R, we use the quantreg package and its rq() function. Here, we will illustrate quantile regression through an example.
1. Setting Up Your Environment
First, install and load the quantreg package:
install.packages("quantreg")
library(quantreg)
2. Sample Data
For this example, let’s use the mtcars dataset, which comes built in with R:
data(mtcars)
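We will model mpg (miles per gallon) using wt (weight, in 1000 lbs) and hp (gross horsepower). A quick look at the columns involved:
# Peek at the variables used in the models below
head(mtcars[, c("mpg", "wt", "hp")])
summary(mtcars[, c("mpg", "wt", "hp")])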
3. Fitting a Quantile Regression Model
To perform a quantile regression for the median (i.e., 0.5 quantile), you can use the following:
quantile_model <- rq(mpg ~ wt + hp, data = mtcars, tau = 0.5)
summary(quantile_model)
The tau argument specifies the quantile at which the model is fit. In this example, tau = 0.5 corresponds to the median.
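You can also pull the fitted coefficients out directly, and summary() for rq fits accepts an se argument that controls how standard errors (or confidence intervals) are computed. The sketch below requests bootstrap standard errors, one of the documented options:
# Coefficients of the median fit
coef(quantile_model)

# Bootstrap standard errors (summary.rq also supports "iid", "nid", "ker",
# and rank-inversion confidence intervals)
summary(quantile_model, se = "boot")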
4. Comparing Different Quantiles
You might be interested in how the effects change at different quantiles. For example, let’s compare the 10th, 50th, and 90th percentiles:
quantiles <- c(0.1, 0.5, 0.9)
models <- lapply(quantiles, function(tau) {
  rq(mpg ~ wt + hp, data = mtcars, tau = tau)
})
# Print summaries
lapply(models, summary)
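Alternatively, rq() accepts a vector of quantiles in a single call, and plotting the resulting summary shows how each coefficient varies across tau. A sketch of that shortcut:
# Fit all three quantiles at once
multi_fit <- rq(mpg ~ wt + hp, data = mtcars, tau = c(0.1, 0.5, 0.9))
summary(multi_fit)

# Plot each coefficient as a function of tau, with confidence bands
plot(summary(multi_fit))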
5. Visualizing the Results
You can visualize the results by plotting the quantile regression lines at different quantiles along with the data:
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_quantile(quantiles = c(0.1, 0.5, 0.9),
                formula = y ~ x,
                color = "red") +
  ggtitle("Quantile Regression of MPG on Weight")

Advantages and Limitations
Advantages:
- Provides a complete picture of the conditional distribution of the response variable.
- Is robust to outliers in the response variable.
- Does not make a restrictive assumption about the error terms (such as homoscedasticity in OLS).
Limitations:
- Interpretation can be less intuitive than for mean-based regression models.
- In some settings, it can be computationally intensive.
Real-World Applications of Quantile Regression
Quantile regression is incredibly versatile and has been employed in various fields, including:
- Economics: To study the differential effects of variables at various income levels.
- Environmental Science: To study the upper quantiles of pollutant concentration levels.
- Medicine: To model the time until an event of interest or endpoint (such as death) is reached.
Conclusion
Quantile regression is a valuable form of regression analysis that relies on more flexible assumptions and can provide a more complete picture of the relationship between variables. It is especially useful when the conditions of linear regression are not met, or when we are interested in the effect of variables on different points (quantiles) of the outcome variable. In R, the quantreg package makes quantile regression analysis simple and accessible, providing an extensive suite of functions for fitting and diagnosing these models.