Quadratic regression is a type of polynomial regression that models the relationship between a dependent and an independent variable as a 2nd-degree polynomial. It’s often used when the relationship between variables isn’t strictly linear. In this guide, we’ll explore how to execute and interpret quadratic regression in R.
1. Understanding Quadratic Regression
Quadratic regression is given by the equation:

Y = β0 + β1X + β2X² + ϵ

where:
- Y is the dependent variable.
- X is the independent variable.
- β0, β1, β2 are the coefficients.
- ϵ represents the error term.
The X² term introduces the polynomial component, allowing the model to capture U-shaped (or inverted-U-shaped) patterns.
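To make the equation concrete, here is a short sketch that simulates data from a quadratic model and then recovers the coefficients with `lm()`. The coefficient values (2, 1.5, -0.8) are illustrative assumptions, not from any real dataset:

```r
# Simulate data from Y = β0 + β1*X + β2*X² + ϵ
# (the coefficient values below are illustrative assumptions)
set.seed(42)
x <- seq(-3, 3, length.out = 100)
y <- 2 + 1.5 * x - 0.8 * x^2 + rnorm(100, sd = 1)

# Fit a quadratic model; the estimates should land near 2, 1.5, and -0.8
fit <- lm(y ~ x + I(x^2))
coef(fit)
```

Because β2 is negative here, the simulated curve is an inverted U; a positive β2 would produce a U shape.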
2. Setting Up the R Environment
Ensure you have R and optionally RStudio installed.
Step 1: Install Required Packages
Step 2: Load Necessary Libraries
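The only add-on package this guide uses is ggplot2 (for plotting); everything else comes with base R. Assuming that setup, the two steps look like:

```r
# Step 1: install ggplot2 from CRAN (only needed once per machine)
install.packages("ggplot2")

# Step 2: load it into the current session
library(ggplot2)
```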
3. Performing Quadratic Regression
Using R’s built-in mtcars dataset, we’ll try to model a quadratic relationship between mpg (miles per gallon) and hp (horsepower).
Step 1: Visualizing the Data
A scatter plot can help in visualizing the relationship:
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  ggtitle("Scatter plot of mpg vs. hp")
Step 2: Fitting the Quadratic Model
To incorporate the quadratic term, we’ll add an I(hp^2) term to our formula in lm():

quad_model <- lm(mpg ~ hp + I(hp^2), data = mtcars)
summary(quad_model)
4. Interpreting the Results
- Coefficients: These describe the shape of the fitted curve. The coefficients for hp and I(hp^2) act together, so the change in mpg per unit of hp is not constant; the sign of the I(hp^2) coefficient indicates the direction of the curvature.
- R-squared: Indicates how well the model fits the data. A value closer to 1 denotes a better fit.
- p-value: A lower p-value (typically ≤ 0.05) for a coefficient suggests it’s significant.
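Each of these quantities can be pulled out of the fitted model object directly. A self-contained sketch, refitting the model from the previous section:

```r
# Refit the quadratic model so this snippet stands on its own
quad_model <- lm(mpg ~ hp + I(hp^2), data = mtcars)

# Coefficients: the sign of the I(hp^2) estimate indicates the curvature
coef(quad_model)

# R-squared: proportion of variance in mpg explained by the model
summary(quad_model)$r.squared

# p-values for each coefficient (column 4 of the coefficient table)
summary(quad_model)$coefficients[, 4]
```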
5. Checking Assumptions
Quadratic regression, like linear regression, requires certain assumptions to be met:
1. Linearity in Parameters: Even though the relationship between the variables is quadratic, the model must be linear in the coefficients β — which is why lm() can still fit it with ordinary least squares.
2. Independence: Observations should be independent of each other.
3. Homoscedasticity: Residuals should have constant variance. Plotting residuals can help check this:
plot(quad_model$residuals, ylab = "Residuals", main = "Residual Plot")
abline(h = 0, col = "red")
4. Normality of Residuals: The residuals should be normally distributed. A Q-Q plot can check this:
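A minimal sketch of that check with base R graphics:

```r
# Refit the quadratic model so this snippet stands on its own
quad_model <- lm(mpg ~ hp + I(hp^2), data = mtcars)

# Q-Q plot of the residuals: points falling near the reference line
# suggest the residuals are approximately normally distributed
qqnorm(quad_model$residuals, main = "Q-Q Plot of Residuals")
qqline(quad_model$residuals, col = "red")
```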
6. Visualizing the Quadratic Fit
A graph can show how well the quadratic model fits the data:
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE, color = "red") +
  ggtitle("Quadratic Fit of mpg vs. hp")
7. Model Validation and Prediction
Use the predict() function to get fitted values:
mtcars$predicted_mpg <- predict(quad_model, mtcars)
For new data points:
new_data <- data.frame(hp = c(100, 150))
predicted_values <- predict(quad_model, new_data)
print(predicted_values)
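One common validation step, not shown above, is to test whether the quadratic term actually improves on a plain linear fit. A sketch using an F-test via anova():

```r
# Fit the competing models on mtcars
linear_model <- lm(mpg ~ hp, data = mtcars)
quad_model   <- lm(mpg ~ hp + I(hp^2), data = mtcars)

# F-test: does adding I(hp^2) significantly reduce residual error?
anova(linear_model, quad_model)
```

A small p-value in the anova() table supports keeping the quadratic term; otherwise the simpler linear model may be preferable.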
8. Advantages and Disadvantages
Advantages:
- Can capture non-linear relationships.
- Doesn’t require transformation of variables.

Disadvantages:
- Can easily overfit with higher-degree polynomials.
- Requires careful validation.
Quadratic regression is a powerful tool when faced with U-shaped patterns in data. It captures non-linear relationships without the need for data transformations. However, like all models, the assumptions need to be checked carefully. Quadratic regression forms a stepping stone towards more complex polynomial regression models, and R provides a comprehensive suite of tools to work with these models efficiently.