How to Perform Quadratic Regression in Python


Introduction

Quadratic regression, or polynomial regression of order 2, is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as a second-degree polynomial. In quadratic regression, we aim to find the best-fitting curve, a parabola, for a set of data points.

Quadratic regression extends the simple linear regression model, which models the relationship between x and y as a straight line, by adding a squared term, x^2, to the equation of the line. The resulting model has the form y = ax^2 + bx + c, and the squared term allows it to capture curved (nonlinear) relationships between x and y.
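To make the role of the squared term concrete, here is a minimal sketch of the least-squares problem that every library in this article solves in some form. The data values and the variable names (X, beta) are assumptions chosen for illustration; the points are generated from y = 2x^2 - 3x + 1 so the recovered coefficients are easy to check.

```python
import numpy as np

# Illustrative data generated from y = 2x^2 - 3x + 1 (assumed for this sketch)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x**2 - 3 * x + 1

# Design matrix with one column per term of the model: x^2, x, and a constant
X = np.column_stack([x**2, x, np.ones_like(x)])

# Solve the least-squares problem X @ beta ≈ y
beta, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
a, b, c = beta
print(a, b, c)
```

The model is still linear in the coefficients a, b, and c, which is why ordinary least squares can fit it even though the curve itself is not a straight line.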

In this article, we’ll walk through how to perform quadratic regression in Python using four libraries: NumPy, SciPy, scikit-learn, and statsmodels.

Quadratic Regression in Python

Quadratic Regression with NumPy

We can perform quadratic regression in Python using NumPy’s polyfit function. This function fits a polynomial of a specified degree to a set of data using the method of least squares, and returns the coefficients of the polynomial, ordered from the highest power to the lowest.

Here is an example of how to use polyfit to perform quadratic regression:

import numpy as np
import matplotlib.pyplot as plt

# Define the data
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 4, 9, 16, 25])

# Perform quadratic regression
coefficients = np.polyfit(x, y, 2)
polynomial = np.poly1d(coefficients)

# Plot the original data and the polynomial fit
plt.scatter(x, y)
plt.plot(x, polynomial(x), color='red')
plt.show()

In this example, np.polyfit(x, y, 2) fits a second-degree polynomial (a parabola) to the data. The function np.poly1d(coefficients) creates a polynomial function from the coefficients returned by polyfit, which we can then use to compute the y-values for the polynomial fit.
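Once the polynomial is fitted, it can be used like any other function. The sketch below, which uses made-up noisy data rather than the exact squares above, shows one way to check the quality of the fit with R^2, evaluate the curve on a dense grid for a smoother plot, and predict y at a new x. The names r_squared, x_grid, and y_grid are illustrative.

```python
import numpy as np

# Made-up, slightly noisy data for illustration
x = np.array([1, 2, 3, 4, 5])
y = np.array([1.2, 3.9, 9.1, 15.8, 25.3])

# Fit a second-degree polynomial; coefficients are ordered [a, b, c]
coefficients = np.polyfit(x, y, 2)
polynomial = np.poly1d(coefficients)

# Coefficient of determination: 1 - (residual sum of squares / total sum of squares)
residuals = y - polynomial(x)
ss_res = np.sum(residuals**2)
ss_tot = np.sum((y - y.mean())**2)
r_squared = 1 - ss_res / ss_tot

# Evaluate the fit on a dense grid so a plotted curve looks smooth
x_grid = np.linspace(x.min(), x.max(), 100)
y_grid = polynomial(x_grid)

print(coefficients)
print(r_squared)
print(polynomial(6))  # prediction at a new x value
```

Passing x_grid and y_grid to plt.plot instead of the five original points avoids the piecewise-straight look a curve gets when it is drawn through only a handful of x values.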

Quadratic Regression with SciPy

We can also use the curve_fit function from the SciPy library to perform quadratic regression. This function fits a function of your choice to a set of data using the method of least squares.

Here is an example of how to use curve_fit to perform quadratic regression:

from scipy.optimize import curve_fit

# Define the form of the function we want to fit
def quadratic(x, a, b, c):
    return a * x**2 + b * x + c

# Perform quadratic regression
params, params_covariance = curve_fit(quadratic, x, y)

# Print the coefficients
print(params)

In this example, we define the function quadratic(x, a, b, c), which corresponds to the equation of a parabola. We then pass this function, along with our data, to curve_fit, which returns the coefficients that best fit our data.
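The second value returned by curve_fit, the covariance matrix of the parameters, is not used in the example above. A common way to use it is to take the square root of its diagonal as an estimate of the one-standard-deviation uncertainty of each coefficient. The sketch below assumes made-up noisy data, since a perfect fit leaves essentially no uncertainty to report; std_errors is an illustrative name.

```python
import numpy as np
from scipy.optimize import curve_fit

def quadratic(x, a, b, c):
    """Parabola with coefficients a, b, c."""
    return a * x**2 + b * x + c

# Made-up, slightly noisy data for illustration
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([1.2, 3.9, 9.1, 15.8, 25.3, 35.7])

params, params_covariance = curve_fit(quadratic, x, y)

# Square roots of the diagonal give approximate standard errors
std_errors = np.sqrt(np.diag(params_covariance))

for name, value, err in zip("abc", params, std_errors):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```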

Quadratic Regression with Scikit-Learn

Scikit-Learn provides the PolynomialFeatures class for transforming our input data, allowing us to fit a linear model to our transformed data to perform polynomial regression:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Reshape the data to fit the model
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)

# Transform the data
poly = PolynomialFeatures(degree=2)
x_poly = poly.fit_transform(x)

# Fit the model
model = LinearRegression()
model.fit(x_poly, y)

# Predict y values
y_pred = model.predict(x_poly)

# Plot the original data and the polynomial fit
plt.scatter(x, y)
plt.plot(x, y_pred, color='red')
plt.show()

In this example, PolynomialFeatures(degree=2) generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, if an input sample is two dimensional and of the form [a, b], the degree-2 polynomial features are [1, a, b, a^2, ab, b^2]. With a single feature x, as in our data, the transformed features are simply [1, x, x^2], and the linear model fitted to them is the quadratic regression.
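The [a, b] expansion described above can be checked directly on a small sample. This sketch uses a = 2 and b = 3, values picked only for illustration.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3
sample = np.array([[2, 3]])

poly = PolynomialFeatures(degree=2)
features = poly.fit_transform(sample)

# Columns are ordered 1, a, b, a^2, ab, b^2
print(features)  # [[1. 2. 3. 4. 6. 9.]]
```

The leading column of ones duplicates the intercept that LinearRegression fits by default. This does not change the predictions, but passing include_bias=False to PolynomialFeatures drops the redundant column if you prefer.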

Quadratic Regression with Statsmodels

Statsmodels is a Python library for statistics and econometrics. It lets us perform quadratic regression by adding a quadratic term to an ordinary least squares (OLS) model:

import pandas as pd
import statsmodels.api as sm

# Create a DataFrame
df = pd.DataFrame({
    'x': x.flatten(),
    'y': y.flatten()
})

# Add a quadratic term to our model
df['x_squared'] = df['x'] ** 2

# Define our dependent variable
y = df['y']
# Define our independent variables
X = df[['x', 'x_squared']]
X = sm.add_constant(X)

# Fit the model
model = sm.OLS(y, X)
results = model.fit()

# Print the summary
print(results.summary())

In this example, we create a new column in our DataFrame, ‘x_squared’, which corresponds to the square of our independent variable, ‘x’. We then use this new column, along with ‘x’, as our independent variables in our model. The call to sm.add_constant(X) adds a column of ones so that the model also estimates an intercept.
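As an alternative to building the squared column by hand, statsmodels also offers a formula interface in which the quadratic term is written inline. This is a different route from the one shown above, sketched here with made-up noisy data; the formula string and the DataFrame contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up, slightly noisy data for illustration
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1.2, 3.9, 9.1, 15.8, 25.3, 35.7],
})

# I(x**2) tells the formula parser to treat x**2 as arithmetic, not formula syntax.
# The intercept is added automatically.
results = smf.ols("y ~ x + I(x**2)", data=df).fit()

print(results.params)    # intercept, linear, and quadratic coefficients
print(results.rsquared)  # coefficient of determination
```

Individual quantities such as results.params and results.rsquared are often more convenient than the full summary table when the fit feeds into further code.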

Conclusion

Quadratic regression is a useful statistical technique that allows us to model nonlinear relationships between variables. This article covered how to perform quadratic regression in Python using various popular libraries such as NumPy, SciPy, Scikit-Learn, and Statsmodels. You can choose the library that best fits your workflow, or the one you find most intuitive to use. Remember, the core concept of quadratic regression remains the same irrespective of the library used to implement it. Happy coding!
