
Polynomial regression is a type of regression analysis in which the relationship between the independent variable and the dependent variable is modeled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x). Here, we’ll walk through how to perform polynomial regression in Python.
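For reference, the model described above can be written out as follows. The coefficient symbols β₀ … βₙ are notation assumed here for illustration; the article itself does not name them.

```latex
E(y \mid x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n
```

The curve is nonlinear in x but linear in the coefficients, which is why an ordinary linear regression on the columns x, x², …, xⁿ can estimate it.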
Step 1: Import Necessary Libraries
First, you’ll need to import the libraries necessary for polynomial regression. Here’s what you’ll need:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Generate or Import Your Data
You’ll need data to perform polynomial regression on. For the purposes of this article, let’s generate some data.
# Set a random seed for reproducibility
np.random.seed(0)
# Create a function to generate a polynomial dataset
def create_polynomial_data(n):
    X = np.linspace(-3, 3, n)
    y = X ** 3 + 2 * X ** 2 - 3 * X + 2
    y += np.random.normal(0, 1, n)  # add some noise
    return X, y
X, y = create_polynomial_data(100)
# Plot the data
plt.scatter(X, y)
plt.show()

Step 3: Reshape Your Data
The LinearRegression class in the sklearn library expects a 2D array-like object as the feature matrix, with one row per sample and one column per feature. Our X is a 1D array, so we need to reshape it into a 2D array with a single column.
# Reshape your data
X = X[:, np.newaxis]
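As a quick check of what the reshape does, here is a minimal sketch on a small toy array. The names x_1d, x_newaxis, and x_reshape are hypothetical and not part of the walkthrough; reshape(-1, 1) is shown only as a common equivalent spelling.

```python
import numpy as np

# Hypothetical toy array standing in for the article's X
x_1d = np.linspace(-3, 3, 5)

# Two equivalent ways to turn shape (5,) into shape (5, 1)
x_newaxis = x_1d[:, np.newaxis]
x_reshape = x_1d.reshape(-1, 1)

print(x_1d.shape)       # 1D: five values, no column axis
print(x_newaxis.shape)  # 2D: five rows, one column
print(np.array_equal(x_newaxis, x_reshape))
```

Either form gives one row per sample and one column per feature, which is the layout the scikit-learn estimators expect.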
Step 4: Create Polynomial Features
Next, we need to transform our matrix of predictor variables into a new matrix containing an additional column for each power of the feature, up to the chosen degree.
# Create polynomial features
polynomial_features = PolynomialFeatures(degree=3)
X_poly = polynomial_features.fit_transform(X)
In the above code, degree=3 is used for our polynomial regression example. This can be adjusted depending on the relationship you observe in the data.
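To make the transformation concrete, the sketch below applies PolynomialFeatures to a tiny two-sample input. X_demo and its values are hypothetical, chosen so the powers are easy to verify by hand.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical two-sample input with a single feature
X_demo = np.array([[2.0], [3.0]])

poly_demo = PolynomialFeatures(degree=3)
X_demo_poly = poly_demo.fit_transform(X_demo)

# Each row holds the columns 1, x, x^2, x^3:
# x = 2 gives 1, 2, 4, 8 and x = 3 gives 1, 3, 9, 27
print(X_demo_poly)
```

The leading column of ones comes from the default include_bias=True. LinearRegression also fits its own intercept by default, so that column is redundant here; passing include_bias=False is one way to drop it.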
Step 5: Fit the Model
We can now fit our model using the LinearRegression class from sklearn.
# Fit the model
model = LinearRegression()
model.fit(X_poly, y)
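Because the data was generated from a known cubic, one sanity check is to compare the fitted coefficients with the true ones. The sketch below is self-contained, so it regenerates similar data with a local random generator (an assumption; the noise values differ from those produced by np.random.seed(0)) and uses include_bias=False so the constant term appears only in intercept_.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Recreate data like the article's: y = x^3 + 2x^2 - 3x + 2 plus noise
rng = np.random.default_rng(0)  # assumption: local generator, not np.random.seed
X_demo = np.linspace(-3, 3, 100).reshape(-1, 1)
x = X_demo[:, 0]
y_demo = x ** 3 + 2 * x ** 2 - 3 * x + 2 + rng.normal(0, 1, 100)

# include_bias=False leaves the constant term to LinearRegression's intercept
X_demo_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(X_demo)
model_demo = LinearRegression().fit(X_demo_poly, y_demo)

print("intercept:", model_demo.intercept_)  # true value is 2
print("coefficients:", model_demo.coef_)    # true values are -3, 2, 1 (for x, x^2, x^3)
```

The estimates will not match the true values exactly because of the added noise, but they should land close to them.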
Step 6: Make Predictions
Now that we’ve fitted our model, we can use it to make predictions.
# Make predictions
y_poly_pred = model.predict(X_poly)
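The call above predicts on the training inputs. To predict at new x values, those values need the same polynomial expansion first, using transform on the already-fitted transformer rather than fit_transform. The following is a minimal sketch with hypothetical names (X_train, X_new, model_new) and noise-free data, so the predictions can be checked against the true cubic.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Noise-free version of the article's cubic, so the fit is numerically exact
X_train = np.linspace(-3, 3, 50).reshape(-1, 1)
x = X_train[:, 0]
y_train = x ** 3 + 2 * x ** 2 - 3 * x + 2

poly = PolynomialFeatures(degree=3)
model_new = LinearRegression().fit(poly.fit_transform(X_train), y_train)

# Hypothetical new inputs: reuse the fitted transformer via transform()
X_new = np.array([[0.0], [1.0], [2.0]])
y_new_pred = model_new.predict(poly.transform(X_new))

print(y_new_pred)  # close to the true values 2, 2 and 12
```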
Step 7: Evaluate the Model
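Next, we should evaluate our model using metrics such as the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R² score). Note that both are computed here on the same data the model was fitted to, so they describe goodness of fit rather than performance on unseen data.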
# Calculate root mean squared error
rmse = np.sqrt(mean_squared_error(y, y_poly_pred))
# Calculate R^2 Score
r2 = r2_score(y, y_poly_pred)
print(f"Root Mean Squared Error: {rmse}")
print(f"R^2 Score: {r2}")
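The metrics above are computed on the training data. One common complement, sketched here under assumptions, is to hold out part of the data and score the model on it. The 75/25 split, the random_state value, and all variable names are illustrative choices, and the data is regenerated with a local random generator so the block runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Recreate data like the article's cubic with unit-variance noise
rng = np.random.default_rng(0)  # assumption: local generator
X_demo = np.linspace(-3, 3, 100).reshape(-1, 1)
x = X_demo[:, 0]
y_demo = x ** 3 + 2 * x ** 2 - 3 * x + 2 + rng.normal(0, 1, 100)

# Hold out 25% of the samples for evaluation
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fit the transformer and the model on the training portion only
poly = PolynomialFeatures(degree=3)
model_ho = LinearRegression().fit(poly.fit_transform(X_tr), y_tr)

# Score on the held-out portion
y_te_pred = model_ho.predict(poly.transform(X_te))
rmse_test = np.sqrt(mean_squared_error(y_te, y_te_pred))
r2_test = r2_score(y_te, y_te_pred)

print(f"Held-out RMSE: {rmse_test:.3f}")
print(f"Held-out R^2: {r2_test:.3f}")
```

Since the noise was drawn with a standard deviation of 1, a held-out RMSE in the neighborhood of 1 suggests the model has captured most of the structure in the data.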
Step 8: Visualize the Results
We can also visualize our polynomial regression model with a plot.
# Sort by x so the fitted curve is drawn from left to right
order = np.argsort(X[:, 0])
plt.scatter(X[:, 0], y, s=10)
plt.plot(X[order, 0], y_poly_pred[order], color='r')
plt.show()

That’s it! This guide should help you perform polynomial regression in Python. Remember, though, polynomial regression may not be suitable for all datasets, and may result in overfitting if the degree of the polynomial is too high. Always evaluate your models and check your assumptions. Also, consider using other regression techniques or data transformations as appropriate.
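One way to act on the overfitting caveat is to compare several degrees with cross-validation rather than training error. The sketch below is an illustration, not a prescription: the candidate degrees (1, 3, 10), the 5-fold shuffled split, and the use of make_pipeline are all assumptions made here, and the data is regenerated so the block is self-contained.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Recreate data like the article's cubic with unit-variance noise
rng = np.random.default_rng(0)  # assumption: local generator
X_demo = np.linspace(-3, 3, 100).reshape(-1, 1)
x = X_demo[:, 0]
y_demo = x ** 3 + 2 * x ** 2 - 3 * x + 2 + rng.normal(0, 1, 100)

# Shuffled folds, because the samples are ordered by x
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_rmse = {}
for degree in (1, 3, 10):  # hypothetical candidate degrees
    # The pipeline refits the feature expansion inside every fold
    pipeline = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(
        pipeline, X_demo, y_demo, cv=cv, scoring="neg_mean_squared_error"
    )
    # Scores are negated MSE values; flip the sign before taking the root
    cv_rmse[degree] = np.sqrt(-scores.mean())
    print(f"degree={degree}: cross-validated RMSE = {cv_rmse[degree]:.3f}")
```

On data like this, the straight-line model should score clearly worse than the cubic. How much a higher degree hurts depends on the dataset and the noise, so it is worth reading the numbers rather than assuming the outcome.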