Rescale a Feature with MinMaxScaler in sklearn.


Min-Max Scaling (normalization) –

Many machine learning algorithms perform poorly when the features are on very different scales. There are several techniques for bringing features onto the same scale; one of them is Min-Max Scaling.

Min-Max Scaling uses the minimum and maximum values of a feature to rescale it to a fixed range, typically 0 to 1 or -1 to 1. Scikit-Learn provides a MinMaxScaler class that does min-max scaling for us.

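Formula for Min-Max Scaling –

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$

where x is an original value, x' is the normalized value, and min(x) and max(x) are the minimum and maximum values of the feature.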
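As a quick sanity check, the formula can be applied by hand: subtract the feature's minimum, then divide by its range (maximum minus minimum). The sketch below uses a small made-up column (the values are illustrative, not taken from the housing data) and compares the manual result with MinMaxScaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A tiny made-up feature column (illustrative values only)
x = np.array([[10.0], [20.0], [25.0], [50.0]])

# Apply the formula by hand: x' = (x - min) / (max - min)
x_min = x.min(axis=0)
x_max = x.max(axis=0)
manual = (x - x_min) / (x_max - x_min)

# Let scikit-learn compute the same thing
scaled = MinMaxScaler().fit_transform(x)

print(manual.ravel())                # values: 0, 0.25, 0.375, 1
print(np.allclose(manual, scaled))   # the two results agree
```

The smallest value maps to 0, the largest to 1, and everything else lands proportionally in between.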

Let’s see how to do it.

# import libraries
import pandas as pd
from sklearn import datasets

# get features and target
housing = datasets.fetch_california_housing()
X = housing.data
y = housing.target

# create pandas dataframe
X = pd.DataFrame(X, columns=housing.feature_names)

Here we have some housing data. Let’s now apply Min-Max scaling.

from sklearn.preprocessing import MinMaxScaler

# apply min-max scaling
minmax_scaler = MinMaxScaler(feature_range=(0, 1))
scaled_feature = minmax_scaler.fit_transform(X)

# show the first three rows
scaled_feature[:3]

output - 
array([[0.53966842, 0.78431373, 0.0435123 , 0.02046866, 0.00894083,
        0.00149943, 0.5674814 , 0.21115538],
       [0.53802706, 0.39215686, 0.03822395, 0.01892926, 0.0672104 ,
        0.00114074, 0.565356  , 0.21215139],
       [0.46602805, 1.        , 0.05275646, 0.02194011, 0.01381765,
        0.00169796, 0.5642933 , 0.21015936]])

By default, MinMaxScaler scales each feature to the range 0 to 1. If you need a different range, you can set it with the feature_range hyperparameter.
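For instance, here is a minimal sketch of scaling to the range -1 to 1 instead. The input column is made up for illustration; only the feature_range argument differs from the default usage.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Small made-up feature column, for illustration only
x = np.array([[1.0], [3.0], [5.0]])

# Rescale into [-1, 1] instead of the default [0, 1]
scaler = MinMaxScaler(feature_range=(-1, 1))
x_scaled = scaler.fit_transform(x)

# The minimum maps to -1, the maximum to 1, and the midpoint to 0
print(x_scaled.ravel())
```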

Let’s see how to apply Min-Max Scaling in an end-to-end machine learning problem.

# import libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# split the data into training and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# create an ML model using a pipeline
model = make_pipeline(MinMaxScaler(), LinearRegression())
# fit the model on training data
model.fit(X_train, y_train)
# test the model on test set
y_pred = model.predict(X_test)

# measure error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("Root Mean Square Error:", rmse)

output - 
Root Mean Square Error: 0.7284008391515451
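One reason to put the scaler in a pipeline is that it is then fitted on the training split only, and the same minimum and maximum are reused when predicting on the test split. The sketch below spells out those steps by hand and checks that they give the same predictions as the pipeline. It uses synthetic stand-in data (an assumption, so that it runs without downloading the housing dataset), and the names X_demo and y_demo are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data with features on very different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3)) * [1.0, 100.0, 0.01]
y_demo = X_demo @ np.array([2.0, 0.05, 30.0]) + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

# Pipeline version: the scaler is fitted inside fit(), on X_tr only
pipe = make_pipeline(MinMaxScaler(), LinearRegression()).fit(X_tr, y_tr)

# Manual version of the same steps
scaler = MinMaxScaler().fit(X_tr)             # learn min/max from training data
reg = LinearRegression().fit(scaler.transform(X_tr), y_tr)
manual_pred = reg.predict(scaler.transform(X_te))  # reuse training min/max

same = np.allclose(pipe.predict(X_te), manual_pred)
print("Pipeline matches manual steps:", same)

# Scaled test values may fall slightly outside [0, 1], because the
# minimum and maximum come from the training split alone
X_te_scaled = scaler.transform(X_te)
print(X_te_scaled.min(), X_te_scaled.max())
```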
