What is StandardScaler in Sklearn and How to Use It

What is StandardScaler in sklearn?

StandardScaler is a scikit-learn transformer that standardizes data so that each transformed feature has a mean of 0 and a standard deviation of 1. Each transformed value tells us how many standard deviations the original value lies from the feature’s mean. In statistics this is called a z-score.
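To make the z-score idea concrete, here is a minimal sketch that computes it by hand and compares the result with StandardScaler. The toy array `x` is a made-up example, not part of the housing data used later. Note that StandardScaler divides by the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature (one column, five samples)
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# z-score by hand: subtract the column mean, divide by the column std (ddof=0)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# the same standardization done by scikit-learn
z_sklearn = StandardScaler().fit_transform(x)

print("Manual z-scores:", z_manual.ravel())
print("Match sklearn:", np.allclose(z_manual, z_sklearn))
```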

How to use StandardScaler in sklearn?

Let’s read a dataset to work with.

# import libraries
import pandas as pd
from sklearn import datasets

# get features and target
housing = datasets.fetch_california_housing()
X = housing.data
y = housing.target

# create pandas dataframe
X = pd.DataFrame(X, columns=housing.feature_names)
X.head()

Now, to standardize the data, we use the StandardScaler class from scikit-learn.

from sklearn.preprocessing import StandardScaler
# create scaler
scaler = StandardScaler()
# fit the scaler and transform the features
standardized = scaler.fit_transform(X)
print("Standardized Features:\n", standardized[:3])

output -
Standardized Features:
[[ 2.34476576  0.98214266  0.62855945 -0.15375759 -0.9744286  -0.04959654
   1.05254828 -1.32783522]
 [ 2.33223796 -0.60701891  0.32704136 -0.26333577  0.86143887 -0.09251223
   1.04318455 -1.32284391]
 [ 1.7826994   1.85618152  1.15562047 -0.04901636 -0.82077735 -0.02584253
   1.03850269 -1.33282653]]
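As a side note, a fitted scaler remembers the statistics it learned, so you can inspect them or undo the scaling. The sketch below uses a small made-up array (`data` is hypothetical) rather than the housing features, so it runs without downloading anything. `mean_`, `scale_` and `inverse_transform` are standard StandardScaler members.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical two-feature dataset on different scales
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 60.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(data)

# statistics learned during fit, one value per column
print("Learned means:", scaler.mean_)
print("Learned standard deviations:", scaler.scale_)

# map the standardized values back to the original units
restored = scaler.inverse_transform(scaled)
print("Round trip matches:", np.allclose(restored, data))
```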

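Now, if we look at the mean and standard deviation of the standardized housing data, we find that the mean is 0 and the standard deviation is 1 (after rounding). The code below computes them over all values at once; the same holds for each feature column individually.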

print("Mean:", round(standardized.mean()))
print("Standard Deviation:", round(standardized.std()))

output -
Mean: 0
Standard Deviation: 1
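Calling mean() and std() without an axis pools every value in the array together. Since StandardScaler works column by column, a per-feature check with axis=0 is arguably more informative. Here is a minimal sketch on randomly generated data (the `features` array and its scales are assumptions chosen for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features with very different means and spreads
rng = np.random.default_rng(0)
features = rng.normal(loc=[5.0, 100.0, -3.0],
                      scale=[1.0, 25.0, 0.1],
                      size=(200, 3))

scaled = StandardScaler().fit_transform(features)

# axis=0 computes one statistic per column (feature)
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)

print("All column means close to 0:", np.allclose(col_means, 0.0))
print("All column stds close to 1:", np.allclose(col_stds, 1.0))
```

Using np.allclose instead of round() avoids printing tiny floating-point residues such as -0.0.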
