What is StandardScaler in sklearn?
StandardScaler standardizes data so that each transformed feature has a mean of 0 and a standard deviation of 1. Each transformed value tells us how many standard deviations the original value lies from that feature’s mean, which is called a z-score in statistics.
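To make the z-score idea concrete, here is a minimal sketch that computes it by hand with NumPy and compares the result with StandardScaler. The toy array `x` is made up for illustration and is not part of the dataset used later. The sketch relies on StandardScaler dividing by the population standard deviation, which is also what `np.std` computes by default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature (one column), made up for illustration
x = np.array([[2.0], [4.0], [4.0], [4.0], [5.0], [5.0], [7.0], [9.0]])

# z-score by hand: subtract the column mean, then divide by the
# population standard deviation (np.std uses ddof=0 by default)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# The same transformation through scikit-learn
z_sklearn = StandardScaler().fit_transform(x)

print(z_manual.ravel())
print("Matches StandardScaler:", np.allclose(z_manual, z_sklearn))
```

Here the feature has mean 5 and standard deviation 2, so the value 9 becomes a z-score of 2: it sits two standard deviations above the mean.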
How to use StandardScaler in sklearn?
Let’s read a dataset to work with.
# import libraries
import pandas as pd
from sklearn import datasets

# get features and target
housing = datasets.fetch_california_housing()
X = housing.data
y = housing.target

# create pandas dataframe
X = pd.DataFrame(X, columns=housing.feature_names)
X.head()
Now, to standardize the data, we use the StandardScaler in scikit-learn.
from sklearn.preprocessing import StandardScaler

# create scaler
scaler = StandardScaler()

# fit the scaler and transform the features
standardized = scaler.fit_transform(X)
print("Standardized Features:\n", standardized[:3])

output -
Standardized Features:
 [[ 2.34476576  0.98214266  0.62855945 -0.15375759 -0.9744286  -0.04959654
   1.05254828 -1.32783522]
 [ 2.33223796 -0.60701891  0.32704136 -0.26333577  0.86143887 -0.09251223
   1.04318455 -1.32284391]
 [ 1.7826994   1.85618152  1.15562047 -0.04901636 -0.82077735 -0.02584253
   1.03850269 -1.33282653]]
Now, if we look at the mean and standard deviation of the standardized data, we find that the mean is 0 and the standard deviation is 1.
print("Mean:", round(standardized.mean()))
print("Standard Deviation:", round(standardized.std()))

output -
Mean: 0
Standard Deviation: 1
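The check above averages over every value in the array at once. Since StandardScaler works feature by feature, it can be more informative to check each column separately. The sketch below does this with `axis=0` on a small made-up array (`X_toy` is hypothetical, not the housing data) and also prints the `mean_` and `scale_` attributes, which hold the per-feature mean and standard deviation the fitted scaler learned from the original data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_toy = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
    [4.0, 700.0],
])

scaler = StandardScaler()
standardized_toy = scaler.fit_transform(X_toy)

# Per-feature statistics of the standardized data (one value per column)
col_means = standardized_toy.mean(axis=0)
col_stds = standardized_toy.std(axis=0)
print("Per-feature mean:", col_means.round(6))
print("Per-feature standard deviation:", col_stds.round(6))

# Statistics the scaler learned from the original data during fit
print("Learned means:", scaler.mean_)
print("Learned scales:", scaler.scale_)
```

Each column should come out with a mean of approximately 0 and a standard deviation of approximately 1, up to floating-point error, even though the original columns differ in scale by a factor of more than a hundred.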