
There are various datasets available in scikit-learn to get started quickly with Machine Learning. In this post, you will learn how to load an example dataset in scikit-learn.
1 . Scikit-Learn dataset for regression –
Let’s first read a dataset for regression, then look at how to read a dataset for classification.
We will start by reading the California housing dataset.
To access the California housing dataset from the scikit-learn datasets module (the data is downloaded on the first call and cached afterwards), type
from sklearn import datasets
housing = datasets.fetch_california_housing()
To list all the available fetch functions in IPython or a Jupyter notebook, type
datasets.fetch_*?
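The `?` wildcard lookup only works in IPython or Jupyter. In a plain Python script, a rough equivalent is to filter the names exposed by the module. This is a minimal sketch, assuming the naming convention that bundled loaders start with load_ and downloading loaders start with fetch_:

```python
from sklearn import datasets

# Collect every name in sklearn.datasets that starts with "load_"
# (shipped with the library) or "fetch_" (downloaded on first use).
loaders = sorted(
    name for name in dir(datasets)
    if name.startswith(("load_", "fetch_"))
)

for name in loaders:
    print(name)
```

The exact list depends on the installed scikit-learn version.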
To view information about a dataset, print its description
print(housing.DESCR)
Viewing the California housing dataset –
To get the data, type
housing.data

housing.data.shape
output - (20640, 8)
This means we have 20640 observations in the dataset, and for each observation we have 8 features.
To get the feature names (column names), type
housing.feature_names

To get the target (the median house value of each district, in units of $100,000), type
housing.target
output - array([4.526, 3.585, 3.521, ..., 0.923, 0.847, 0.894])
Target shape
housing.target.shape
output - (20640,)
Each observation has exactly one target value, which is why the length is 20640.
Convert to Pandas DataFrame –
To convert this dataset to a pandas DataFrame, type
import pandas as pd
features = pd.DataFrame(housing.data, columns=housing.feature_names)
features.head()

target = pd.Series(housing.target)
target
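It is often handy to keep the features and the target together in one DataFrame. The sketch below wraps the two conversions above in a small helper; bunch_to_frame and the column name "progression" are hypothetical names chosen for this illustration, and the demo uses the bundled diabetes regression dataset (an assumption, not part of this post) so that it runs without a download:

```python
import pandas as pd
from sklearn import datasets


def bunch_to_frame(bunch, target_name="target"):
    """Hypothetical helper: put features and target in a single DataFrame."""
    frame = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    frame[target_name] = bunch.target
    return frame


# Demonstrated on the bundled diabetes dataset so no download is needed;
# bunch_to_frame(housing) should work the same way on the housing data.
diabetes = datasets.load_diabetes()
df = bunch_to_frame(diabetes, target_name="progression")

print(df.shape)
print(df.head())
```

Each row then holds one observation with its target in the last column.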

2 . Scikit-Learn dataset for classification –
Loading a classification dataset is very similar to loading a regression dataset, so I will go through it quickly.
Let’s load the famous iris dataset.
from sklearn import datasets
iris = datasets.load_iris()
To get the data, type
iris.data

To get the feature names, type
iris.feature_names

The shape (rows, columns) of the data
iris.data.shape
output - (150, 4)
The dataset contains 150 iris flowers, and for each flower we have 4 features.
To get the target, type
iris.target

And to get the target names, type
iris.target_names
output - array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
So 0 refers to the setosa species, 1 refers to versicolor and 2 refers to virginica.
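The integer labels can be turned into readable species names by indexing target_names with target. This is a minimal sketch; the Series name "species" is an arbitrary choice made for this illustration:

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()

# Index the array of names with the integer labels:
# 0 -> 'setosa', 1 -> 'versicolor', 2 -> 'virginica', one name per flower.
species = pd.Series(iris.target_names[iris.target], name="species")

# Count how many flowers belong to each species.
print(species.value_counts())
```

This relies on NumPy integer-array indexing, so it works for any dataset whose target holds class indices into target_names.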