What is Linear Regression in Machine Learning?
In general, linear regression fits a straight line through the data, choosing the line that minimizes the error between the observed values and the predicted values, and then uses that line to make predictions. The goal is to predict the value of an outcome variable (the dependent variable) from one or more predictor variables (the independent variables). With a single predictor variable the method is called simple linear regression; with several predictor variables it is called multiple linear regression.
A simple linear regression line has an equation of the form Y = mX + b, where X is the independent variable and Y is the dependent variable. The slope of the line is m, and b is the intercept (the value of Y when X = 0).
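For multiple linear regression, the same equation extends to several predictors. Keeping the symbols above and giving each of the p predictors its own slope (the subscripts 1 through p are notation introduced here for illustration), the fitted equation takes the form:

$$
Y = b + m_1 X_1 + m_2 X_2 + \cdots + m_p X_p
$$

Each slope m_j is the expected change in Y for a one-unit increase in X_j while the other predictors are held fixed, and b is still the value of Y when every X_j = 0.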
How to fit a straight line through the data?
To fit a straight line through the data, we use least squares. For a candidate line, we measure the residuals (the vertical distances between the observed data points and the line), square them so that negative residuals do not cancel out positive ones, and add them all up. Then we rotate the line a little and do the same thing again: measure the residuals, square them, and add them up. The line with the smallest sum of squared residuals (hence "least squares") is chosen as the best fit. In practice, this line does not have to be found by trial and error: for ordinary least squares, the best slope and intercept can be computed directly with closed-form formulas.
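A minimal NumPy sketch of this procedure on a small made-up dataset (the numbers are illustrative, not from the original post). The helper `sum_squared_residuals` is hypothetical; it scores a few candidate lines, mimicking "rotating the line a little". The closed-form least-squares estimates, $m = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $b = \bar{y} - m\bar{x}$, then give a line whose sum of squared residuals is no larger than that of any candidate.

```python
import numpy as np

# Small illustrative dataset (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_residuals(m, b, x, y):
    """Hypothetical helper: sum of squared residuals for the line y = m*x + b."""
    residuals = y - (m * x + b)      # observed minus predicted
    return np.sum(residuals ** 2)    # squaring keeps +/- residuals from cancelling

# "Rotate the line a little": try several slopes (intercept fixed at 0 here)
candidate_slopes = [1.5, 1.8, 2.0, 2.2]
for m in candidate_slopes:
    print(f"m={m:.1f}, b=0.0 -> SSR={sum_squared_residuals(m, 0.0, x, y):.3f}")

# Closed-form least-squares slope and intercept
x_mean, y_mean = x.mean(), y.mean()
m_best = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b_best = y_mean - m_best * x_mean
ssr_best = sum_squared_residuals(m_best, b_best, x, y)
print(f"least squares: m={m_best:.3f}, b={b_best:.3f}, SSR={ssr_best:.3f}")

# Cross-check against NumPy's degree-1 polynomial fit (returns [slope, intercept])
m_np, b_np = np.polyfit(x, y, 1)
assert np.isclose(m_best, m_np) and np.isclose(b_best, b_np)
```

The trial-and-error loop is only there to build intuition; the closed-form line is the one least squares actually picks.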
How to train a Linear Regression model?
Let’s load a dataset to work with: scikit-learn’s California housing dataset.
```python
import pandas as pd
import numpy as np
from sklearn import datasets

# Load the California housing data (downloaded on first use)
housing = datasets.fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target
X.head()
```
Now, split the data into training and test sets.
```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
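To make `test_size=0.2` and `random_state=42` concrete, here is a rough NumPy sketch of a shuffled 80/20 split. The function `simple_train_test_split` is hypothetical and is not scikit-learn’s implementation, so it will not select the same rows as `train_test_split` for the same seed; it only illustrates the idea of shuffling row indices reproducibly and then holding out a fraction for testing.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle rows reproducibly, then hold out test_size of them."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)          # seeded generator -> same split every run
    indices = rng.permutation(len(X))          # shuffled row order
    n_test = int(np.ceil(test_size * len(X)))  # round the test count up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Example on 10 toy rows: 8 go to training, 2 to testing
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10)
X_tr, X_te, y_tr, y_te = simple_train_test_split(X_toy, y_toy)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)
```

Indexing X and y with the same shuffled indices keeps each row paired with its target, which is why the features and labels stay aligned after the split.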
Now, let’s train a linear regression model.
```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# create a linear regression model
lin_reg = LinearRegression()

# fit a linear regression model
lin_reg.fit(X_train, y_train)

# make prediction on test set
y_pred = lin_reg.predict(X_test)

# measure error
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
rmse
# output: 0.7455813830127762
```
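`LinearRegression` fits ordinary least squares: it finds the intercept and one coefficient per feature that minimize the sum of squared residuals, and exposes them as `intercept_` and `coef_`. A NumPy-only sketch of the same computation, assuming made-up synthetic data (the sample size, true coefficients, and noise level are illustrative), solves the least-squares problem with `np.linalg.lstsq` on a design matrix whose first column of ones carries the intercept. It then computes RMSE by hand as the square root of the mean squared residual, the quantity `np.sqrt(mean_squared_error(...))` returns above; note that here it is measured on the training data rather than a held-out test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 rows, 3 features, known intercept and slopes plus small noise
n, p = 200, 3
X = rng.normal(size=(n, p))
true_b = 1.5
true_m = np.array([2.0, -1.0, 0.5])
y = true_b + X @ true_m + rng.normal(scale=0.1, size=n)

# Design matrix: a leading column of ones carries the intercept b
X_design = np.column_stack([np.ones(n), X])

# Ordinary least squares: minimize ||y - X_design @ beta||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b_hat, m_hat = beta[0], beta[1:]
print("intercept:", round(b_hat, 3), "slopes:", np.round(m_hat, 3))

# RMSE by hand: square root of the mean squared residual
y_pred = X_design @ beta
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print("RMSE:", round(rmse, 3))
```

Because the noise is small, the recovered intercept and slopes land close to the true values used to generate the data, and the RMSE is roughly the noise level.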