How to Calculate Mean Absolute Error in Python


Introduction

In this article, we will learn how to calculate the Mean Absolute Error (MAE) in Python. MAE is a popular metric for evaluating the performance of regression models.

Table of Contents

  1. Understanding Mean Absolute Error
  2. Install Required Libraries
  3. Calculating Mean Absolute Error Manually
  4. Using Libraries to Calculate Mean Absolute Error
  5. Practical Example with Real Datasets
  6. Interpretation of Mean Absolute Error
  7. Comparing MAE with Other Metrics
  8. Conclusion

1. Understanding Mean Absolute Error

Before delving into the technicalities, it is essential to understand what Mean Absolute Error is. When we build a regression model, we want to measure how well it performs. There are various metrics for this, and MAE is one of them. It tells us how far, on average, the model's predictions are from the actual values, expressed in the same units as the target variable.

Mathematically, it is the average of the absolute differences between the actual and predicted values.
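Written as a formula, where y_i is the actual value and ŷ_i the predicted value for observation i, and n is the number of observations (symbols chosen here for illustration):

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```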

2. Install Required Libraries

First, let’s install the libraries we will be using:

pip install numpy pandas scikit-learn

3. Calculating Mean Absolute Error Manually

Let’s start by calculating the MAE manually in Python.

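def mean_absolute_error(actual, predicted):
    # zip() silently stops at the shorter input, so check the lengths first
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Average of the absolute differences between actual and predicted values
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)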

# Example
actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]
mae = mean_absolute_error(actual, predicted)
print(f"Mean Absolute Error: {mae}")
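As a check on the example above, here are the four absolute errors written out. They add up to 2, which gives the 0.5 the script prints:

```latex
\mathrm{MAE} = \frac{|3 - 2.5| + |-0.5 - 0| + |2 - 2| + |7 - 8|}{4}
             = \frac{0.5 + 0.5 + 0 + 1}{4} = \frac{2}{4} = 0.5
```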

4. Using Libraries to Calculate Mean Absolute Error

4.1 Using scikit-learn

Scikit-learn is one of the most popular libraries for machine learning in Python. Let’s calculate MAE using scikit-learn.

from sklearn.metrics import mean_absolute_error

actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]
mae = mean_absolute_error(actual, predicted)
print(f"Mean Absolute Error: {mae}")
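scikit-learn's mean_absolute_error also accepts sample_weight and multioutput keyword arguments. The sketch below shows both. The weights and the 2-D arrays are made-up values for illustration, not data from this article:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]

# sample_weight: each absolute error is weighted before averaging.
# Made-up weights: the last observation counts double.
weights = [1, 1, 1, 2]
weighted_mae = mean_absolute_error(actual, predicted, sample_weight=weights)
print(f"Weighted MAE: {weighted_mae}")  # (0.5 + 0.5 + 0 + 2*1) / 5 = 0.6

# multioutput: with 2-D targets (one column per output), "raw_values"
# returns one MAE per column instead of their average.
actual_2d = np.array([[0.5, 1], [-1, 1], [7, -6]])
predicted_2d = np.array([[0, 2], [-1, 2], [8, -5]])
per_output = mean_absolute_error(actual_2d, predicted_2d, multioutput="raw_values")
averaged = mean_absolute_error(actual_2d, predicted_2d)  # default: uniform average
print(f"Per-output MAE: {per_output}")
print(f"Averaged MAE: {averaged}")
```

Here the per-column errors are 0.5 and 1.0, and the default averages them to 0.75.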

4.2 Using NumPy

You can also use NumPy, the Python library for large, multi-dimensional arrays and matrices, which provides a large collection of high-level mathematical functions that operate on these arrays. NumPy applies the subtraction and the absolute value element-wise to whole arrays, so no explicit loop is needed.

import numpy as np

def mean_absolute_error(actual, predicted):
    return np.mean(np.abs(np.array(actual) - np.array(predicted)))

# Example
actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]
mae = mean_absolute_error(actual, predicted)
print(f"Mean Absolute Error: {mae}")
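Since pandas is installed as well, here is a minimal sketch for the common case where actual and predicted values already sit in DataFrame columns. The DataFrame and its column names actual and predicted are hypothetical:

```python
import pandas as pd

# Hypothetical DataFrame holding actual and predicted values side by side
df = pd.DataFrame({
    "actual": [3, -0.5, 2, 7],
    "predicted": [2.5, 0.0, 2, 8],
})

# Element-wise difference, absolute value, then the mean of the resulting column
mae = (df["actual"] - df["predicted"]).abs().mean()
print(f"Mean Absolute Error: {mae}")
```

One design point to keep in mind: Series.mean() skips NaN values by default, so rows where either value is missing are silently left out of the average.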

5. Practical Example with Real Datasets

Let’s see how we can calculate MAE on a real dataset. Many older tutorials use the Boston housing dataset, but its loader, load_boston, was removed from scikit-learn in version 1.2, so the code below uses the California housing dataset instead. It is derived from the 1990 U.S. census, and its target is the median house value for California districts, expressed in hundreds of thousands of dollars ($100,000). The data is downloaded on first use and cached locally.

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Load the dataset as pandas objects (downloaded and cached on first use)
data = fetch_california_housing(as_frame=True)
X = data.data    # DataFrame with 8 feature columns
y = data.target  # Series "MedHouseVal", in units of $100,000

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Calculate MAE (in the same units as the target, i.e. $100,000)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae}")
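A single train/test split yields one MAE value that depends on which rows happen to land in the test set. The sketch below averages MAE over 5 cross-validation folds instead. To keep it runnable offline, it uses scikit-learn's bundled diabetes dataset rather than housing data, and the choice of 5 folds is arbitrary:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# The diabetes dataset ships with scikit-learn, so no download is needed
X, y = load_diabetes(return_X_y=True)

# scikit-learn scorers follow "greater is better", so MAE is exposed as its
# negative under the name "neg_mean_absolute_error"
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_mean_absolute_error"
)
fold_maes = -scores  # flip the sign back to get ordinary MAE values

print(f"MAE per fold: {fold_maes}")
print(f"Mean MAE: {fold_maes.mean():.2f} (std {fold_maes.std():.2f})")
```

The spread across folds gives a rough idea of how much a single-split MAE could vary.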

6. Interpretation of Mean Absolute Error

The Mean Absolute Error measures the average magnitude of the errors in a set of predictions, without considering their direction. It’s a measure of how wrong the predictions were, where 0 would mean that there were no errors.
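Because MAE takes absolute values, over- and under-predictions cannot cancel each other out. A small sketch with made-up numbers contrasts MAE with the plain mean of the signed errors (sometimes called mean error or bias):

```python
import numpy as np

# Made-up values: one prediction is 2 too high, the other 2 too low
actual = np.array([10.0, 10.0])
predicted = np.array([12.0, 8.0])

errors = predicted - actual       # signed errors: [ 2., -2.]
mean_error = np.mean(errors)      # signs cancel out: 0.0
mae = np.mean(np.abs(errors))     # magnitudes only: 2.0

print(f"Mean signed error: {mean_error}")
print(f"Mean Absolute Error: {mae}")
```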

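Generally, the smaller the MAE, the better the model’s performance. However, MAE is hard to interpret on its own because it is expressed in the same units as the target variable and has no fixed upper bound. An MAE of 2 might be excellent when the target ranges into the thousands but poor when it ranges from 0 to 10, so always compare MAE with the scale of your target variable.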
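One practical way to judge whether an MAE is "small" is to compare it with the MAE of a naive baseline that ignores the features. The sketch below uses scikit-learn's DummyRegressor with strategy="median" (the median is the constant that minimizes absolute error on the data it is fit to), and the bundled diabetes dataset for offline convenience. Both choices are illustrative, not the only reasonable ones:

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Naive baseline: always predict the median of the training target
baseline = DummyRegressor(strategy="median").fit(X_train, y_train)
baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))

# Real model trained on the same split
model = LinearRegression().fit(X_train, y_train)
model_mae = mean_absolute_error(y_test, model.predict(X_test))

print(f"Baseline MAE: {baseline_mae:.2f}")
print(f"Model MAE:    {model_mae:.2f}")
print(f"Target range: {y.min():.0f} to {y.max():.0f}")
```

If the model's MAE is not clearly below the baseline's, the features are adding little beyond a constant guess.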

7. Comparing MAE with Other Metrics

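MAE is a good metric, but it’s often useful to compare it with other metrics like Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). Because these metrics square each error before averaging, they are more sensitive to outliers: a few large errors raise MSE and RMSE much more than they raise MAE. Looking at them side by side can therefore show whether the model's errors are spread evenly or dominated by a handful of large misses.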
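To see this sensitivity, the sketch below reuses the earlier example values and then adds a made-up outlier by moving the last prediction from 8 to 17. RMSE is computed as the square root of mean_squared_error so the code does not depend on a particular scikit-learn version:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = [3, -0.5, 2, 7]
predicted = [2.5, 0.0, 2, 8]
# Same predictions, except the last one is badly off (made-up outlier)
predicted_outlier = [2.5, 0.0, 2, 17]

results = {}
for label, pred in [("original", predicted), ("with outlier", predicted_outlier)]:
    mae = mean_absolute_error(actual, pred)
    mse = mean_squared_error(actual, pred)
    rmse = np.sqrt(mse)  # square root of MSE, same units as the target
    results[label] = (mae, mse, rmse)
    print(f"{label:>13}: MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
```

Here MAE grows from 0.5 to 2.75 (5.5 times), while RMSE grows from about 0.61 to about 5.01 (roughly 8 times), because squaring lets the single large error dominate.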

8. Conclusion

In this article, we covered what Mean Absolute Error is and how to calculate it in Python, both manually and with popular libraries like scikit-learn and NumPy. We went through a practical example using a real dataset and learned how to interpret MAE and compare it with other metrics. It is essential to remember that choosing the right metric depends on the problem you are trying to solve and on understanding the data you are working with.
