Calculating Symmetric Mean Absolute Percentage Error (SMAPE) in Python

Introduction

Evaluating a predictive model is how we learn how accurate it is and where it falls short. One metric commonly used in time-series forecasting is the Symmetric Mean Absolute Percentage Error (SMAPE). This article explains what SMAPE measures, when it is useful, and how to calculate it in Python.

Understanding SMAPE

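Before calculating SMAPE in Python, it helps to understand what it measures. SMAPE is an accuracy measure based on percentage (relative) errors. MAPE (Mean Absolute Percentage Error) divides each error by the actual value, so it is undefined whenever an actual value is zero. SMAPE instead divides by the mean of the absolute actual and forecast values, which makes the formula symmetric in the two quantities: swapping the actual and the forecast gives the same result. Two caveats are worth knowing. SMAPE is still undefined when an actual value and its forecast are both zero, and despite its name it does not penalize over-forecasts and under-forecasts of the same absolute size equally, because a larger forecast inflates the denominator.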
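To make the comparison with MAPE concrete, here is a minimal sketch of the per-point terms. The helpers `smape_term` and `mape_term` are hypothetical names introduced only for this illustration. The sketch shows that the symmetry lies in the formula itself (swapping actual and forecast gives the same value), while an over-forecast and an under-forecast of the same absolute size yield slightly different percentages.

```python
def smape_term(actual, forecast):
    """Single-point SMAPE term in percent (hypothetical helper for illustration)."""
    return 200 * abs(actual - forecast) / (abs(actual) + abs(forecast))


def mape_term(actual, forecast):
    """Single-point MAPE term in percent (hypothetical helper for illustration)."""
    return 100 * abs(actual - forecast) / abs(actual)


# Same absolute error of 20 around an actual value of 100
over = smape_term(100, 120)   # forecast too high
under = smape_term(100, 80)   # forecast too low
print(f"over-forecast:  {over:.2f}%")
print(f"under-forecast: {under:.2f}%")

# Swapping actual and forecast leaves the SMAPE term unchanged
print(smape_term(100, 120) == smape_term(120, 100))

# An actual value of zero breaks the MAPE term but not the SMAPE term
try:
    mape_term(0, 5)
except ZeroDivisionError:
    print("MAPE term is undefined when the actual value is 0")
print(f"SMAPE term with actual = 0: {smape_term(0, 5):.2f}%")
```

The over-forecast comes out near 18.18% and the under-forecast near 22.22%, so the two directions are treated similarly but not identically.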

SMAPE is defined by the following formula:

SMAPE = (100 / n) * Σ (2 * |Yt - Ft| / (|Yt| + |Ft|))

Where:

  • n is the number of data points
  • Σ represents the sum over all data points
  • Yt represents the actual value
  • Ft represents the forecasted or predicted value
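One consequence of this formula is that each term of the sum lies between 0 and 2, so SMAPE as defined here ranges from 0% to 200%. The sketch below checks a few boundary cases; `smape_term` is a hypothetical helper name, not part of any library.

```python
def smape_term(y, f):
    """One term of the sum: 2 * |y - f| / (|y| + |f|) (hypothetical helper)."""
    return 2 * abs(y - f) / (abs(y) + abs(f))


perfect = smape_term(100, 100)     # no error at all
typical = smape_term(100, 110)     # 2 * 10 / 210
worst_sign = smape_term(100, -50)  # actual and forecast have opposite signs
worst_zero = smape_term(0, 25)     # one of the two values is zero

for label, term in [("perfect", perfect), ("typical", typical),
                    ("opposite signs", worst_sign), ("one zero", worst_zero)]:
    print(f"{label:>14}: {term:.4f}")
```

A perfect forecast contributes 0, while a forecast of the opposite sign, or a non-zero forecast for a zero actual, contributes the maximum of 2.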

Data Preparation

We will need a dataset containing actual and predicted values to compute SMAPE. You can use a pre-existing dataset or generate synthetic data. For this guide, we will create a synthetic dataset using pandas.

import pandas as pd

# Create a DataFrame with actual and predicted values
data = {'Actual': [100, 200, 300, 400, 500], 'Predicted': [110, 190, 320, 390, 490]}
df = pd.DataFrame(data)

Implementing SMAPE Calculation

We can now define a Python function to calculate SMAPE.

def calculate_smape(actual, predicted):
    """
    Calculate the Symmetric Mean Absolute Percentage Error (SMAPE)
    
    :param actual: list of actual values
    :param predicted: list of predicted values
    :return: SMAPE as a percentage
    """
    # Ensure actual and predicted lists have the same length
    if len(actual) != len(predicted):
        raise ValueError("Input lists must have the same length")
    
    # Calculate SMAPE
    n = len(actual)
    total_error = 0
    for a, p in zip(actual, predicted):
        total_error += 2 * abs(a - p) / (abs(a) + abs(p))
    
    smape = (total_error / n) * 100
    return smape

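To use the function, pass the lists of actual and predicted values. Note that it raises a ZeroDivisionError if any actual value and its prediction are both zero, or if the lists are empty.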

actual = df['Actual'].tolist()
predicted = df['Predicted'].tolist()

smape = calculate_smape(actual, predicted)
print(f'SMAPE: {smape:.2f}%')
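To see where the overall figure comes from, it can help to print the contribution of each data point. The following sketch repeats the sample values from the synthetic dataset as plain lists so that it runs on its own; the variable name `terms` is introduced here for illustration.

```python
actual = [100, 200, 300, 400, 500]
predicted = [110, 190, 320, 390, 490]

# Per-point SMAPE terms, already expressed in percent
terms = [200 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)]

for a, p, t in zip(actual, predicted, terms):
    print(f'actual={a:>3}  predicted={p:>3}  term={t:.2f}%')

# SMAPE is the mean of the per-point terms
smape = sum(terms) / len(terms)
print(f'SMAPE: {smape:.2f}%')
```

For this dataset the mean works out to roughly 5.13%. The first point contributes the most because an error of 10 is large relative to values near 100.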

Optimizing with NumPy

NumPy's vectorized array operations replace the explicit Python loop, which is typically faster on large inputs. Let’s refactor the SMAPE function to use NumPy.

import numpy as np

def calculate_smape_numpy(actual, predicted):
    """
    Calculate the Symmetric Mean Absolute Percentage Error (SMAPE) using numpy
    
    :param actual: array-like of actual values
    :param predicted: array-like of predicted values
    :return: SMAPE as a percentage
    """
    # Accept lists as well as arrays, then check that the shapes match
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("Input arrays must have the same shape")
    
    # Calculate SMAPE using numpy
    smape = np.mean(2 * np.abs(actual - predicted) / (np.abs(actual) + np.abs(predicted))) * 100
    return smape

The function can be called as follows:

actual_np = np.array(actual)
predicted_np = np.array(predicted)

smape = calculate_smape_numpy(actual_np, predicted_np)
print(f'SMAPE (using numpy): {smape:.2f}%')
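With NumPy, a data point where the actual and predicted values are both zero produces a 0/0 division, which yields NaN and a runtime warning rather than an exception. One possible way to handle this, sketched below, is to count such points as zero error. This convention is an assumption rather than part of the SMAPE definition, and `smape_zero_safe` is a hypothetical function name.

```python
import numpy as np


def smape_zero_safe(actual, predicted):
    """SMAPE in percent; points where both values are 0 count as 0 error.

    Hypothetical helper. The 0/0 -> 0 convention is an assumption.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("Input arrays must have the same shape")

    numerator = 2 * np.abs(actual - predicted)
    denominator = np.abs(actual) + np.abs(predicted)

    # Divide only where the denominator is non-zero; other terms stay at 0
    terms = np.divide(numerator, denominator,
                      out=np.zeros_like(numerator),
                      where=denominator != 0)
    return terms.mean() * 100


result = smape_zero_safe([0, 100, 200], [0, 110, 190])
print(f'SMAPE (zero-safe): {result:.2f}%')
```

Whether a 0/0 point should count as a perfect forecast, be skipped, or raise an error depends on the application, so it is worth stating the chosen convention wherever the metric is reported.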

Conclusion

In this guide, we discussed the Symmetric Mean Absolute Percentage Error (SMAPE), its use in evaluating forecasting models, and how to calculate it in Python, first with a plain loop and then with NumPy's array operations. Because SMAPE is a relative, scale-independent measure, it is convenient for comparing models across series of different magnitudes. Keep in mind that it is bounded between 0% and 200%, is undefined when an actual value and its forecast are both zero, and does not penalize over-forecasts and under-forecasts of the same size identically, so it is best read alongside other error metrics.
