# How to Calculate Median Absolute Deviation in Python

### Introduction

Measuring the variability and dispersion within a dataset is a common task for data scientists and statisticians. The standard deviation and variance are popular measures, but both are sensitive to outliers. The Median Absolute Deviation (MAD) is a more robust measure of dispersion. In this guide, we explore the concept of MAD and learn, step by step, how to calculate it in Python.

### Understanding Median Absolute Deviation

MAD is a measure of statistical dispersion, defined as the median of the absolute deviations from the median of a dataset. In simpler terms, it measures how far the values in a dataset typically lie from the median. The formula for MAD is:

$$
\mathrm{MAD} = \operatorname{median}\left( \left| X_i - \operatorname{median}(X) \right| \right)
$$

Where:

• $X_i$ represents each value in the dataset
• $\operatorname{median}(X)$ is the median of the dataset
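To make the definition concrete, here is a minimal sketch that works through the formula by hand on a small, made-up list of five numbers. The values are purely illustrative and are not the dataset used later. It relies only on Python's standard `statistics` module.

```python
from statistics import median

# Hypothetical toy data, chosen so the arithmetic is easy to follow
x = [2, 4, 4, 5, 9]

# Step 1: median of the data
center = median(x)                          # 4

# Step 2: absolute deviation of each value from the median
abs_dev = [abs(xi - center) for xi in x]    # [2, 0, 0, 1, 5]

# Step 3: median of those absolute deviations
mad = median(abs_dev)                       # 1

print(center, abs_dev, mad)
```

The extreme value 9 contributes one large deviation (5), but because the last step takes a median rather than a mean, that single deviation does not pull the result upward.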

### Data Preparation

To compute MAD, you need a dataset. You can use real-world data or create synthetic data. In this example, we create a small sample dataset and store it in a pandas DataFrame:

import pandas as pd

# Create a DataFrame with sample data
data = {'Values': [3, 4, 5, 5, 2, 3, 4.5, 5.2, 7, 2.8, 4.9]}
df = pd.DataFrame(data)
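For a larger synthetic dataset, one option is to draw values from a normal distribution and append a few extreme points so that the effect of outliers is visible. The sketch below assumes NumPy's `default_rng` with an arbitrary seed. The distribution parameters, the outlier values, and the name `df_synthetic` are illustrative choices, not part of the sample data above.

```python
import numpy as np
import pandas as pd

# Seeded generator so the synthetic sample is reproducible
rng = np.random.default_rng(seed=0)

# 50 "typical" observations (mean 5, standard deviation 1: arbitrary choices)
typical = rng.normal(loc=5.0, scale=1.0, size=50)

# Three hand-picked extreme values standing in for outliers
outliers = np.array([25.0, 30.0, -20.0])

# Hypothetical DataFrame using the same column name as the sample data
df_synthetic = pd.DataFrame({'Values': np.concatenate([typical, outliers])})

print(df_synthetic.shape)
```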

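Now, let’s write a function, here called `calculate_mad`, that calculates MAD using the formula above.

import numpy as np

def calculate_mad(data):
    """
    Calculate the Median Absolute Deviation (MAD).

    :param data: list of values
    :return: the MAD of the values
    """
    # Calculate the median of the data
    median = np.median(data)

    # Calculate the absolute deviations from the median
    absolute_deviations = [np.abs(x - median) for x in data]

    # MAD is the median of those absolute deviations
    mad = np.median(absolute_deviations)

    return mad

Now apply the function to the sample data:

values = df['Values'].tolist()
mad = calculate_mad(values)

print(f'MAD: {mad}')

For this sample, the median is 4.5 and the MAD is approximately 0.7.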
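To illustrate the robustness claim from the introduction, the following sketch compares the standard deviation and MAD before and after a single extreme value is added. The data and the helper name `mad_of` are made up for illustration.

```python
import numpy as np

def mad_of(values):
    """Median of the absolute deviations from the median (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    return np.median(np.abs(arr - np.median(arr)))

clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]   # one extreme value appended

for name, sample in [('clean', clean), ('with outlier', with_outlier)]:
    print(f'{name}: std={np.std(sample):.2f}, MAD={mad_of(sample):.2f}')
```

With these numbers, the standard deviation grows from roughly 1.4 to roughly 33 once the outlier is added, while MAD only moves from 1.0 to 1.5.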

### Leveraging Scikit-learn

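Scikit-learn doesn’t have a dedicated function for MAD, but its `robust_scale` function can do part of the work. By default, `robust_scale` centers the data on the median and scales it by the interquartile range (IQR), not by MAD, so MAD cannot be read directly from its default output. With scaling turned off (`with_scaling=False`), however, it returns each value’s deviation from the median, and MAD is the median of the absolute values of those deviations.

import numpy as np
from sklearn.preprocessing import robust_scale

# Note: robust_scale returns transformed values, so we need to extract MAD.
# With with_scaling=False it only subtracts the median from each value.
centered = robust_scale(df[['Values']], with_scaling=False)
mad = np.median(np.abs(centered))  # median of the absolute deviations
print(f'MAD (using scikit-learn): {mad}')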

### Using Pandas

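Older versions of pandas provide a built-in `mad()` method on Series and DataFrame objects, which is convenient for data already stored in a DataFrame. Despite the name, it computes the mean absolute deviation (the mean of the absolute deviations from the mean), not the median absolute deviation. The method was deprecated in pandas 1.5 and removed in pandas 2.0, so the snippet below only runs on earlier versions.

# Requires pandas < 2.0; returns the mean absolute deviation
mean_ad = df['Values'].mad()
print(f'Mean absolute deviation (using pandas): {mean_ad}')

Note: Because this method uses the mean instead of the median, it does not return the true MAD. To get the median-based MAD, we can still use the function we created earlier.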
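Because `Series.mad()` is mean-based (and absent from pandas 2.0 onward), a median-based MAD can instead be assembled from the `median()` and `abs()` methods. The sketch below is one way to do that. The helper name `median_abs_dev` is an assumption for illustration, and the sample values are repeated so the block runs on its own.

```python
import pandas as pd

def median_abs_dev(series):
    """Median absolute deviation of a pandas Series (hypothetical helper)."""
    return (series - series.median()).abs().median()

df = pd.DataFrame({'Values': [3, 4, 5, 5, 2, 3, 4.5, 5.2, 7, 2.8, 4.9]})

mad = median_abs_dev(df['Values'])
print(f'MAD (median-based, using pandas): {mad:.2f}')
```

For the sample values this gives approximately 0.7, matching the definition of MAD as the median of the absolute deviations from the median.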

### Conclusion

In this guide, we covered the concept of Median Absolute Deviation (MAD), why it is a robust measure of dispersion, and several ways to calculate it in Python: a custom NumPy function, scikit-learn’s `robust_scale`, and pandas. MAD is particularly useful for datasets with outliers, which can distort mean-based dispersion metrics such as the standard deviation.