How to Calculate Manhattan Distance in Python

The Manhattan distance, also known as the taxicab distance or L1 distance, is a metric in which the distance between two points is the sum of the absolute differences of their Cartesian coordinates. Equivalently, it is the L1 norm of the difference between the two points. When movement is restricted to a grid (as a taxi would be in Manhattan, hence the name), it reflects the actual travel distance better than the straight-line (Euclidean) distance does.

In this article, we will go over how to calculate Manhattan distance in Python, starting with basic principles, then providing a Python function to achieve this, and eventually scaling it up for usage in machine learning applications.

Prerequisites

To understand this guide fully, you need some familiarity with Python and basic concepts in mathematics (particularly, geometry and vectors).

We’ll be using Python’s built-in functions, as well as functionality from the NumPy and scikit-learn libraries. If you don’t have these libraries installed, you can install them with pip:

pip install numpy
pip install scikit-learn

Defining Manhattan Distance

The Manhattan distance between two points in a 2D plane is the absolute difference in their X-coordinates plus the absolute difference in their Y-coordinates. If we have two points P1(x1, y1) and P2(x2, y2), the Manhattan distance between these points is given by:

|x1 - x2| + |y1 - y2|

This formula can be generalized to n-dimensional space as:

d(A, B) = Σ |ai - bi|, summed over i = 1, ..., n

where ai and bi are the i-th components of points A and B, respectively, and n is the number of dimensions.
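As a quick worked example of the general formula, the sketch below evaluates it one term at a time for two made-up 4-dimensional points. The points and variable names are illustrative assumptions, not taken from any dataset.

```python
# Worked example of the n-dimensional formula, one term at a time.
# The two points are made up for illustration.
a = [1, 0, 4, 2]
b = [3, 5, 4, -1]

# Absolute difference for each coordinate: |1-3|, |0-5|, |4-4|, |2-(-1)|
terms = [abs(ai - bi) for ai, bi in zip(a, b)]
print(terms)     # [2, 5, 0, 3]

# The Manhattan distance is the sum of those terms.
distance = sum(terms)
print(distance)  # 10
```

A coordinate where the two points agree contributes 0, and negative coordinates need no special handling because only the absolute difference matters.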

Calculating Manhattan Distance in Python

Manhattan Distance in a 2D Plane

Let’s start by creating a Python function that calculates the Manhattan distance between two points in a 2D plane.

def manhattan_distance_2D(point1, point2):
    return abs(point1[0] - point2[0]) + abs(point1[1] - point2[1])

point1 = [2, 3]
point2 = [5, 7]

print(manhattan_distance_2D(point1, point2))  # output: 7

In this function, point1 and point2 are lists representing the x and y coordinates of the two points. The function computes the absolute differences in the x and y coordinates and returns their sum.
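To see why this sum matches grid movement, here is a minimal sketch that walks from one point to the other one unit step at a time, moving only horizontally or vertically, and counts the steps. The helper name grid_steps is hypothetical, and the sketch assumes integer coordinates.

```python
def grid_steps(start, goal):
    """Count unit steps from start to goal on an integer grid.

    Hypothetical helper for illustration: it moves along x first,
    then along y, and never moves diagonally.
    """
    x, y = start
    steps = 0
    while x != goal[0]:
        x += 1 if goal[0] > x else -1
        steps += 1
    while y != goal[1]:
        y += 1 if goal[1] > y else -1
        steps += 1
    return steps

print(grid_steps((2, 3), (5, 7)))  # 7
```

The step count equals the value returned by manhattan_distance_2D for the same points: 3 steps along x plus 4 steps along y.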

Manhattan Distance in an n-Dimensional Space

The formula for Manhattan distance extends to more than just 2 dimensions. Here is a function that calculates the Manhattan distance between two points in an n-dimensional space:

def manhattan_distance_nd(point1, point2):
    return sum(abs(a - b) for a, b in zip(point1, point2))

point1 = [2, 3, 1]
point2 = [5, 7, 3]

print(manhattan_distance_nd(point1, point2))  # output: 9

Here, we’re using the built-in zip function to pair the corresponding elements of the two points. The sum function sums the absolute differences of these pairs.
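One caveat: zip stops at the end of the shorter input, so if the two points have different numbers of coordinates, the extra ones are silently ignored. A possible variant that rejects such input is sketched below; the function name manhattan_distance_checked and the error message are assumptions made for this example.

```python
def manhattan_distance_checked(point1, point2):
    """Manhattan distance that rejects points of different dimensionality."""
    if len(point1) != len(point2):
        raise ValueError(
            f"dimension mismatch: {len(point1)} vs {len(point2)}"
        )
    return sum(abs(a - b) for a, b in zip(point1, point2))

print(manhattan_distance_checked([2, 3, 1], [5, 7, 3]))  # 9

try:
    manhattan_distance_checked([2, 3, 1], [5, 7])
except ValueError as err:
    print(err)  # dimension mismatch: 3 vs 2
```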

Using NumPy for Efficient Calculations

In practice, you will likely be dealing with large amounts of data, and efficiency will become important. NumPy, a powerful library for numerical computation in Python, can make these calculations much more efficient.

Here’s how you can use NumPy to calculate Manhattan distance:

import numpy as np

def manhattan_distance_nd_numpy(point1, point2):
    return np.sum(np.abs(np.array(point1) - np.array(point2)))

point1 = [2, 3, 1]
point2 = [5, 7, 3]

print(manhattan_distance_nd_numpy(point1, point2))  # output: 9

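NumPy’s operations are vectorized: the subtraction, absolute value, and sum each run over whole arrays in compiled code rather than in a Python-level loop. For large arrays this is typically much faster than the pure-Python versions above. For very small inputs, such as these three-element points, the cost of converting lists to arrays can outweigh the gain.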
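Vectorization pays off most when many distances are computed at once. The sketch below, which uses made-up data, relies on broadcasting to get the distance from one query point to every row of an array in a single expression, and also shows np.linalg.norm with ord=1 as an alternative for a single pair.

```python
import numpy as np

# One query point and a small batch of points, one point per row.
query = np.array([2, 3, 1])
points = np.array([[5, 7, 3], [2, 1, 3], [5, 4, 1]])

# Broadcasting subtracts the query from every row; summing over axis=1
# collapses each row of absolute differences into a single distance.
distances = np.abs(points - query).sum(axis=1)
print(distances)  # [9 4 4]

# For a single pair, the L1 norm of the difference gives the same value.
single = np.linalg.norm(points[0] - query, ord=1)
print(single)  # 9.0
```

Note that np.linalg.norm returns a float even for integer input, whereas the broadcasting version keeps the integer dtype of the arrays.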

Applying Manhattan Distance in Machine Learning

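The Manhattan distance is a useful tool in many areas, including machine learning, where it can serve as the distance metric in algorithms such as K-Nearest Neighbors (KNN). Standard K-Means is defined in terms of Euclidean distance; its Manhattan-distance counterpart is usually called K-Medians.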

Scikit-learn is a popular Python library for machine learning, and it conveniently includes a function for computing Manhattan distance, among other metrics.
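As an illustration of the KNN use case, the following minimal sketch trains scikit-learn’s KNeighborsClassifier with metric="manhattan" on a tiny made-up dataset. The data, labels, and choice of n_neighbors=3 are assumptions made for this example only.

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): two well-separated groups.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
y = [0, 0, 0, 1, 1, 1]

# metric="manhattan" tells the classifier to rank neighbors by L1 distance.
knn = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
knn.fit(X, y)

predictions = knn.predict([[1, 1], [5, 4]])
print(predictions)  # [0 1]
```

Each query point is assigned the majority label among its three nearest training points, with "nearest" measured by Manhattan distance.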

Here’s how you can compute the Manhattan distance between two points using scikit-learn’s manhattan_distances function:

from sklearn.metrics.pairwise import manhattan_distances
import numpy as np

point1 = np.array([[2, 3, 1]])
point2 = np.array([[5, 7, 3]])

print(manhattan_distances(point1, point2))  # output: [[9.]]

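The manhattan_distances function expects 2D arrays in which each row is one point, so we wrap each point in an extra pair of brackets. The result is also a 2D array, which is why the output is [[9.]] rather than a plain number.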
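If your points are stored as ordinary 1D arrays, one way to meet this requirement is to reshape them before the call and index the single entry of the result afterwards. This is a small sketch; the variable names are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

# Points stored as ordinary 1D arrays.
p = np.array([2, 3, 1])
q = np.array([5, 7, 3])

# reshape(1, -1) turns each 1D array into a 2D array with a single row.
result = manhattan_distances(p.reshape(1, -1), q.reshape(1, -1))
print(result.shape)  # (1, 1)

# Index the only entry to get a plain Python float.
distance = float(result[0, 0])
print(distance)  # 9.0
```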

Note that the manhattan_distances function can also calculate the pairwise distances between multiple points at once. For instance:

from sklearn.metrics.pairwise import manhattan_distances
import numpy as np

points = np.array([[2, 3, 1], [5, 7, 3], [2, 1, 3], [5, 4, 1]])

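print(manhattan_distances(points))
# output:
# [[0. 9. 4. 4.]
#  [9. 0. 9. 5.]
#  [4. 9. 0. 8.]
#  [4. 5. 8. 0.]]

This prints a 4×4 matrix in which the entry in row i, column j is the Manhattan distance between points[i] and points[j]. The matrix is symmetric, and its diagonal is zero because every point is at distance 0 from itself.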
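The function also accepts two different arrays, in which case it returns the distance from every row of the first to every row of the second. The sketch below uses made-up query and reference points and then picks the closest reference point for each query, which is essentially a nearest-neighbor lookup done by hand.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

# Hypothetical data: two query points and four reference points.
queries = np.array([[2, 3, 1], [5, 5, 2]])
references = np.array([[2, 3, 1], [5, 7, 3], [2, 1, 3], [5, 4, 1]])

# Result has one row per query and one column per reference point.
dist = manhattan_distances(queries, references)
print(dist.shape)  # (2, 4)

# argmin along axis=1 gives the index of the closest reference per query.
nearest = dist.argmin(axis=1)
print(nearest)  # [0 3]
```

The first query coincides with reference 0 (distance 0), and the second is closest to reference 3 (distance 2).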

Conclusion

In this article, we have introduced the Manhattan distance, a fundamental concept in geometry and machine learning, and have shown how to calculate it in Python, both in basic Python and using the numpy and scikit-learn libraries.

Understanding and implementing such distance measures is essential in the field of data science and machine learning, as many algorithms rely heavily on distance computations. While we have focused on Manhattan distance here, there are many other distance measures (e.g., Euclidean, cosine) that may be more suitable depending on the problem at hand, so it’s beneficial to understand the differences and know how to implement each.
