How to Calculate the Gini Coefficient in Python


The Gini coefficient is a measure of the inequality of a distribution. It's a number between 0 and 1, where 0 corresponds to perfect equality (everyone has the same income) and 1 corresponds to perfect inequality (one person has all the income). The Gini coefficient is widely used in economics, particularly in the study of income distribution. In this article, we'll walk through how to calculate the Gini coefficient in Python.
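The two endpoints can be checked numerically. The sketch below uses the mean-absolute-difference form of the Gini coefficient, G = sum over all pairs of |x_i - x_j| divided by (2 n^2 mean(x)), which is algebraically equivalent to the sorted-rank formula used later in this article. The function name `gini_mad` is hypothetical. Note that for a finite population of n people, the "one person has all the income" case gives (n - 1)/n rather than exactly 1; the value only approaches 1 as n grows.

```python
import numpy as np

def gini_mad(x):
    """Gini coefficient via the mean absolute difference (builds an n x n matrix, so keep n small)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Sum of |x_i - x_j| over all ordered pairs (i, j)
    total_abs_diff = np.abs(x[:, None] - x[None, :]).sum()
    return total_abs_diff / (2 * n**2 * x.mean())

equal = [40000] * 6                      # everyone has the same income
concentrated = [0, 0, 0, 0, 0, 240000]   # one person has all the income

print(gini_mad(equal))         # 0.0
print(gini_mad(concentrated))  # (n - 1) / n = 5/6, about 0.833
```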

Understanding the Gini Coefficient

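The Gini coefficient is derived from the Lorenz curve. To build it, people are sorted from the lowest to the highest income, and the curve plots the cumulative share of the population (horizontal axis) against the cumulative share of total income that group holds (vertical axis). A perfectly equal distribution produces the 45-degree line of equality, where the poorest p percent of people hold exactly p percent of the income; any inequality makes the Lorenz curve sag below that line.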
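Computing the Lorenz curve takes only a sort, a cumulative sum, and a division by the total. The sketch below (the function name `lorenz_curve` is hypothetical) returns the horizontal coordinates (cumulative population share) and the vertical coordinates (cumulative income share), both starting at the origin, so the points can be passed straight to a plotting library.

```python
import numpy as np

def lorenz_curve(x):
    """Return (population_share, income_share) points of the Lorenz curve, starting at (0, 0)."""
    x = np.sort(np.asarray(x, dtype=float))   # poorest to richest
    n = x.size
    # Cumulative share of the population: 0, 1/n, 2/n, ..., 1
    population_share = np.arange(n + 1) / n
    # Cumulative share of total income held by the poorest k people
    income_share = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return population_share, income_share

p, L = lorenz_curve([50000, 10000, 20000, 50000, 100000, 500000])
print(np.round(L, 3))
```

The line of equality is simply `L = p`, so plotting both on the same axes shows how far the distribution falls short of equality.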

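The Gini coefficient is the area between the line of equality and the Lorenz curve, expressed as a fraction of the total area under the line of equality (which is 1/2). Equivalently, it is twice the area between the two curves. A higher Gini coefficient indicates greater inequality, with 0 representing perfect equality and 1 representing perfect inequality.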
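Written out, let A be the area between the line of equality and the Lorenz curve L(p), and B the area under L(p), so that A + B = 1/2. For n values sorted so that x_(1) is the smallest, the Lorenz curve is piecewise linear through the points (k/n, L_k), with L_0 = 0 and L_k = (x_(1) + ... + x_(k)) / sum(x). The integral can therefore be evaluated exactly with trapezoids, and the result simplifies algebraically to a rank-weighted sum:

```latex
G = \frac{A}{A + B} = 1 - 2\int_{0}^{1} L(p)\,dp

G = 1 - \frac{1}{n}\sum_{k=1}^{n}\left(L_{k-1} + L_{k}\right)
  = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_{(i)}}{n \sum_{i=1}^{n} x_{(i)}}
```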

Calculating the Gini Coefficient in Python

In Python, we can define a function to calculate the Gini coefficient. This function will take a list or array of values as input and return the Gini coefficient. Here’s an example of how you can do it:

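import numpy as np

def gini_coefficient(x):
    """Return the Gini coefficient of a 1-D collection of non-negative values."""
    # Convert to a float array and sort in ascending order (the formula requires sorted data)
    x = np.sort(np.asarray(x, dtype=float))
    if x.size == 0 or np.any(x < 0) or x.sum() == 0:
        raise ValueError("need at least one value, all non-negative, with a positive total")
    # Number of data points
    n = x.shape[0]
    # 1-based rank of each sorted data point
    index = np.arange(1, n + 1)
    # Gini coefficient: sum((2i - n - 1) * x_i) / (n * sum(x))
    return np.sum((2 * index - n - 1) * x) / (n * np.sum(x))

# Example usage:
income_distribution = np.array([50000, 10000, 20000, 50000, 100000, 500000])
print(gini_coefficient(income_distribution))  # about 0.614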

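In this function, x is the list or array of values. We sort the values in ascending order, give each one its rank i (from 1 to n), and compute G = sum((2i - n - 1) * x_i) / (n * sum(x)). The weight 2i - n - 1 is negative for values in the lower half of the ranking and positive for those in the upper half, so the numerator grows as income becomes concentrated among the top earners. This expression is equivalent to measuring the area between the line of equality and the Lorenz curve.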
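To confirm that the rank formula really measures the Lorenz-curve area, the sketch below computes the Gini coefficient both ways on the same data: once with the rank formula and once as 1 minus twice the trapezoidal area under the Lorenz curve. Because the Lorenz curve of a finite sample is piecewise linear, the trapezoidal rule is exact here, and the two results agree up to floating-point rounding. Both function names are hypothetical.

```python
import numpy as np

def gini_rank(x):
    """Rank formula: sum((2i - n - 1) * x_(i)) / (n * sum(x)), with x sorted ascending."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return np.sum((2 * i - n - 1) * x) / (n * x.sum())

def gini_lorenz_area(x):
    """Area form: 1 - 2 * (area under the Lorenz curve), via the trapezoidal rule."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Lorenz curve heights L_0 = 0, L_1, ..., L_n = 1
    L = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Each trapezoid has width 1/n and mean height (L_{k-1} + L_k) / 2
    area_under_lorenz = np.sum((L[:-1] + L[1:]) / 2) / n
    return 1 - 2 * area_under_lorenz

incomes = [50000, 10000, 20000, 50000, 100000, 500000]
print(gini_rank(incomes), gini_lorenz_area(incomes))  # both about 0.614
```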

Real-world Example: Calculating Gini Coefficient for Income Distribution

Suppose we have data on the income distribution in a particular country and we want to calculate the Gini coefficient to measure income inequality. The following example demonstrates this:

import pandas as pd
import numpy as np

# Suppose we have income data for a country
income_data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'George', 'Hannah', 'Ivan', 'John'],
    'Income': [50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000]
})

# Calculate the Gini coefficient
income_distribution = income_data['Income'].to_numpy()
print(gini_coefficient(income_distribution))  # about 0.174

Here, the income_data DataFrame contains the income of ten individuals. We convert the ‘Income’ column to a NumPy array and then feed it to our gini_coefficient function to calculate the Gini coefficient.
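Because the function accepts any one-dimensional array, it can also be applied per group with pandas. The sketch below assumes a hypothetical `Region` column (not part of the original data) and uses `groupby(...).agg(...)` to compute one Gini coefficient per region. The function is repeated in compact form so the block runs on its own.

```python
import numpy as np
import pandas as pd

def gini_coefficient(x):
    """Rank-based Gini coefficient of a 1-D array of non-negative values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * x) / (n * x.sum())

# Same incomes as above; the 'Region' labels are hypothetical, for illustration only
income_data = pd.DataFrame({
    'Region': ['North'] * 5 + ['South'] * 5,
    'Income': [50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000],
})

# One Gini coefficient per region
gini_by_region = income_data.groupby('Region')['Income'].agg(gini_coefficient)
print(gini_by_region)
```

The overall coefficient for all ten incomes (about 0.174) is not the average of the two regional values (about 0.114 and 0.067), because the Gini coefficient also reflects the income gap between groups.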

Conclusion

The Gini coefficient is a widely used, single-number summary of how unequally a quantity is distributed. Although it's most commonly used to measure income inequality, it can be applied to the distribution of any non-negative quantity, such as wealth, consumption, or land holdings.

Knowing how to calculate the Gini coefficient in Python lets you explore inequality in your own data. However, as with any summary statistic, it's important to understand what the Gini coefficient can and can't tell you. Because it compresses an entire distribution into one number, it doesn't show where in the distribution the inequality occurs, and it is unaffected by the absolute level of incomes. Interpret it within the context of your specific dataset and research questions.
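One concrete limitation is that very different distributions can share the same Gini coefficient. In the hypothetical four-person examples below, one society has a single person with no income while everyone else is equal, and the other has a single person earning three times as much as everyone else. Both have a Gini coefficient of 0.25, even though their Lorenz curves cross. Doubling every income also leaves the coefficient unchanged.

```python
import numpy as np

def gini_coefficient(x):
    """Rank-based Gini coefficient of a 1-D array of non-negative values."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    index = np.arange(1, n + 1)
    return np.sum((2 * index - n - 1) * x) / (n * x.sum())

one_poor = [0, 1, 1, 1]   # one person with nothing, the rest equal
one_rich = [1, 1, 1, 3]   # one person with 3x the income of the others

print(gini_coefficient(one_poor))                # 0.25
print(gini_coefficient(one_rich))                # 0.25
print(gini_coefficient(np.array(one_rich) * 2))  # 0.25, unchanged by scaling
```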
