How to Create Frequency Tables in Python

Frequency tables are a basic tool in data analysis that allow you to view the number of occurrences of each value or category in a dataset. They are particularly useful when dealing with categorical data. Python, with its rich ecosystem of data analysis libraries, makes it easy to create and work with frequency tables.

In this article, we’ll explore how to create frequency tables in Python using different libraries: Pandas, NumPy, and SciPy.

Setting Up

To get started, you will need Python installed on your system. A recent version of Python 3 is recommended, because current releases of pandas, NumPy, and SciPy no longer support older interpreters. If you don’t have Python installed, the Anaconda distribution is an easy way to get started because it includes Python, pandas, NumPy, and SciPy. If you already have Python, you can install these libraries using pip:

pip install pandas numpy scipy
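Because some functions have changed or been removed between library releases, it can help to check which versions pip actually installed. The sketch below is one way to do that, assuming Python 3.8 or newer (where importlib.metadata is part of the standard library); the helper name installed_versions is hypothetical, chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "numpy", "scipy")):
    """Hypothetical helper: map each package name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Reading the installed package metadata avoids importing the libraries themselves, so the check still works if one of them is missing.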

Creating Frequency Tables with Pandas

Pandas is a powerful data manipulation library that is built on top of NumPy. A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Let’s create a simple DataFrame and then a frequency table from it.

First, import the Pandas library:

import pandas as pd

Next, create a DataFrame:

# Create a DataFrame
data = {'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John'],
        'Age': [23, 25, 23, 25, 23, 24, 23, 24, 25]}
df = pd.DataFrame(data)

Now, to create a frequency table for the ‘Name’ column, you can use the value_counts() method:

# Create frequency table
freq_table = df['Name'].value_counts()
print(freq_table)

This prints the number of times each name appears in the ‘Name’ column, sorted from most to least frequent: John appears 5 times and Anna 4 times.
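Counts are often easier to interpret alongside proportions. The sketch below uses value_counts(normalize=True), which divides each count by the number of non-missing values, and combines both into one table; the column names count and percent are arbitrary choices for this example.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John'],
        'Age': [23, 25, 23, 25, 23, 24, 23, 24, 25]}
df = pd.DataFrame(data)

# Absolute counts, sorted from most to least frequent
counts = df['Name'].value_counts()

# Proportions instead of counts (each count divided by the number of non-missing values)
proportions = df['Name'].value_counts(normalize=True)

# Combine both into one table, expressing proportions as percentages rounded to one decimal
freq_table = pd.DataFrame({'count': counts, 'percent': (proportions * 100).round(1)})
print(freq_table)
```

Here John accounts for 5 of the 9 rows (about 55.6%) and Anna for 4 (about 44.4%).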

Creating Frequency Tables with NumPy

NumPy is a fundamental package for scientific computing with Python. If you just want a frequency table for a plain list of values, you can use NumPy’s unique() function with its return_counts=True option.

First, import the NumPy library:

import numpy as np

Now, suppose you have the following list of values:

values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']

You can create a frequency table as follows:

# Create frequency table: sorted unique values and how often each occurs
unique_values, counts = np.unique(values, return_counts=True)
freq_table = np.asarray((unique_values, counts)).T
print(freq_table)

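Here, with return_counts=True, the unique() function returns a tuple of two arrays: the sorted unique values and the number of times each one occurs. np.asarray() stacks them into a two-row array, and .T transposes it so that each row holds a value and its count. Because a NumPy array has a single data type, the counts are stored as strings alongside the animal names in this table.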
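If you need the counts as numbers rather than strings, one option is to pair the two arrays returned by unique() in a dictionary instead of stacking them into a single array. This is a minimal sketch of that alternative, not the only way to do it.

```python
import numpy as np

values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']

# Sorted unique values and the number of times each one occurs
unique_values, counts = np.unique(values, return_counts=True)

# .tolist() converts to plain Python str/int, so the counts stay integers
freq_table = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq_table)  # {'cat': 3, 'dog': 2, 'rabbit': 3}
```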

Creating Frequency Tables with SciPy

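SciPy is a Python library used for scientific and technical computing. Older versions of SciPy included an itemfreq() function in the scipy.stats module that created a frequency table, but it was deprecated in SciPy 1.0 and has since been removed. The example below therefore only runs on old SciPy releases; SciPy’s deprecation notice recommends np.unique(..., return_counts=True) instead, which is the NumPy approach shown above.

First, import the necessary library:

from scipy import stats

Suppose you have the following list of values:

values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']

On an old SciPy release, you could create a frequency table as follows:

# Create frequency table (old SciPy releases only; itemfreq() no longer exists)
freq_table = stats.itemfreq(values)
print(freq_table)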
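Since itemfreq() is not available in current SciPy versions, code that relied on it needs a replacement. The sketch below is a hypothetical helper, itemfreq_replacement, built on np.unique(..., return_counts=True); it aims to reproduce the two-column [value, count] layout, sorted by value, rather than every detail of the original function (such as its exact output dtype).

```python
import numpy as np


def itemfreq_replacement(a):
    """Hypothetical stand-in for the removed scipy.stats.itemfreq.

    Returns a 2-column array whose rows are [value, count], sorted by value.
    """
    unique_values, counts = np.unique(a, return_counts=True)
    # Place values in the first column and counts in the second
    return np.column_stack((unique_values, counts))


scores = [3, 1, 2, 3, 3, 2]
print(itemfreq_replacement(scores))
# [[1 1]
#  [2 2]
#  [3 3]]
```

For numeric input like this, the result stays an integer array; for strings, the same caveat as in the NumPy section applies, and the counts are stored as strings.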

Frequency Tables for Binned Data

In some cases, you may want to create a frequency table for binned data. For example, if you have a large range of ages, you might want to divide them into bins and then create a frequency table.

Pandas makes this easy with the cut() function and the value_counts() method.

Here is an example:

# Create a DataFrame
data = {'Age': [20, 21, 23, 24, 25, 27, 29, 31, 32, 34, 35, 37, 38, 40]}
df = pd.DataFrame(data)

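# Define bin edges
bins = [20, 25, 30, 35, 40]

# Create binned data; include_lowest=True makes the first bin include its left edge (20)
df['binned'] = pd.cut(df['Age'], bins, include_lowest=True)

# Create frequency table; sort=False lists the bins in order instead of by count
freq_table = df['binned'].value_counts(sort=False)
print(freq_table)

In this code, the cut() function assigns each value in the ‘Age’ column to one of the intervals defined by bins. By default each interval is closed on the right, as in (20, 25], so without include_lowest=True the age 20 would fall outside every bin and be counted as missing. Then, value_counts() counts how many ages fall into each interval, and sort=False keeps the intervals in their natural order.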
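Interval labels such as (25.0, 30.0] can be hard to read in a report. pd.cut() also accepts a labels argument with one label per bin, so there must be exactly len(bins) - 1 of them. The label strings below are illustrative choices that assume whole-number ages.

```python
import pandas as pd

ages = pd.Series([20, 21, 23, 24, 25, 27, 29, 31, 32, 34, 35, 37, 38, 40], name='Age')

bins = [20, 25, 30, 35, 40]
# Illustrative labels, one per bin; they assume whole-number ages
labels = ['20-25', '26-30', '31-35', '36-40']

# include_lowest=True keeps age 20 in the first bin
age_groups = pd.cut(ages, bins=bins, labels=labels, include_lowest=True)

# sort=False keeps the groups in bin order rather than sorting by count
freq_table = age_groups.value_counts(sort=False)
print(freq_table)
```

With these ages, the groups contain 5, 2, 4, and 3 people respectively, which accounts for all 14 values.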

In conclusion, frequency tables are a fundamental tool in data analysis that let you quickly understand the distribution of your data. With Python and its data analysis libraries, creating and working with frequency tables takes only a few lines of code.
