
Frequency tables are a basic tool in data analysis that allow you to view the number of occurrences of each value or category in a dataset. They are particularly useful when dealing with categorical data. Python, with its rich ecosystem of data analysis libraries, makes it easy to create and work with frequency tables.
In this article, we’ll explore how to create frequency tables in Python using different libraries: Pandas, NumPy, and SciPy.
Setting Up
To get started, you will need Python installed on your system. A recent Python 3 release is recommended, since current versions of these libraries no longer support older interpreters such as Python 3.6. If you don’t have Python installed, the Anaconda distribution is an easy way to get started because it includes Python, pandas, NumPy, and SciPy. If you already have Python, you can install these libraries using pip:
pip install pandas numpy scipy
Creating Frequency Tables with Pandas
Pandas is a powerful data manipulation library that is built on top of NumPy. A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
Let’s create a simple DataFrame and then a frequency table from it.
First, import the Pandas library:
import pandas as pd
Next, create a DataFrame:
# Create a DataFrame
data = {'Name': ['John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John'],
'Age': [23, 25, 23, 25, 23, 24, 23, 24, 25]}
df = pd.DataFrame(data)
Now, to create a frequency table for the ‘Name’ column, you can use the value_counts() method:
# Create frequency table
freq_table = df['Name'].value_counts()
print(freq_table)
The output lists each name in the ‘Name’ column with its count, sorted from most to least frequent: John appears 5 times and Anna 4 times.
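Counts are often easier to compare as proportions. The following is a minimal sketch, reusing the names from the example above, that places counts and relative frequencies side by side using the normalize parameter of value_counts(). The column labels 'count' and 'proportion' are chosen here for illustration.

```python
import pandas as pd

# Same names as in the example above
names = ['John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John']
df = pd.DataFrame({'Name': names})

# Absolute counts, sorted from most to least frequent
counts = df['Name'].value_counts()

# Relative frequencies: each count divided by the number of rows
shares = df['Name'].value_counts(normalize=True)

# Combine both into a single table indexed by name
summary = pd.DataFrame({'count': counts, 'proportion': shares})
print(summary)
```

The proportions sum to 1, which makes tables from datasets of different sizes directly comparable.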
Creating Frequency Tables with NumPy
NumPy is a fundamental package for scientific computing with Python. If you just want to create a frequency table of a list of values, you can use the unique() function in NumPy.
First, import the NumPy library:
import numpy as np
Now, suppose you have the following list of values:
values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']
You can create a frequency table as follows:
# Create frequency table (separate names avoid overwriting the original list)
unique_values, counts = np.unique(values, return_counts=True)
freq_table = np.asarray((unique_values, counts)).T
print(freq_table)
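Here, unique() with return_counts=True returns a tuple containing the sorted unique values and their counts. We then stack the two arrays and transpose the result into a two-column array with one row per value. Because a NumPy array has a single dtype, the counts are stored as strings in this combined array.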
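If the counts need to stay numeric, one option is to pair the two arrays in a dictionary instead of stacking them. This is a sketch rather than the only approach; the name freq is chosen here for illustration.

```python
import numpy as np

values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']

# np.unique returns the sorted unique values and, with return_counts=True, their counts
unique_values, counts = np.unique(values, return_counts=True)

# tolist() converts NumPy scalars to plain Python str and int values
freq = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq)  # {'cat': 3, 'dog': 2, 'rabbit': 3}
```

A dictionary also allows direct lookups such as freq['cat'].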
Creating Frequency Tables with SciPy
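SciPy is a Python library used for scientific and technical computing. Older tutorials use the itemfreq() function from the scipy.stats module, but it was deprecated in SciPy 1.0 and removed in SciPy 1.3, with np.unique(..., return_counts=True) as the recommended replacement. In SciPy 1.7 and later, the crosstab() function in the scipy.stats.contingency module can be used to create a frequency table instead.
First, import the function:
from scipy.stats.contingency import crosstab
Suppose you have the following list of values:
values = ['cat', 'dog', 'cat', 'cat', 'dog', 'rabbit', 'rabbit', 'rabbit']
You can create a frequency table as follows:
# Create frequency table: crosstab returns the unique elements and their counts
(unique_values,), counts = crosstab(values)
freq_table = dict(zip(unique_values.tolist(), counts.tolist()))
print(freq_table)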
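SciPy 1.7 and later also provide scipy.stats.contingency.crosstab, which counts combinations of values across several sequences at once. The sketch below assumes such a SciPy version and reuses the names and ages from the pandas example to build a two-way frequency table.

```python
from scipy.stats.contingency import crosstab

# Names and ages from the pandas example earlier in the article
names = ['John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John', 'Anna', 'John']
ages = [23, 25, 23, 25, 23, 24, 23, 24, 25]

# crosstab returns the sorted unique elements of each input and a count array
(name_levels, age_levels), table = crosstab(names, ages)

# Rows follow name_levels, columns follow age_levels
print(name_levels)
print(age_levels)
print(table)
```

Each cell of the table holds the number of records with that combination of name and age.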
Frequency Tables for Binned Data
In some cases, you may want to create a frequency table for binned data. For example, if you have a large range of ages, you might want to divide them into bins and then create a frequency table.
Pandas makes this easy with the cut() and value_counts() functions.
Here is an example:
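# Create a DataFrame
data = {'Age': [20, 21, 23, 24, 25, 27, 29, 31, 32, 34, 35, 37, 38, 40]}
df = pd.DataFrame(data)
# Define bin edges
bins = [20, 25, 30, 35, 40]
# Assign each age to a bin; include_lowest=True keeps the value 20 in the first bin
df['binned'] = pd.cut(df['Age'], bins, include_lowest=True)
# Create frequency table, ordered by bin rather than by count
freq_table = df['binned'].value_counts().sort_index()
print(freq_table)
In this code, cut() assigns each value in the ‘Age’ column to a bin. By default the bins are closed on the right, such as (20, 25], so without include_lowest=True the age 20 would fall outside every bin and be dropped from the table. value_counts() then counts the ages in each bin, and sort_index() lists the bins in order instead of by descending count.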
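For plain numeric data without a DataFrame, NumPy's histogram() function gives a similar binned count. The sketch below uses the same ages and bin edges. Note that np.histogram treats each bin as closed on the left and open on the right, except the last bin, which includes both edges, so values that sit exactly on a bin edge can land in a different bin than with the default behavior of pd.cut.

```python
import numpy as np

ages = [20, 21, 23, 24, 25, 27, 29, 31, 32, 34, 35, 37, 38, 40]
bins = [20, 25, 30, 35, 40]

# counts[i] is the number of ages in [edges[i], edges[i + 1]),
# except the final bin, which also includes its right edge
counts, edges = np.histogram(ages, bins=bins)

# Print one line per bin with its edges and count
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")
```

Here the age 25 is counted in the 25-30 bin and the age 35 in the 35-40 bin, which is worth keeping in mind when comparing results between the two libraries.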
In conclusion, frequency tables are a fundamental tool in data analysis that allow you to quickly understand the distribution of your data. With Python and its rich ecosystem of data analysis libraries, creating and working with frequency tables is simple and straightforward.