How to Sort a Pandas DataFrame


Pandas is a versatile and powerful Python data analysis library. Sorting is one of the operations data analysts rely on most often to organize their data. In this guide, we will look at how to sort a DataFrame by one or more columns and by its index using the Pandas library in Python.

Creating a Pandas DataFrame

For the purposes of this tutorial, we will create a simple DataFrame. A DataFrame is a two-dimensional data structure, similar to a table in SQL, Excel, or Google Sheets. It is composed of rows and columns. Here is how to create a DataFrame:

import pandas as pd

data = {
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5]
}

df = pd.DataFrame(data)

print(df)

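In this code, data is a dictionary where each key-value pair corresponds to a column in the DataFrame: the key becomes the column name and the list becomes the column's values. The pd.DataFrame constructor then converts this dictionary into a DataFrame, assigning each row a default integer index label starting at 0.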
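As a quick check of the structure described above, the following sketch builds the same DataFrame and inspects its shape, column names, and row labels. The variable names n_rows, n_cols, column_names, and row_labels are chosen here for illustration only.

```python
import pandas as pd

data = {
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Column names come from the dictionary keys, in insertion order.
column_names = df.columns.tolist()

# With no index given, rows get default integer labels starting at 0.
row_labels = df.index.tolist()

print(n_rows, n_cols)
print(column_names)
print(row_labels)
```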

Sort by Column in Ascending Order

To sort a DataFrame by a single column in ascending order (from smallest to largest), we call the sort_values method on the DataFrame and pass the name of the column to sort by. The syntax is as follows:

df.sort_values('column_name')

Here is an example that sorts the above DataFrame by the ‘Age’ column:

sorted_df = df.sort_values('Age')

print(sorted_df)

The output DataFrame, sorted_df, is sorted by ‘Age’ in ascending order. Notice that the original DataFrame df is not changed. The sort_values method returns a new DataFrame and does not modify the original unless you pass inplace=True, as covered in the Sorting Inplace section.
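To make the "original is unchanged" point concrete, here is a minimal sketch using the tutorial's sample data. The names sorted_ages, original_ages, and sorted_labels are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
})

sorted_df = df.sort_values('Age')

# The sorted copy lists ages from smallest to largest.
sorted_ages = sorted_df['Age'].tolist()      # [18, 19, 20, 21]

# The original DataFrame still has its rows in insertion order.
original_ages = df['Age'].tolist()           # [20, 21, 19, 18]

# Each row keeps its original index label, so the labels move with the rows.
sorted_labels = sorted_df.index.tolist()     # [3, 2, 0, 1]

print(sorted_ages, original_ages, sorted_labels)
```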

Sort by Column in Descending Order

To sort a DataFrame by a single column in descending order (from largest to smallest), you need to pass an additional argument, ascending=False, to the sort_values function:

df.sort_values('column_name', ascending=False)

Here is an example that sorts the DataFrame by the ‘Age’ column in descending order:

sorted_df = df.sort_values('Age', ascending=False)

print(sorted_df)
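The same argument works for any sortable column. As a small sketch, the snippet below sorts the sample data by ‘Score’ from highest to lowest and reads off the resulting order of names; the variable name ranking is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
})

# ascending=False puts the largest Score first.
by_score_desc = df.sort_values('Score', ascending=False)

# Names in order of decreasing score.
ranking = by_score_desc['Name'].tolist()     # ['Peter', 'Nick', 'John', 'Tom']

print(ranking)
```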

Sort by Multiple Columns

Pandas also allows you to sort your DataFrame by multiple columns. This is particularly useful when you have duplicate values in the column by which you’re sorting and you want to use another column as a tiebreaker.

To do this, you pass a list of column names to the sort_values function:

df.sort_values(['column_name1', 'column_name2'])

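By default, all columns will be sorted in ascending order. Here is an example that sorts the DataFrame first by ‘Score’ and then by ‘Age’. Note that in our sample data no two rows share the same ‘Score’, so the ‘Age’ column never has to break a tie and the result is the same as sorting by ‘Score’ alone: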

sorted_df = df.sort_values(['Score', 'Age'])

print(sorted_df)

If you want to sort each column in a different direction, you can pass a list of booleans to the ascending argument, where True means ascending order and False means descending order. The list must contain one boolean per column name, in the same order:

df.sort_values(['column_name1', 'column_name2'], ascending=[True, False])

For example, to sort by ‘Score’ in ascending order and then by ‘Age’ in descending order, you would do:

sorted_df = df.sort_values(['Score', 'Age'], ascending=[True, False])

print(sorted_df)
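Because the sample data has no duplicate scores, the tiebreaker is easier to see on data that does. The sketch below uses a small hypothetical DataFrame (the names, ages, and scores are invented for illustration) in which two pairs of rows share a ‘Score’.

```python
import pandas as pd

# Hypothetical data: Ann and Cara tie on Score, as do Bob and Dan.
tied = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cara', 'Dan'],
    'Age': [22, 19, 25, 21],
    'Score': [90.0, 85.0, 90.0, 85.0],
})

# Both columns ascending: within each Score, younger rows come first.
both_asc = tied.sort_values(['Score', 'Age'])
names_both_asc = both_asc['Name'].tolist()   # ['Bob', 'Dan', 'Ann', 'Cara']

# Score ascending, Age descending: within each Score, older rows come first.
mixed = tied.sort_values(['Score', 'Age'], ascending=[True, False])
names_mixed = mixed['Name'].tolist()         # ['Dan', 'Bob', 'Cara', 'Ann']

print(names_both_asc)
print(names_mixed)
```

In both results the rows are grouped by ‘Score’ first; only the order inside each group changes.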

Sorting Inplace

All the examples so far have created a new sorted DataFrame and have left the original DataFrame unchanged. If you want to sort the DataFrame in place, meaning the original DataFrame is changed, you can do so by setting the inplace argument to True:

df.sort_values('column_name', inplace=True)

Here is an example that sorts the DataFrame by ‘Age’ in ascending order and modifies the original DataFrame:

df.sort_values('Age', inplace=True)

print(df)
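One detail worth knowing is that a call with inplace=True returns None rather than the sorted DataFrame. The following sketch illustrates this with the sample data; the variable names result and ages_after are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
})

# With inplace=True the method modifies df and returns None.
result = df.sort_values('Age', inplace=True)

# df itself is now sorted by Age.
ages_after = df['Age'].tolist()              # [18, 19, 20, 21]

print(result)
print(ages_after)
```

For this reason, avoid writing df = df.sort_values('Age', inplace=True), which would replace df with None. Use either the assignment or inplace=True, not both.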

Resetting the Index

When sorting a DataFrame, each row keeps its original index label. As a result, the index labels of the sorted DataFrame are usually no longer in sequential order. If you want a fresh index that runs from 0 upward after sorting, you can use the reset_index method.

By default, reset_index will add a new column to the DataFrame containing the old index. If you do not want this, you can pass the argument drop=True to drop the old index:

df.sort_values('column_name').reset_index(drop=True)

For example, to sort by ‘Age’ and then reset the index, you would do:

sorted_df = df.sort_values('Age').reset_index(drop=True)

print(sorted_df)
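The sketch below compares the index before and after resetting, with and without drop=True. It also shows the ignore_index=True argument of sort_values, which is available in pandas 1.0 and later and is assumed here as an alternative to chaining reset_index(drop=True).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
})

sorted_df = df.sort_values('Age')
labels_before = sorted_df.index.tolist()     # [3, 2, 0, 1]

# Without drop=True, the old labels are kept in a new column named 'index'.
with_old = sorted_df.reset_index()
columns_with_old = with_old.columns.tolist()

# With drop=True, the old labels are discarded.
dropped = sorted_df.reset_index(drop=True)
labels_after = dropped.index.tolist()        # [0, 1, 2, 3]

# Assumes pandas >= 1.0: sort and renumber the index in one call.
direct = df.sort_values('Age', ignore_index=True)
labels_direct = direct.index.tolist()        # [0, 1, 2, 3]

print(labels_before, columns_with_old, labels_after, labels_direct)
```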

Sorting by Index

Sometimes you may want to sort by the index rather than a column, for example to restore the original row order after sorting by a column. You can do this using the sort_index method:

df.sort_index()

By default, sort_index will sort in ascending order. If you want to sort in descending order, you can pass ascending=False:

df.sort_index(ascending=False)

To sort the index in place, you can use inplace=True:

df.sort_index(ascending=False, inplace=True)
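As a small sketch of sort_index in action, the snippet below first sorts the sample data by ‘Age’, which leaves the index labels out of order, and then sorts by index to recover the original row order. The variable names shuffled, restored, and reversed_df are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'John', 'Peter'],
    'Age': [20, 21, 19, 18],
    'Score': [85.5, 90.7, 90.5, 92.5],
})

# Sorting by a column leaves the index labels out of order.
shuffled = df.sort_values('Age')
shuffled_labels = shuffled.index.tolist()    # [3, 2, 0, 1]

# Sorting by index puts the rows back in their original order.
restored = shuffled.sort_index()
restored_names = restored['Name'].tolist()   # ['Tom', 'Nick', 'John', 'Peter']

# ascending=False orders the index labels from largest to smallest.
reversed_df = shuffled.sort_index(ascending=False)
reversed_labels = reversed_df.index.tolist() # [3, 2, 1, 0]

print(shuffled_labels, restored_names, reversed_labels)
```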

Conclusion

In this guide, we have covered how to sort a Pandas DataFrame by a column in ascending or descending order, how to sort by multiple columns, how to sort in place, how to reset the index after sorting, and how to sort by the index. With these tools, you should be able to put your data in whatever order your analysis requires.
