Pandas DataFrame rank() method with examples

The rank() method in pandas computes numerical data ranks (1 through n) along the given axis. By default, equal values are assigned the average of the ranks they would otherwise occupy.

Syntax –

DataFrame.rank(axis=0, method='average', numeric_only=None, na_option='keep', ascending=True, pct=False)

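axis – The axis along which to rank. With 0 or 'index' (the default), values are ranked within each column; with 1 or 'columns', values are ranked within each row.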

method – How to rank a group of equal (tied) values:

  • average: average rank of the group
  • min: lowest rank in the group
  • max: highest rank in the group
  • first: ranks assigned in order they appear in the array
  • dense: like ‘min’, but rank always increases by 1 between groups.

numeric_only – For DataFrame objects, rank only numeric columns if set to True.

na_option – How to rank NaN values:

  • keep: assign NaN rank to NaN values
  • top: assign lowest rank to NaN values
  • bottom: assign highest rank to NaN values

ascending – Whether or not the elements should be ranked in ascending order. By default it is True.

pct – Whether or not to display the returned rankings in percentile form.
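The axis and numeric_only parameters only matter when rank() is called on a whole DataFrame rather than on a single column. The sketch below illustrates both; the scores table and its column names are made up for this illustration and are not part of the examples that follow.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
scores = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'math': [90, 70, 80],
    'art': [60, 85, 75],
})

# axis=0 (default): rank the values within each column.
# numeric_only=True leaves the non-numeric 'name' column out of the result.
by_column = scores.rank(numeric_only=True)

# axis=1: rank the values within each row, across the selected columns.
by_row = scores[['math', 'art']].rank(axis=1)

print(by_column)  # math: 3.0, 1.0, 2.0   art: 1.0, 3.0, 2.0
print(by_row)     # math: 2.0, 1.0, 2.0   art: 1.0, 2.0, 1.0
```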

Examples –

Let’s create a DataFrame to work with.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

1 . Rank data in ascending order –

By default pandas ranks the data in ascending order, i.e. the lowest value gets rank 1.

df['Rank_asc'] = df['Number_legs'].rank()

Since 2 (penguin) is the smallest value in this column, it gets a rank of 1. The cat and the dog both have 4 legs, so they share positions 2 and 3 and each gets a rank of 2.5, because by default pandas averages the ranks of tied values ((2 + 3) / 2 = 2.5). The spider has 8 legs, so it gets a rank of 4. The snake's NaN value keeps a rank of NaN.
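The ranks described above can be checked with a short, self-contained script. It rebuilds the same DataFrame so it runs on its own; the values in the comment follow from the default method='average' and na_option='keep'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

# Default ranking: ascending, ties averaged, NaN left as NaN.
df['Rank_asc'] = df['Number_legs'].rank()

print(df[['Animal', 'Number_legs', 'Rank_asc']])
# Rank_asc: cat 2.5, penguin 1.0, dog 2.5, spider 4.0, snake NaN
```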

2 . Rank data in descending order –

To rank the data in descending order, set ascending=False. The highest value then gets rank 1.

df['Rank_desc'] = df['Number_legs'].rank(ascending=False)

Since 8 (spider) is the highest value, it gets a rank of 1. The cat and the dog each get a rank of 2.5, and the penguin gets a rank of 4.
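As a quick check of the descending case, the following sketch rebuilds the same DataFrame and prints the ranks next to the original values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

# ascending=False: the largest value receives rank 1.
df['Rank_desc'] = df['Number_legs'].rank(ascending=False)

print(df[['Animal', 'Number_legs', 'Rank_desc']])
# Rank_desc: cat 2.5, penguin 4.0, dog 2.5, spider 1.0, snake NaN
```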

3 . Ranking duplicate data –

As mentioned above, pandas by default takes the average of the ranks when there are duplicate values. Other methods are also available.

df['average_rank'] = df['Number_legs'].rank()
df['max_rank'] = df['Number_legs'].rank(method='max')
df['min_rank'] = df['Number_legs'].rank(method='min')
df['first_rank'] = df['Number_legs'].rank(method='first')
df['dense_rank'] = df['Number_legs'].rank(method='dense')

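With method='max', records that share the same value all receive the highest rank in their group (e.g. since ‘cat’ and ‘dog’ occupy the 2nd and 3rd positions, both are assigned rank 3). With method='min' both get rank 2. With method='first' the cat gets 2 and the dog gets 3, following their order in the column. With method='dense' both get 2 and the spider gets 3 instead of 4, because no ranks are skipped after a tie.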
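To compare the tie-breaking methods side by side, one option is to build a small table with one column per method. This is only a sketch; the ranks variable name is an arbitrary choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

methods = ['average', 'min', 'max', 'first', 'dense']

# One column of ranks per method, indexed by animal for readability.
ranks = pd.DataFrame({m: df['Number_legs'].rank(method=m) for m in methods})
ranks.index = df['Animal']

print(ranks)
# cat:     average 2.5, min 2, max 3, first 2, dense 2
# penguin: 1 under every method
# dog:     average 2.5, min 2, max 3, first 3, dense 2
# spider:  4 under every method except dense, where it is 3
# snake:   NaN under every method
```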

4 . Ranking NaN values –

df['na_keep'] = df['Number_legs'].rank(na_option='keep')
df['na_top'] = df['Number_legs'].rank(na_option='top')
df['na_bottom'] = df['Number_legs'].rank(na_option='bottom')

When na_option is 'keep', which is the default, null values get a rank of NaN. With 'top', NaN values are ranked first (rank 1 here), ahead of all other values, so every other rank shifts up by one. With 'bottom', NaN values are ranked last (rank 5 here).
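The effect of each na_option setting on the sample data can be seen by printing the three columns together. The sketch below rebuilds the DataFrame so it is runnable on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

# Rank the same column under each NaN-handling option.
for option in ['keep', 'top', 'bottom']:
    df['na_' + option] = df['Number_legs'].rank(na_option=option)

print(df)
# na_keep:   2.5, 1.0, 2.5, 4.0, NaN
# na_top:    3.5, 2.0, 3.5, 5.0, 1.0  (snake's NaN is ranked first)
# na_bottom: 2.5, 1.0, 2.5, 4.0, 5.0  (snake's NaN is ranked last)
```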

5 . Rank data in percentile form –

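To express the ranks as percentiles, set pct=True. With the default na_option='keep', each rank is divided by the number of non-missing values, so the results lie between 0 and 1.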

df['default_rank'] = df['Number_legs'].rank()
df['pct_rank'] = df['Number_legs'].rank(pct=True)
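The following sketch prints both columns and also recomputes the percentile ranks by hand, dividing the default ranks by the count of non-NaN values (4 in this data). The manual_pct column is added only for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
                   'Number_legs': [4, 2, 4, 8, np.nan]})

df['default_rank'] = df['Number_legs'].rank()
df['pct_rank'] = df['Number_legs'].rank(pct=True)

# Manual equivalent for the default settings: rank / number of non-NaN values.
df['manual_pct'] = df['default_rank'] / df['Number_legs'].count()

print(df)
# pct_rank: cat 0.625, penguin 0.25, dog 0.625, spider 1.0, snake NaN
```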
