Pandas Sort by Multiple Columns
Sorting a DataFrame by multiple columns becomes important when there is a need to establish a hierarchical order based on different attributes.
For example, let a dataset contain student information, where you want to sort first by their grades and then by their age to get a more nuanced understanding of performance within age groups.
- Sort by Multiple Columns
- Sort by Multiple Columns in Descending Order
- Sort by Multiple Columns with Different Order
- Customise Sorting Priority
- Conclusion
Table of Contents
Sort by Multiple Columns
The sort_values() method in Pandas allows us to sort a DataFrame by one or more columns.
Pass a list of column names to the sort_values() method to sort by multiple columns in the order of priority. For example, df.sort_values(['col1', 'col2'])
will sort the DataFrame by col1 first and then by col2.
Here is a working example of sorting a DataFrame by multiple columns.
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [45, 40, 35, 40, 38],
'Score': [95, 80, 92, 70, 60]}
df = pd.DataFrame(data)
# 👇 Sort DataFrame first by 'Age' and then by 'Score'
df_sorted_multiple = df.sort_values(by=['Age', 'Score'])
print(df_sorted_multiple)
Output:
Name Age Score 2 Charlie 35 92 4 Eve 38 60 3 David 40 70 1 Bob 40 80 0 Alice 45 95
Sort by Multiple Columns in Descending Order
By default the sort_values method sorts the DataFrame in ascending order. To sort in descending order, pass ascending=False as an argument.
Let's sort the DataFrame below in descending order.
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [45, 40, 35, 40, 38],
'Score': [95, 80, 92, 70, 60]}
df = pd.DataFrame(data)
# 👇 Sort DataFrame in descending order by 'Age' and then by 'Score'
df_sorted_multiple = df.sort_values(by=['Age', 'Score'], ascending=False)
print(df_sorted_multiple)
Output:
Name Age Score 0 Alice 45 95 1 Bob 40 80 3 David 40 70 4 Eve 38 60 2 Charlie 35 92
Sort by Multiple Columns with Different Order
It is also possible to sort by multiple columns with different order. For example, you can sort by Age in descending order and then by Score in ascending order.
For this, pass a list of True (ascending) and False (descending) representing the order of sorting for each column.
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [45, 40, 35, 40, 38],
'Score': [95, 80, 92, 70, 60]}
df = pd.DataFrame(data)
# 👇 Sort DataFrame first by 'Age' in descending order
# and then by 'Score' in ascending order
df_sorted_multiple = df.sort_values(by=['Age', 'Score'], ascending=[False, True])
print(df_sorted_multiple)
Output:
Name Age Score 0 Alice 45 95 1 Bob 40 80 3 David 40 70 4 Eve 38 60
Customise Sorting Priority
Custom sorting priority is useful when you want to sort by multiple columns but want to give priority to one column over the other.
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [45, 40, 35, 40, 38],
'Score': [95, 70, 92, 80, 60]}
df = pd.DataFrame(data)
# 👇 custom sorting order
custom_order = {'Alice': 3, 'Bob': 2, 'Charlie': 1, 'David': 2, 'Eve': 4}
df_sorted = df.sort_values(by=['Name', 'Age'], key=lambda x: x.map(custom_order))
print(df_sorted)
Output:
Name Age Score 2 Charlie 35 92 1 Bob 40 70 3 David 40 80 0 Alice 45 95 4 Eve 38 60
You can see that the DataFrame is sorted by given custom order of names and then by age.
Conclusion
Mastering multi-column sorting in Pandas opens up a realm of possibilities for organizing your data.
Whether you need simple ascending or descending sorts or more complex custom orders, Pandas provides the tools you need to efficiently arrange your DataFrame.