Pandas DataFrame Sort by Column
Sorting a Pandas DataFrame by column enables you to explore patterns, identify trends, and gain insights into your dataset.
In this article, we will learn how to sort a DataFrame by the values of specific columns in ascending or descending order.
Table of Contents
- Sort using sort_values() Method
- Sort by Multiple Columns
- Sort in Descending Order
- Sort with Missing Values
- Conclusion
1. Sort using sort_values() Method
The Pandas DataFrame sort_values() method sorts a DataFrame by the values of one or more columns.
It takes a column name as its argument and, by default, returns a new DataFrame sorted by the values in that column, leaving the original unchanged.
Syntax: sort_values() method
df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Here,
- by is the name of the column (or a list of column names) to sort the DataFrame by.
- axis is 0 to reorder rows (the default) and 1 to reorder columns.
- ascending is a boolean value (or a list of booleans, one per column in by). True sorts in ascending order, False in descending order.
- inplace is a boolean value. If True, the original DataFrame is modified and the method returns None.
- kind is the sorting algorithm to use. It can be 'quicksort', 'mergesort', 'heapsort', or 'stable'.
- na_position is the position of missing values. It can be 'first' or 'last'.
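The effect of the inplace and kind parameters is easier to see in code. The following is a minimal sketch, not part of the original examples; the sample data and variable names are chosen only for illustration.

```python
import pandas as pd

# Illustrative data: Bob and David share the same age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [45, 40, 35, 40],
})

# kind='stable' keeps rows with equal keys in their original order,
# so Bob (index 1) stays ahead of David (index 3)
stable_sorted = df.sort_values(by='Age', kind='stable')
print(stable_sorted)

# inplace=True modifies df itself and returns None
result = df.sort_values(by='Age', kind='stable', inplace=True)
print(result)
print(df)
```

Because inplace=True returns None, assigning its result to a variable (as done here only for demonstration) is usually a mistake in real code.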
Here is an example of sorting a DataFrame by a column.
Example 1:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [45, 40, 35, 42],
'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)
print("Before sorting:")
print(df)
# Sort DataFrame by 'Age' in ascending order
df_sorted_age = df.sort_values(by='Age')
print("\nAfter sorting by 'Age':")
print(df_sorted_age)
Output:
Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   42     70

After sorting by 'Age':
      Name  Age  Score
2  Charlie   35     92
1      Bob   40     80
3    David   42     70
0    Alice   45     95
2. Sort by Multiple Columns
Sorting a DataFrame by multiple columns is useful when the first column contains ties.
For example, if we sort the DataFrame by 'Age' and two or more people have the same age, we can break the tie by 'Score'. With the default ascending order, the person with the lower score comes first.
To sort by multiple columns, pass a list of column names to the sort_values() method, in order of priority.
Example 2:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [45, 40, 35, 40], # 'David' and 'Bob' have the same age
'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)
print("Before sorting:")
print(df)
# Sort DataFrame by 'Age' and 'Score' in ascending order
df_sorted_age_score = df.sort_values(by=['Age', 'Score'])
print("\nAfter sorting by 'Age' and 'Score':")
print(df_sorted_age_score)
Output:
Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   40     70

After sorting by 'Age' and 'Score':
      Name  Age  Score
2  Charlie   35     92
3    David   40     70
1      Bob   40     80
0    Alice   45     95
As you can see, the DataFrame is first sorted by 'Age' and then by 'Score': Bob and David are both 40, so David (score 70) is placed before Bob (score 80).
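If ties should instead be broken with the highest score first, the ascending argument also accepts a list with one boolean per column. The sketch below reuses the data from Example 2; the variable name df_mixed is only illustrative.

```python
import pandas as pd

# Same sample data as Example 2
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 40],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# 'Age' ascending, then 'Score' descending within each age group
df_mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print(df_mixed)
```

Here Bob (score 80) is placed before David (score 70), because the second entry in the ascending list applies to 'Score'.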
3. Sort in Descending Order
By default, the sort_values() method sorts the DataFrame in ascending order.
To sort the DataFrame in descending order, set the ascending argument to False in the sort_values() method.
Example 3:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [45, 40, 35, 42],
'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)
print("Before sorting:")
print(df)
# Sort DataFrame by 'Age' in descending order
df_sorted_age = df.sort_values(by='Age', ascending=False)
print("\nAfter sorting by 'Age':")
print(df_sorted_age)
Output:
Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   42     70

After sorting by 'Age':
      Name  Age  Score
0    Alice   45     95
3    David   42     70
1      Bob   40     80
2  Charlie   35     92
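Note that the index labels travel with their rows, so the sorted output is no longer numbered 0 to 3. If a fresh index is preferred, one option is the ignore_index parameter, which is not listed in the syntax above and is assumed here to be available (it was added in pandas 1.0). A minimal sketch:

```python
import pandas as pd

# Same sample data as Example 3
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 42],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# Sort by 'Age' in descending order and relabel the rows 0..n-1
df_desc = df.sort_values(by='Age', ascending=False, ignore_index=True)
print(df_desc)
```

On older pandas versions, chaining reset_index(drop=True) after sort_values() gives the same relabelled result.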
4. Sort with Missing Values
A DataFrame can contain missing (NaN) values. By default, the sort_values() method places rows with missing values at the end of the sorted DataFrame.
To change the position of missing values, set the na_position argument to 'first' or 'last' as required.
Example 4:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [45, 40, 35, None], # 'David' has a missing value
'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)
print("Before sorting:")
print(df)
# Sort DataFrame by 'Age' in ascending order, with missing values first
df_sorted_age = df.sort_values(by='Age', na_position='first')
print("\nAfter sorting by 'Age', missing values first:")
print(df_sorted_age)
Output:
Before sorting:
      Name   Age  Score
0    Alice  45.0     95
1      Bob  40.0     80
2  Charlie  35.0     92
3    David   NaN     70

After sorting by 'Age', missing values first:
      Name   Age  Score
3    David   NaN     70
2  Charlie  35.0     92
1      Bob  40.0     80
0    Alice  45.0     95
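The position of missing values is controlled by na_position alone, not by the sort direction. The following sketch, using illustrative variable names, shows that NaN rows stay last in a descending sort unless na_position='first' is passed.

```python
import pandas as pd

# Sample data with one missing age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [45, 40, 35, None],
})

# Descending sort: NaN still goes last by default
desc_default = df.sort_values(by='Age', ascending=False)
print(desc_default)

# Descending sort with missing values moved to the top
desc_na_first = df.sort_values(by='Age', ascending=False, na_position='first')
print(desc_na_first)
```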
Conclusion
Knowing how to sort a DataFrame by column helps you explore your dataset from different perspectives.
You can now sort DataFrames by a column in ascending or descending order, sort by multiple columns, and control where missing values are placed.
Happy Learning!