Pandas Select Rows by Condition


Pandas is most commonly used Python library for data analysis. It gives us options to select rows based on conditions, which is a very common operation in data analysis.

In this article, we will look at various ways to select rows based on conditions in a Pandas DataFrame.

Our Test Data

Let's create a DataFrame to work with.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)
print(df)

Output:

      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF
3    David   40   NY
4     Emma   45   LA

Select Rows based on Condition

Let's say we want to select all the rows where the Age is greater than 30.

We can do this by passing the condition inside the square brackets [] as shown below.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# Selecting rows based on condition
df = df[df['Age'] > 30]

print(df)

Output:

      Name  Age City
2  Charlie   35   SF
3    David   40   NY
4     Emma   45   LA

Here we are passing the condition df['Age'] > 30 inside the square brackets [] which will return a boolean series with True for all the rows where the Age is greater than 30 and False for all the rows where the Age is less than or equal to 30.

Now, when we pass this boolean series inside the square brackets [] of the DataFrame, it will return all the rows where the boolean series has True value.

So, the above code will return all the rows where the Age is greater than 30.


Select Rows based on Multiple Conditions

You can apply multiple conditions using logical operators like & (and), | (or), ~ (not) etc.

Let's say we want to select all the rows where the Age is greater than 30 and the City is NY.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# Selecting rows based on multiple conditions
df = df[(df['Age'] > 30) & (df['City'] == 'NY')]

print(df)

Output:

    Name  Age City
3  David   40   NY

Negate a Condition

Let's say we want to select all the rows where the Age is not greater than 30.

For this, we can use the ~ (not) operator.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# Selecting rows based on negation of condition
df = df[~(df['Age'] > 30)]

print(df)

Output:

    Name  Age City
0  Alice   25   NY
1    Bob   30   LA

Select Rows based on Condition in List

Let's say we want to select all the rows where the City is either NY or SF.

We can do this by passing the condition inside the square brackets [] as shown below.

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# Selecting rows based on condition in list
df = df[df['City'].isin(['NY', 'SF'])]

print(df)

Output:

    Name  Age City
0  Alice   25   NY
2  Charlie   35   SF
3  David   40   NY

Other Methods

Here are few more methods for you to explore for selecting rows based on conditions.

1. Using query() method

The query() is an inbuilt method in Pandas which executes an expression string on the DataFrame.

You can pass single or multiple conditions inside it in the form of a string.

For example:

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# using query() method
df = df.query('Age > 30 & City in ["NY", "SF"]')

print(df)

Output:

    Name  Age City
2  Charlie   35   SF
3  David   40   NY

2. Using loc[] method

The loc[] method is used to access a group of rows and columns by label(s) or a boolean array.

For example:

import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)

# using loc[] method
df = df.loc[(df['Age'] > 30) & (df['City'].isin(['NY', 'SF']))]

print(df)

Output:

    Name  Age City
2  Charlie   35   SF
3  David   40   NY

Conclusion

We have covered various examples and methods to select rows based on conditions in a Pandas DataFrame.

For more information, refer to the official documentation.