Drop Rows with NaN in Pandas
NaN values represent missing data in a DataFrame. In some analyses they can cause errors or inconsistent results, so it is often better to drop the rows that contain them.
In this article, you will learn how to drop rows with NaN values, how to drop rows with NaN values in a specific column, and how to drop rows based on a threshold.
Table of Contents
- Dropping Rows with NaN
- Dropping Rows Based on Specific Columns
- Dropping Rows with a Threshold
- Conclusion
1. Dropping Rows with NaN
To drop rows with NaN values, use the dropna() function. When applied to a DataFrame, it drops every row that contains at least one NaN value.
For example, df.dropna() will drop any row with NaN values.
import pandas as pd
# Creating a sample DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Dropping rows with NaN
df.dropna(inplace=True)
print("\nDataFrame after dropping rows with NaN:")
print(df)
Output:
Original DataFrame:
      Name   Age  Score
0    Alice  25.0   90.0
1      Bob   NaN   85.0
2  Charlie  35.0    NaN
3    David  30.0   92.0

DataFrame after dropping rows with NaN:
    Name   Age  Score
0  Alice  25.0   90.0
3  David  30.0   92.0
You can clearly see that all rows with NaN values are dropped.
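The example above modifies df in place. As a minimal sketch of the alternative, dropna() called without inplace=True returns a new DataFrame and leaves the original untouched. The variable names nan_counts and cleaned are only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)

# Count the NaN values in each column before dropping anything
nan_counts = df.isna().sum()
print(nan_counts)

# Without inplace=True, dropna() returns a new DataFrame;
# df itself still contains all four rows
cleaned = df.dropna()
print(cleaned)
print(len(df), len(cleaned))
```

Keeping the original DataFrame around can be handy when you want to compare row counts before and after cleaning.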
2. Dropping Rows Based on Specific Columns
Now suppose you only want to drop the rows that have NaN values in a specific column. For example, a DataFrame may have an "id" column that must always hold a value, so rows with NaN in the "id" column have to be dropped.
To drop rows with NaN values in a specific column, pass the column name to the subset parameter of the dropna() function.
import pandas as pd
# Creating a sample DataFrame with NaN values
data = {'id': [1, 2, None, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, None],
        'Score': [None, 85, 80, 75]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Dropping rows with NaN in "id" column
df.dropna(subset=['id'], inplace=True)
print("\nDataFrame after dropping rows with NaN in 'id' column:")
print(df)
Output:
Original DataFrame:
    id     Name   Age  Score
0  1.0    Alice  25.0    NaN
1  2.0      Bob  30.0   85.0
2  NaN  Charlie  35.0   80.0
3  4.0    David   NaN   75.0

DataFrame after dropping rows with NaN in 'id' column:
    id   Name   Age  Score
0  1.0  Alice  25.0    NaN
1  2.0    Bob  30.0   85.0
3  4.0  David   NaN   75.0
As you can see, only the rows with NaN values in the "id" column are dropped.
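The subset parameter also accepts more than one column name. The sketch below reuses the same sample data and assumes, purely for illustration, that both "id" and "Age" are required. With the default how='any', a row is dropped if either of the listed columns is NaN.

```python
import pandas as pd

data = {'id': [1, 2, None, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, None],
        'Score': [None, 85, 80, 75]}
df = pd.DataFrame(data)

# Drop rows that have NaN in 'id' or in 'Age';
# NaN values in columns outside the subset (here 'Score') are ignored
result = df.dropna(subset=['id', 'Age'])
print(result)
```

Alice's row survives even though her Score is NaN, because "Score" is not part of the subset.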
3. Dropping Rows with a Threshold
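Suppose you want to keep only the rows that hold at least a minimum number of non-NaN values. To do this, pass the thresh parameter to the dropna() function: thresh=n keeps the rows that have at least n non-NaN values and drops the rest. With thresh=2 and three columns, a row is dropped only if it has two or more NaN values.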
import pandas as pd
# Creating a sample DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Dropping rows that have fewer than 2 non-NaN values
# (with 3 columns, that means rows with 2 or more NaN values)
df.dropna(thresh=2, inplace=True)
print("\nDataFrame after dropping rows with fewer than 2 non-NaN values:")
print(df)
Output:
Original DataFrame:
    Name   Age  Score
0  Alice  25.0   90.0
1    Bob   NaN   85.0
2   None  35.0    NaN
3  David  30.0   92.0

DataFrame after dropping rows with fewer than 2 non-NaN values:
    Name   Age  Score
0  Alice  25.0   90.0
1    Bob   NaN   85.0
3  David  30.0   92.0
In the above output, the second row (index 1) is kept because it has only one NaN value, while the third row (index 2) is dropped because it has two NaN values and only one non-NaN value.
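Because thresh counts non-NaN values rather than NaN values, a limit expressed as "at most n NaN values per row" has to be converted first. The following is a minimal sketch of that conversion. The names max_nan and min_non_nan are hypothetical helpers for this example, not pandas parameters.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)

# Number of non-NaN values in each row, which is what thresh is compared against
non_nan_per_row = df.notna().sum(axis=1)
print(non_nan_per_row)

# Hypothetical limit: allow at most 1 NaN value per row
max_nan = 1
min_non_nan = len(df.columns) - max_nan  # 3 columns - 1 = 2

# Keep only rows with at least min_non_nan non-NaN values
result = df.dropna(thresh=min_non_nan)
print(result)
```

Computing the threshold from len(df.columns) keeps the rule correct if columns are added or removed later.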
Conclusion
Now, whether you need to drop every row with NaN values, only the rows with NaN values in a specific column, or the rows that fall below a threshold of non-NaN values, you know how to do it.
This skill will help you clean your data before performing any analysis.