Pandas Select Row by Index
In data analysis with Python, Pandas is a powerful library used for manipulating and analyzing structured data.
Selecting specific rows from a DataFrame based on their index is a common task in data manipulation.
In this tutorial, we will explore different ways to select single or multiple rows from a Pandas DataFrame based on their index.
Table of Contents
- Using loc[]
- Using iloc[]
- Select Multiple Rows
- Select Rows by Condition
- Select Rows in Range
- Conclusion
1. Using loc[]
The .loc[] method in Pandas allows for label-based indexing, enabling the selection of rows based on their index labels.
To select a single row from a DataFrame, pass the row's index label to .loc[].
The following example selects the second row, whose index label is 1.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting a row using df.loc[]
print(df.loc[1])
Output:
Name    Bob
Age      30
City     LA
Name: 1, dtype: object
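Because .loc[] works on labels rather than positions, it behaves the same way when the index is not made of integers. The sketch below assumes a hypothetical string index ('a', 'b', 'c') that is not part of the example above. It only illustrates the label lookup.

```python
import pandas as pd

# Hypothetical variation: the index holds string labels instead of 0, 1, 2
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],
)

# .loc[] looks up the row whose index label is 'b'
row = df.loc['b']
print(row['Name'])  # Bob
```

Passing a label that does not exist in the index raises a KeyError.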
2. Using iloc[]
The .iloc[] method provides integer-based indexing, allowing the selection of rows based on their integer position in the DataFrame.
To select a single row, pass its integer position to .iloc[].
The following example selects the third row (position 2) from the DataFrame.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting a row using df.iloc[]
print(df.iloc[2])
Output:
Name    Charlie
Age          35
City         SF
Name: 2, dtype: object
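With the default index, labels and positions happen to be the same numbers, which hides the difference between .loc[] and .iloc[]. The sketch below assumes a hypothetical index of 10, 20, 30 (not used elsewhere in this tutorial) to make the difference visible.

```python
import pandas as pd

# Hypothetical variation: index labels deliberately differ from row positions
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],
)

by_position = df.iloc[2]  # third row, whatever its label is
by_label = df.loc[30]     # row whose label is 30 (the same row here)
last_row = df.iloc[-1]    # negative positions count from the end

print(by_position['Name'], by_label['Name'], last_row['Name'])
```

All three lookups return the row for Charlie. Note that df.loc[2] would raise a KeyError on this DataFrame, because no row has the label 2.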
3. Select Multiple Rows
Selecting multiple rows is useful when you want to keep a specific subset of rows for further analysis.
To select multiple rows from a DataFrame, pass a list of index labels to .loc[] or a list of index positions to .iloc[].
For example, to select the 1st and 3rd rows from the DataFrame, pass [0, 2].
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting multiple rows using df.loc[]
print(df.loc[[0,2]])
print()
# Selecting multiple rows using df.iloc[]
print(df.iloc[[0,2]])
Output:
      Name  Age City
0    Alice   25   NY
2  Charlie   35   SF

      Name  Age City
0    Alice   25   NY
2  Charlie   35   SF
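Two details of list-based selection are easy to miss: the rows come back in the order given in the list, and a list always produces a DataFrame, even when it contains a single label. A minimal sketch, reusing a shortened version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Rows are returned in the order listed, not in their original order
reordered = df.loc[[2, 0]]
print(list(reordered['Name']))  # ['Charlie', 'Alice']

# A one-element list returns a DataFrame; a bare label returns a Series
one_row_frame = df.loc[[1]]
one_row_series = df.loc[1]
print(type(one_row_frame).__name__, type(one_row_series).__name__)
```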
4. Select Rows by Condition
Data analysis often requires selecting rows that satisfy a condition.
To select rows from the DataFrame where the age is greater than 30, pass the condition df['Age'] > 30 to .loc[]. It returns a DataFrame containing only the rows where the condition is True.
.iloc[] does not accept a boolean Series, so convert the condition to a boolean array first, for example with .values.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting rows by condition using df.loc[]
print(df.loc[df['Age'] > 30])
print()
# Selecting rows by condition using df.iloc[] (requires a boolean array, hence .values)
print(df.iloc[(df['Age'] > 30).values])
Output:
      Name  Age City
2  Charlie   35   SF
3    David   40   NY
4     Emma   45   LA

      Name  Age City
2  Charlie   35   SF
3    David   40   NY
4     Emma   45   LA
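It can help to look at the condition on its own: it is a boolean Series with one True or False per row. The sketch below stores it in a variable named mask (a name chosen here for illustration) and uses .to_numpy(), which returns the same kind of boolean array as .values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
                   'Age': [25, 30, 35, 40, 45]})

# The condition is a boolean Series aligned with the DataFrame's index
mask = df['Age'] > 30
print(mask.tolist())  # [False, False, True, True, True]

by_loc = df.loc[mask]               # .loc[] accepts the boolean Series directly
by_iloc = df.iloc[mask.to_numpy()]  # .iloc[] needs a plain boolean array

print(by_loc.equals(by_iloc))  # True
```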
5. Select Rows in Range
What if a DataFrame has thousands of rows and you want to select rows 100 to 200?
Passing a list of every label from 100 to 200 is verbose and error-prone.
Instead, pass the slice 100:200 to .loc[]. This selects the rows labeled 100 through 200, including both endpoints.
Note: You can also use 100:200 with .iloc[], but it selects positions 100 to 199. The end position is excluded.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting rows in range using df.loc[]
print(df.loc[0:2])
# Selecting rows in range using df.iloc[]
print(df.iloc[0:2])
Output:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF
    Name  Age City
0  Alice   25   NY
1    Bob   30   LA
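The 100:200 case described above can be checked directly. The sketch below assumes a hypothetical 1000-row DataFrame with a single column named value and the default integer index.

```python
import pandas as pd

# Hypothetical DataFrame with 1000 rows and the default integer index
df = pd.DataFrame({'value': range(1000)})

loc_rows = df.loc[100:200]    # label slice: both endpoints included
iloc_rows = df.iloc[100:200]  # position slice: end position excluded

print(len(loc_rows))   # 101
print(len(iloc_rows))  # 100
```

The one-row difference comes from the last row: .loc[] includes label 200, while .iloc[] stops at position 199.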
Conclusion
Selecting rows by index in a Pandas DataFrame can be accomplished in two ways: .loc[] for index labels and .iloc[] for integer positions.
These techniques offer flexibility in extracting specific rows based on their labels, integer positions, or conditional criteria.