Pandas Select Columns by Name
Pandas provides a wide range of functionality for cleaning and manipulating data. One of the most common tasks in data analysis is selecting a subset of columns from a DataFrame, and pandas offers several ways to do it, which makes data exploration and analysis more efficient.
In this tutorial, you will learn how to select columns from a DataFrame by their name.
Table of Contents
- Select Single Column
- Select Multiple Columns
- Select Columns by Booleans
- Select Columns in Range
- Select Columns by Regex
- Select Columns by Condition
- Conclusion
1. Select Single Column
To select a single column from a DataFrame, you can use the df['column_name'] syntax. This returns a Series object.
The following example selects the 'Name' column from the DataFrame.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
print(df['Name'])
Output:
0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object
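The difference between a Series and a DataFrame result is easy to miss, so here is a minimal sketch of it. The shortened DataFrame and the variable names `as_series` and `as_frame` are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Shortened sample data (illustrative)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

as_series = df['Name']    # a single label returns a Series
as_frame = df[['Name']]   # a one-element list returns a one-column DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```

Use the list form when later code expects a DataFrame, for example when the result is passed to a function that reads `.columns`.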
Another way to select a single column by name is the df.loc[] indexer, which selects rows and columns by label. The first position inside the brackets addresses rows and the second addresses columns.
To select the single column labeled 'Name' for all rows, use the df.loc[:, 'Name'] syntax.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting a single column by name
# using df.loc[]
print(df.loc[:, 'Name'])
Output:
0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object
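As a quick check that the two forms agree, the sketch below compares them and also shows that df.loc[] can restrict rows in the same call. The names `same` and `first_two` are illustrative, and the DataFrame is assumed to have the default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
    'Age': [25, 30, 35, 40, 45],
})

# Both forms return the same Series
same = df['Name'].equals(df.loc[:, 'Name'])
print(same)  # True

# df.loc[] can select rows and a column together.
# Row slices are label-based, so both 0 and 1 are included.
first_two = df.loc[0:1, 'Name']
print(first_two.tolist())  # ['Alice', 'Bob']
```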
2. Select Multiple Columns
To select multiple columns from a DataFrame, pass a list of column names to df[] or to df.loc[]. The result is a DataFrame whose columns appear in the order given in the list.
Let's select the 'Name' and 'City' columns from the DataFrame.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting multiple columns by name
# using df[]
print(df[['Name', 'City']])
# using df.loc[]
print(df.loc[:, ['Name', 'City']])
Output:
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA
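Two behaviors of list-based selection are worth seeing directly: the list controls the column order, and a name that does not exist raises a KeyError. In this sketch the column 'Country' is a hypothetical missing name, and `reordered` and `raised` are illustrative variables.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'SF'],
})

# The order of the list determines the order of the result columns
reordered = df[['City', 'Name']]
print(list(reordered.columns))  # ['City', 'Name']

# Asking for a column that is not in the DataFrame raises KeyError
raised = False
try:
    df[['Name', 'Country']]  # 'Country' is a hypothetical missing column
except KeyError:
    raised = True
print(raised)  # True
```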
3. Select Columns by Booleans
Instead of passing a list of column names, you can pass a list of booleans to df.loc[]. The list must contain one value per column, in column order.
Columns whose position holds True are selected, and columns whose position holds False are skipped.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting columns by booleans
# using df.loc[]
print(df.loc[:, [True, False, True]])
Output:
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA
Here, [True, False, True] tells df.loc[] to select the first and third columns.
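Writing the booleans by hand does not scale to many columns, so in practice the mask is usually computed from the column names. The sketch below is one possible way to do that with `df.columns.isin()`. The names `mask` and `selected` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'SF'],
})

# Build the boolean mask from the column names instead of typing it out
mask = df.columns.isin(['Name', 'City'])
print(mask.tolist())  # [True, False, True]

selected = df.loc[:, mask]
print(list(selected.columns))  # ['Name', 'City']
```

Inverting the mask with `~mask` would select every column except the listed ones.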
4. Select Columns in Range
Suppose a DataFrame has 26 columns named 'A' to 'Z'. To select all the columns from 'A' to 'F', you can use the df.loc[:, 'A':'F'] syntax.
Unlike ordinary Python slices, label slices in df.loc[] include both endpoints, so the result contains 'A', 'F', and every column between them.
Let's see an example.
import pandas as pd
# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [11, 22, 33, 44, 55],
'D': [12, 24, 36, 48, 60],
'E': [13, 26, 39, 52, 65],
'F': [14, 28, 42, 56, 70]}
df = pd.DataFrame(data)
# Selecting columns in range
# select all the columns from 'B' to 'E'
print(df.loc[:, 'B':'E'])
Output:
    B   C   D   E
0  10  11  12  13
1  20  22  24  26
2  30  33  36  39
3  40  44  48  52
4  50  55  60  65
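Because label slices include the end label while position slices do not, it can help to see the two side by side. The following sketch uses a one-row DataFrame for brevity. The variable names are illustrative, and df.iloc[] is shown only for contrast.

```python
import pandas as pd

df = pd.DataFrame({'A': [1], 'B': [10], 'C': [11],
                   'D': [12], 'E': [13], 'F': [14]})

# Label slice: both 'B' and 'E' are included
by_label = df.loc[:, 'B':'E']
print(list(by_label.columns))  # ['B', 'C', 'D', 'E']

# Position slice: the end position (4) is excluded
by_position = df.iloc[:, 1:4]
print(list(by_position.columns))  # ['B', 'C', 'D']

# A step also works with label slices
every_other = df.loc[:, 'A':'F':2]
print(list(every_other.columns))  # ['A', 'C', 'E']
```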
5. Select Columns by Regex
To select columns by regex, you can use the df.filter() method. Pass the pattern through the regex keyword argument, as in df.filter(regex='...').
The pattern is searched for in each column name, and every column whose name contains a match is returned. Matching is case-sensitive unless the pattern says otherwise.
Let's see an example.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
'Age': [25, 30, 35, 40, 45],
'City': ['NY', 'LA', 'SF', 'NY', 'LA']}
df = pd.DataFrame(data)
# Selecting columns by regex
# select all the columns having 'A' or 'a' in their names
print(df.filter(regex='[Aa]'))
Output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4     Emma   45
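Since the pattern matches anywhere in the name, anchors such as `^` and `$` are useful for narrowing the match. The sketch below shows anchored patterns along with the `like` and `items` keywords of df.filter(), which cover the simpler cases of substring and exact-name matching. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['NY', 'LA', 'SF'],
})

# '^A' matches only names that start with an uppercase 'A'
starts_with_a = df.filter(regex='^A')
print(list(starts_with_a.columns))  # ['Age']

# 'e$' matches only names that end with 'e'
ends_with_e = df.filter(regex='e$')
print(list(ends_with_e.columns))  # ['Name', 'Age']

# like= keeps columns whose name contains a plain substring
contains_it = df.filter(like='it')
print(list(contains_it.columns))  # ['City']

# items= keeps the listed names that exist in the DataFrame
by_items = df.filter(items=['Name', 'City'])
print(list(by_items.columns))  # ['Name', 'City']
```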
6. Select Columns by Condition
Suppose you want to select all the columns having a mean greater than 35. To do so, you can use the df.mean() method to calculate the mean of every column, compare the result with 35 to get a boolean Series indexed by column name, and pass that Series to df.loc[] in the column position.
Let's see an example.
import pandas as pd
# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [11, 22, 33, 44, 55],
'D': [12, 24, 36, 48, 60],
'E': [13, 26, 39, 52, 65],
'F': [14, 28, 42, 56, 70]}
df = pd.DataFrame(data)
# Selecting columns by condition
# select all the columns having a mean greater than 35
print(df.loc[:, df.mean() > 35])
Output:
    D   E   F
0  12  13  14
1  24  26  28
2  36  39  42
3  48  52  56
4  60  65  70
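To make the condition easier to follow, the sketch below splits it into its intermediate steps. It also shows one possible way to handle a DataFrame that mixes numeric and text columns, where calling mean() on the whole frame can fail in recent pandas versions. The 'Label' column and the names `means`, `mask`, `mixed`, and `numeric` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': [11, 22, 33, 44, 55],
                   'D': [12, 24, 36, 48, 60],
                   'E': [13, 26, 39, 52, 65],
                   'F': [14, 28, 42, 56, 70]})

means = df.mean()   # one mean per column, indexed by column name
mask = means > 35   # boolean Series, True for D, E and F
selected = df.loc[:, mask]
print(means.tolist())          # [3.0, 30.0, 33.0, 36.0, 39.0, 42.0]
print(list(selected.columns))  # ['D', 'E', 'F']

# With a text column present, restrict to numeric columns first
mixed = df.assign(Label=list('vwxyz'))  # hypothetical text column
numeric = mixed.select_dtypes(include='number')
selected_mixed = numeric.loc[:, numeric.mean() > 35]
print(list(selected_mixed.columns))  # ['D', 'E', 'F']
```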
Conclusion
You can now select columns from a DataFrame by name in several ways: by a single label, a list of labels, a boolean list, a label range, a regex, or a condition on the column values.
Learn how to select rows by condition from a DataFrame.