Pandas loc vs iloc
Pandas, a popular Python library for data manipulation, provides two essential methods, loc and iloc, for selecting specific rows and columns in DataFrames.
While they serve similar purposes, they differ in one fundamental way: loc selects data by label, while iloc selects data by integer position. In this article, we will discuss the differences between loc and iloc and when to use each.
Table of Contents
- Pandas loc
- Pandas iloc
- Pandas loc vs iloc
- Conclusion
Pandas loc
The loc[] method is primarily used to select rows and columns from the DataFrame based on the labels of the rows and columns.
- loc['A'] selects the row with label 'A'.
- loc[:, 'Name'] selects the column named 'Name'.
- loc['A', 'Name'] selects the element at the intersection of row 'A' and column 'Name'.
- loc['A':'C', 'Name':'Age'] selects the rows labeled 'A' through 'C' and the columns 'Name' through 'Age'. Unlike standard Python slicing, both endpoints are included.
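```python
import pandas as pd

# Sample DataFrame with string row labels 'A'-'D'
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
     'Age': [25, 32, 41, 28],
     'City': ['Paris', 'Oslo', 'Rome', 'Lima']},
    index=['A', 'B', 'C', 'D'],
)

# Selecting a single row by label
df.loc['A']
# Selecting a single column by name
df.loc[:, 'Name']
# Selecting a single element by row label and column name
df.loc['A', 'Name']
# Selecting multiple rows and columns by label
df.loc[['A', 'B'], ['Name', 'Age']]
# Selecting rows with a Boolean Series and a column by name
df.loc[df['Age'] > 30, 'Name']
```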
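To make the inclusive slicing rule concrete, here is a minimal sketch on a small, hypothetical DataFrame (the names, ages and cities are made up for illustration). It also shows how to combine several conditions inside loc: each comparison is wrapped in parentheses and joined with & (and) or | (or).

```python
import pandas as pd

# Hypothetical sample data; the row labels are strings, not positions
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
     'Age': [25, 32, 41, 28],
     'City': ['Paris', 'Oslo', 'Rome', 'Lima']},
    index=['A', 'B', 'C', 'D'],
)

# Label slices include BOTH endpoints: rows 'A', 'B', 'C' and columns 'Name', 'Age'
subset = df.loc['A':'C', 'Name':'Age']

# Combine conditions with & and |, wrapping each comparison in parentheses
mask = (df['Age'] > 30) & (df['City'] != 'Rome')
names = df.loc[mask, 'Name']

print(subset)
print(names.tolist())
```

Note that the column slice 'Name':'Age' follows the order of the columns in the DataFrame, not alphabetical order.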
Pandas iloc
The iloc[] method is integer-based and uses integer positions to access data in a DataFrame. It enables you to select rows and columns based on their position rather than their labels.
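- iloc[0] selects the first row of the DataFrame.
- iloc[:, 0] selects the first column of the DataFrame.
- iloc[1, 2] selects the element at row position 1 and column position 2, that is, the second row and third column (positions are zero-based).
- iloc[1:3, 2] selects the rows at positions 1 and 2 (the stop position 3 is excluded) from the column at position 2, the third column.
- iloc[[0, 2], [0, 2]] selects the first and third rows and the first and third columns.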
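Because iloc works with positions, what it returns depends on the current order of the rows, whereas loc keeps finding the same rows by label. The sketch below, again using hypothetical sample data, checks the iloc[1:3, 2] case from the list and then sorts the rows to show the difference.

```python
import pandas as pd

# Hypothetical sample data with string row labels
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
     'Age': [25, 32, 41, 28],
     'City': ['Paris', 'Oslo', 'Rome', 'Lima']},
    index=['A', 'B', 'C', 'D'],
)

# iloc[1:3, 2]: row positions 1 and 2 (stop 3 excluded), column position 2 ('City')
col_slice = df.iloc[1:3, 2]

# Positions follow the current row order, so reordering the rows changes
# what iloc returns, while loc still finds rows by label
by_age = df.sort_values('Age', ascending=False)
oldest = by_age.iloc[0]   # first row after sorting (label 'C')
alice = by_age.loc['A']   # label lookup is unaffected by the new order

print(col_slice.tolist())
print(oldest['Name'], alice['Name'])
```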
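```python
import pandas as pd

# Sample DataFrame (columns in the order Name, Age, City)
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
     'Age': [25, 32, 41, 28],
     'City': ['Paris', 'Oslo', 'Rome', 'Lima']},
    index=['A', 'B', 'C', 'D'],
)

# Select the element at row position 0, column position 0
df.iloc[0, 0]
# Select the first column (here, 'Name')
df.iloc[:, 0]
# Select the second row (position 1) and all columns
df.iloc[1, :]
# Select rows at positions 0, 1 and 2 (the stop position 3 is excluded)
df.iloc[0:3]
# Select columns at positions 1 and 2 ('Age' and 'City')
df.iloc[:, 1:3]
# Select the first two columns (positions 0 and 1)
df.iloc[:, 0:2]
# Select rows where 'Age' is greater than 30; iloc needs a plain Boolean
# array rather than a Boolean Series, so convert the mask with .to_numpy()
df.iloc[(df['Age'] > 30).to_numpy(), :]
# Select a specific element: second row, third column (positions 1 and 2)
df.iloc[1, 2]
# Select rows 0 and 2, and columns 0 and 2 (first and third of each)
df.iloc[[0, 2], [0, 2]]
```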
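The Boolean-mask case is a common stumbling block, so it is worth seeing side by side. In this sketch (hypothetical sample data), loc accepts the Boolean Series directly because it aligns the mask on the row labels, while iloc rejects the Series and needs the underlying Boolean array instead. The exact exception pandas raises for the Series case depends on the index type (ValueError for string labels, NotImplementedError for integer labels), so both are caught.

```python
import pandas as pd

# Hypothetical sample data with string row labels
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Carol', 'Dan'],
     'Age': [25, 32, 41, 28]},
    index=['A', 'B', 'C', 'D'],
)

mask = df['Age'] > 30                # a Boolean Series indexed by the row labels

by_loc = df.loc[mask]                # loc aligns the Series on the index
by_iloc = df.iloc[mask.to_numpy()]   # iloc needs a plain Boolean array

raised = False
try:
    df.iloc[mask]                    # passing the Series itself is rejected
except (ValueError, NotImplementedError) as exc:
    raised = True
    print(type(exc).__name__, exc)

print(by_loc.equals(by_iloc))
```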
Pandas loc vs iloc
The following table summarizes the differences between loc and iloc.
Feature | loc | iloc |
---|---|---|
Indexing Method | Label-based | Integer-based |
Index Type | Row and column labels (strings, dates, or integers used as labels) | Integer positions (zero-based) |
Error Handling | Raises KeyError if a label doesn't exist | Raises IndexError if a single position is out of bounds (out-of-bounds slices are truncated instead) |
Slicing Behavior | Includes both the start and end labels | Excludes the stop position, like standard Python slicing |
Conditional Selection | Accepts a Boolean Series (aligned on the index) or a Boolean array | Accepts a Boolean array or list; a Boolean Series raises an error |
Filtering | More intuitive for filtering based on column names | Convenient when you know the integer positions of rows or columns |
Typical Use Cases | Selecting rows and columns by name, filtering based on conditions with column names | Selecting rows and columns by index position, slicing data using integer ranges |
Selecting a column | df.loc[:, 'column_name'] | df.iloc[:, column_index] |
Selecting multiple columns | df.loc[:, ['col1', 'col2']] | df.iloc[:, [index1, index2]] |
Selecting a row | df.loc['index_label', :] | df.iloc[row_index, :] |
Selecting multiple rows | df.loc[['label1', 'label2'], :] | df.iloc[[index1, index2], :] |
Conditional selection | df.loc[df['column_name'] > threshold, :] | df.iloc[(df['column_name'] > threshold).values, :] |
Accessing specific element | df.loc['label', 'column_name'] | df.iloc[row_index, column_index] |
Performance | Requires a label lookup, usually a negligible cost | Direct positional access, which can be marginally faster |
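The distinction matters most when the row labels are themselves integers, because then loc[0] and iloc[0] look alike but can return different rows. The following sketch uses a small, hypothetical DataFrame whose integer labels are deliberately out of positional order, and also exercises the error-handling row of the table. The last line shows one way to mix the two styles: Index.get_loc converts a column name into its position so it can be passed to iloc.

```python
import pandas as pd

# Hypothetical data whose integer row labels do not match the row positions
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Carol']}, index=[2, 0, 1])

first_by_label = df.loc[0, 'Name']   # row whose LABEL is 0 -> 'Bob'
first_by_pos = df.iloc[0, 0]         # row at POSITION 0 -> 'Alice'

# Error handling: a missing label vs an out-of-bounds position
errors = []
try:
    df.loc[5]
except KeyError:
    errors.append('KeyError')
try:
    df.iloc[5]
except IndexError:
    errors.append('IndexError')

# An out-of-bounds iloc *slice* is truncated instead of raising
tail = df.iloc[1:10]

# Convert a column name to its position so it can be used with iloc
mixed = df.iloc[0, df.columns.get_loc('Name')]

print(first_by_label, first_by_pos, errors, list(tail.index), mixed)
```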
Conclusion
The loc and iloc methods in Pandas offer distinct approaches to selecting rows and columns in DataFrames. loc employs label-based indexing, while iloc uses integer positions for selection.
Understanding the differences between these methods is crucial for efficiently accessing and manipulating data within Pandas DataFrames.