How to Reset Index in Pandas
In data analysis, we often filter rows out of a dataset to make it more meaningful. After filtering, the remaining rows keep their original index labels, so the index is no longer a continuous sequence.
These gaps can be confusing in later steps, for example when looking up rows by label. To avoid this, we can reset the index of the DataFrame.
In this tutorial, we will explore various methods to reset the index of a dataframe in Pandas.
Table of Contents
- Understanding Index in Pandas
- Reset Index in Pandas
- Conclusion
Understanding Index in Pandas
The index in Pandas is a set of labels that identifies the rows of a dataframe. By default, the index of a dataframe starts at 0 and increments by 1 for each row.
Let's create a dataframe and see how the index works.
import pandas as pd
data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
'Age': [34, 29, 30, 25, 32, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}
df = pd.DataFrame(data)
print(df)
Output:
     Name  Age          City
0    John   34      New York
1   Smith   29   Los Angeles
2    Dave   30       Chicago
3   James   25       Houston
4  Robert   32  Philadelphia
5   Maria   27       Phoenix
As you can see, the index of the dataframe starts at 0 and increments by 1 for each row.
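To look at the index on its own rather than the whole dataframe, the df.index attribute can be inspected. The following is a minimal sketch; the shortened sample data is only for illustration.

```python
import pandas as pd

# Smaller sample data, used only for this illustration
df = pd.DataFrame({'Name': ['John', 'Smith', 'Dave'], 'Age': [34, 29, 30]})

# The default index is a RangeIndex that starts at 0 and steps by 1
print(df.index)            # RangeIndex(start=0, stop=3, step=1)

index_labels = list(df.index)
print(index_labels)        # [0, 1, 2]
```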
Now, let's filter out some rows from the dataframe and see how the index changes.
# filter out rows with age less than 30
df = df[df['Age'] >= 30]
print(df)
Output:
     Name  Age          City
0    John   34      New York
2    Dave   30       Chicago
4  Robert   32  Philadelphia
As you can see, the index of the dataframe is no longer continuous. The remaining rows keep their old labels (0, 2, and 4), which leaves gaps in the index.
To avoid this, we can reset the index of the dataframe.
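One place where the gaps matter is label-based lookup. The sketch below, using shortened sample data and a hypothetical variable name filtered, shows that position-based access with iloc still works while a lookup of a removed label with loc raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith', 'Dave'], 'Age': [34, 29, 30]})
filtered = df[df['Age'] >= 30]  # keeps the rows labelled 0 and 2

# Position-based access: the second remaining row
second_by_position = filtered.iloc[1]['Name']

# Label-based access: label 1 was filtered out, so this lookup fails
try:
    filtered.loc[1]
    label_found = True
except KeyError:
    label_found = False

print(second_by_position, label_found)  # Dave False
```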
Reset Index in Pandas
To reset the index of a dataframe, we can use the reset_index() method.
Syntax:
df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
Let's see how it works.
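Before the individual cases, here is a short sketch of what the drop parameter changes. It assumes a small filtered dataframe; the names kept and dropped are only illustrative. With the default drop=False the old labels are kept as a new column called index, while drop=True discards them. In both cases reset_index() returns a new dataframe and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({'Age': [34, 29, 30]})
filtered = df[df['Age'] >= 30]  # index labels are now 0 and 2

# drop=False (default): old labels are moved into an 'index' column
kept = filtered.reset_index()

# drop=True: old labels are discarded
dropped = filtered.reset_index(drop=True)

print(list(kept.columns))     # ['index', 'Age']
print(list(kept['index']))    # [0, 2]
print(list(dropped.columns))  # ['Age']
print(list(dropped.index))    # [0, 1]
```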
Reset Index to Start at 0
To reset the index of a dataframe to start at 0, we can use the reset_index() method with the drop parameter set to True.
import pandas as pd
data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
'Age': [34, 29, 30, 25, 32, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}
df = pd.DataFrame(data)
# filter out rows with age less than 30
df = df[df['Age'] >= 30]
# reset index to start at 0
df = df.reset_index(drop=True)
print(df)
Output:
     Name  Age          City
0    John   34      New York
1    Dave   30       Chicago
2  Robert   32  Philadelphia
As you can see, the index of the dataframe is reset to start at 0.
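The example above assigns the result back to df because reset_index() returns a new dataframe by default. As a rough sketch of the alternative, passing inplace=True modifies the dataframe directly and returns None, so the return value should not be assigned back. The variable names here are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Age': [34, 29, 30]})
filtered = df[df['Age'] >= 30].copy()  # work on an independent copy

# inplace=True changes 'filtered' itself and returns None
result = filtered.reset_index(drop=True, inplace=True)

print(result)                # None
print(list(filtered.index))  # [0, 1]
```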
Reset Index to Start at 1
To reset the index of a dataframe so that it starts at 1, we can call the reset_index() method with the drop parameter set to True and then add 1 to the new index. Note that reset_index() has no start parameter.
import pandas as pd
data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
'Age': [34, 29, 30, 25, 32, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}
df = pd.DataFrame(data)
# filter out rows with age less than 30
df = df[df['Age'] >= 30]
# reset index to start at 0, then shift every label by 1
df = df.reset_index(drop=True)
df.index = df.index + 1
print(df)
Output:
     Name  Age          City
1    John   34      New York
2    Dave   30       Chicago
3  Robert   32  Philadelphia
Now the index of the dataframe starts at 1.
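Another way to get a 1-based index, sketched here under the assumption that the rows should simply be numbered 1 to N, is to assign a pd.RangeIndex directly. This skips reset_index() altogether.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith', 'Dave'], 'Age': [34, 29, 30]})
df = df[df['Age'] >= 30]

# Replace the index with a fresh 1-based range of the same length
df.index = pd.RangeIndex(start=1, stop=len(df) + 1)

print(list(df.index))  # [1, 2]
```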
Reset Index to Own Custom Index
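A dataframe can also have a custom index. For our example, we will use letters of the alphabet as the index.
To set letters as the index, we pass them to the index parameter of pd.DataFrame(). Calling reset_index() then restores the default integer index. Let's see how it works.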
# Import pandas package
import pandas as pd
data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
'Age': [34, 29, 30, 25, 32, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}
# custom index
alpha = ['a', 'b', 'c', 'd', 'e', 'f']
# create dataframe with custom index
df = pd.DataFrame(data, index=alpha)
# reset to the default integer index; the custom index is kept as a new column
df.reset_index(inplace=True)
print(df)
Output:
  index    Name  Age          City
0     a    John   34      New York
1     b   Smith   29   Los Angeles
2     c    Dave   30       Chicago
3     d   James   25       Houston
4     e  Robert   32  Philadelphia
5     f   Maria   27       Phoenix
The custom index is moved into a new column named index, and the dataframe gets the default integer index back.
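If the column name index is not descriptive enough, one option is to name the index before resetting it. The sketch below uses rename_axis() for this; the name Letter is a hypothetical choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [34, 29]},
                  index=['a', 'b'])

# Name the index first so the new column is called 'Letter' instead of 'index'
df = df.rename_axis('Letter').reset_index()

print(list(df.columns))    # ['Letter', 'Name', 'Age']
print(list(df['Letter']))  # ['a', 'b']
```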
Reset Index and Remove Old Index
In this example code snippet, we set letters of the alphabet as the index and do not call the reset_index() method, so the custom index stays in place.
# Import pandas package
import pandas as pd
data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
'Age': [34, 29, 30, 25, 32, 27],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}
# custom index
alpha = ['a', 'b', 'c', 'd', 'e', 'f']
# create dataframe with custom index
df = pd.DataFrame(data, index=alpha)
print(df)
Output:
     Name  Age          City
a    John   34      New York
b   Smith   29   Los Angeles
c    Dave   30       Chicago
d   James   25       Houston
e  Robert   32  Philadelphia
f   Maria   27       Phoenix
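To actually remove the old custom index, reset_index() can be called with drop=True. The following is a minimal sketch with shortened sample data: the letters are discarded instead of being kept as a column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [34, 29]},
                  index=['a', 'b'])

# drop=True discards the letters and restores the default integer index
df = df.reset_index(drop=True)

print(list(df.index))    # [0, 1]
print(list(df.columns))  # ['Name', 'Age']
```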
Conclusion
Resetting the index in pandas is a fundamental operation that gives a DataFrame a clean, continuous index after filtering or other manipulation. In this article, we explored the reset_index() method, its drop and inplace parameters, and how it behaves with a custom index.
By understanding these techniques and using the provided code examples, you can confidently reset the index in pandas and effectively manage your data analysis tasks.
Happy Learning!