Merge Two DataFrames Pandas 🤝

In the realm of data manipulation, merging two DataFrames is a common task that unlocks new insights and enhances analysis.

In this article, we will look at different ways to merge two DataFrames in Pandas.

Table of Contents

How to Merge Pandas DataFrames
Applying Optional Parameters
Merge based on different column names
Change the merge type
Conclusion

How to Merge Pandas DataFrames

Merging Pandas DataFrames is like blending ingredients to create a delicious recipe. Each method and parameter adds its unique flavor to the mix, resulting in a harmonious combination of data.

Let's start with the basics and merge two DataFrames using the merge() method.

1. Using merge()

The merge() method is the most powerful and commonly used method to merge two DataFrames in Pandas. It is a versatile method that can merge DataFrames based on a single or multiple columns.

Let's see how to merge two DataFrames based on a single column.

Example 1: Merge two DataFrames based on a single column

Suppose we have two DataFrames, df1 and df2, and we want to merge them based on the id column. Here is how we can do this:

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John', 'Mike', 'Sarah']})

# create second dataframe
df2 = pd.DataFrame({'id': [1, 2, 3],
                    'age': [22, 25, 21]})

# 👉 merge the two dataframes
df3 = pd.merge(df1, df2, on='id')

print(df3)

Output:

   id   name  age
0   1   John   22
1   2   Mike   25
2   3  Sarah   21

Example 2: Merge two DataFrames based on multiple columns

The following example merges two DataFrames based on the id and name columns.

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John', 'Bob', 'Sarah']})

# create second dataframe
df2 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John', 'Mike', 'Sarah'],
                    'age': [22, 25, 21]})

# 👉 merge the two dataframes
df3 = pd.merge(df1, df2, on=['id', 'name'])

print(df3)

Output:

   id   name  age
0   1   John   22
1   3  Sarah   21

2. Using join()

The join() method elegantly merges DataFrames based on their indices, providing a seamless integration of data.

LThe following example merges two DataFrames based on their indices.

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'name': ['John', 'Bob', 'Sarah'],
                    'age': [22, 25, 21]})

# create second dataframe
df2 = pd.DataFrame({'id': [1, 2, 3],
                    'city': ['London', 'Berlin', 'New York'],
                    'age': [22, 25, 21]})

# 👉 merge the two dataframes using join()
df3 = df1.join(df2, lsuffix='_caller', rsuffix='_other')

print(df3)

Output:

    name  age_caller  id      city  age_other
0   John          22   1    London         22
1    Bob          25   2    Berlin         25
2  Sarah          21   3  New York         21

3. Using concat()

The concat() function offers flexibility in combining datasets along specified axes, allowing for versatile data integration.

The following example merges two DataFrames along the columns axis (by default).

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'name': ['John', 'Bob', 'Sarah'],
                    'age': [22, 25, 21]})

# create second dataframe
df2 = pd.DataFrame({'name': ['Mike', 'Tom', 'Harry'],
                    'age': [27, 21, 30]})

# 👉 merge the two dataframes using concat()
df3 = pd.concat([df1, df2])

print(df3)

Output:

    name  age
0   John   22
1    Bob   25
2  Sarah   21
0   Mike   27
1    Tom   21
2  Harry   30

Applying Optional Parameters

To avoide duplicate columns in the merged DataFrame, we can use the suffixes parameter to specify the suffixes to be used for overlapping columns.

To avoide confusion you can pass a list of strings as suffixes, where the first element is appended to the overlapping columns from the left DataFrame, and the second element is appended to the overlapping columns from the right DataFrame.

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John', 'Bob', 'Sarah']})

# create second dataframe
df2 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['Adam', 'Smith', 'Dave'],
                    'age': [22, 25, 21]})

# 👉 merge the two dataframes with suffixes
df3 = pd.merge(df1, df2, on='id', suffixes=['_caller', '_other'])

print(df3)

Output:

   id name_caller name_other  age
0   1        John       Adam   22
1   2         Bob      Smith   25
2   3       Sarah       Dave   21

Merge based on different column names

Merging DataFrames based on different column names is a common task in data analysis. For example, you may have a column named id in one DataFrame and user_id in another DataFrame and you want to merge them based on these columns.

import pandas as pd

# create first dataframe
df1 = pd.DataFrame({'id': [1, 2, 3],
                    'name': ['John', 'Bob', 'Sarah']})

# create second dataframe
df2 = pd.DataFrame({'user_id': [1, 2, 3],
                    'age': [22, 25, 21]})

# 👉 merge the two dataframes based on 2 different columns
df3 = pd.merge(df1, df2, left_on='id', right_on='user_id')

print(df3)

Output:

   id   name  user_id  age
0   1   John        1   22
1   2    Bob        2   25
2   3  Sarah        3   21

Change the merge type

By default, the merge() method performs an inner join on the DataFrames. However, we can change the merge type by using the how parameter.

Allowed value for how parameter are inner, outer, left, and right joins tailored to your needs.

# Change merge type to outer
merged_outer_df = pd.merge(df1, df2, on='ID', how='outer')
print(merged_outer_df)

Conclusion

Merging Pandas DataFrames is an essential skill for any data wrangler. By mastering the various methods, optional parameters, and best practices discussed here, you're well-equipped to blend disparate datasets into a cohesive whole, unlocking valuable insights and unleashing the full potential of your data analysis endeavors.

Happy merging! 🎉✨