Pandas Create DataFrame from List of Dicts
A list of dictionaries is a structured way to represent tabular data, where each dictionary corresponds to a row in the eventual DataFrame. Each key-value pair in a dictionary represents a column and its value.
Pandas provides seamless ways to create a DataFrame from different data structures. In this tutorial, we will learn how to create a DataFrame from a list of dictionaries.
Table of Contents
- Creating DataFrame from List of Dicts
- Customizing Column Order
- Handling Missing Values
- Conclusion
1. Creating DataFrame from List of Dicts
Creating a DataFrame from a list of dictionaries is as simple as passing the list to the DataFrame() constructor.
Let's create a list of dictionaries representing the sales data of a company.
import pandas as pd
sales_data = [
{'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
{'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
{'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
{'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2}
]
# Create DataFrame from list of dicts
df = pd.DataFrame(sales_data)
print(df)
Output:
    name product  units  price
0   John   apple     10    0.5
1   Mary  banana      5    0.2
2  Peter   apple      2    0.5
3   John  banana      3    0.2
As you can see, each dictionary in the list becomes a row in the DataFrame. The dictionary keys become the column names, and each value fills the cell of the corresponding column in that row.
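To check this mapping yourself, the following minimal sketch rebuilds the same DataFrame and inspects its shape, column names, and first row. The variable names column_names and first_row are introduced here only for illustration, and the column order shown assumes a recent pandas version, where columns follow the order in which keys first appear.

```python
import pandas as pd

sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
    {'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
    {'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2},
]

df = pd.DataFrame(sales_data)

# One row per dictionary, one column per distinct key
print(df.shape)

# Column names are taken from the dictionary keys
column_names = df.columns.tolist()
print(column_names)

# Converting the first row back to a dict recovers the first dictionary's values
first_row = df.iloc[0].to_dict()
print(first_row)
```

Here df.shape reports four rows and four columns, matching the four dictionaries and their four shared keys.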
2. Customizing Column Order
The arrangement of columns in a DataFrame matters. We often analyze data by looking at a few columns and ignoring the rest, so it helps to arrange the columns in a way that makes the data easy to analyze.
Let's set the order of columns in the DataFrame.
import pandas as pd
sales_data = [
{'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
{'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
{'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
{'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2}
]
# Create DataFrame from list of dicts
# and customize column order
df = pd.DataFrame(sales_data, columns=['product', 'price', 'units', 'name'])
print(df)
Output:
  product  price  units   name
0   apple    0.5     10   John
1  banana    0.2      5   Mary
2   apple    0.5      2  Peter
3  banana    0.2      3   John
As you can see, the columns are arranged in the order we specified.
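The columns argument can do more than reorder. As a rough sketch, assuming current pandas behavior for lists of dictionaries: listing fewer names keeps only those columns, and listing a name that no dictionary contains produces a column filled with NaN. The key 'discount' below is made up for this illustration and is not part of the sales data.

```python
import pandas as pd

sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
    {'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
    {'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2},
]

# Keep only two of the four keys, in the given order
df_subset = pd.DataFrame(sales_data, columns=['product', 'units'])
print(df_subset)

# 'discount' is a hypothetical key that appears in no dictionary,
# so the resulting column contains only NaN
df_extra = pd.DataFrame(sales_data, columns=['name', 'discount'])
print(df_extra)

# Columns can also be reordered after creation by selecting them in a new order
df = pd.DataFrame(sales_data)
df_reordered = df[['product', 'price', 'units', 'name']]
print(df_reordered.columns.tolist())
```

Selecting columns after creation is handy when the DataFrame already exists and you only want a differently ordered view of it.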
3. Handling Missing Values
In real-world data, some dictionaries may lack keys that others have. Pandas handles this automatically by assigning NaN (Not a Number) to the missing entries.
import pandas as pd
# Introduce missing values in one dictionary
data_list_of_dicts_missing = [
{'Name': 'Alice', 'Age': 25, 'City': 'NY'},
{'Name': 'Bob', 'Age': 30, 'City': 'LA'},
{'Name': 'Charlie', 'City': 'SF'}
]
# Create DataFrame with missing values
df_missing_values = pd.DataFrame(data_list_of_dicts_missing)
print("DataFrame with Missing Values:")
print(df_missing_values)
Output:
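DataFrame with Missing Values:
      Name   Age City
0    Alice  25.0   NY
1      Bob  30.0   LA
2  Charlie   NaN   SF
As you can see, the DataFrame has NaN for the missing value of the Age column. Because NaN is a floating-point value, the other ages in that column are shown as floats (25.0, 30.0) rather than integers.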
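Once NaN is in place, you will usually want to detect it and decide what to do about it. The sketch below shows a few common follow-up steps: counting missing values per column, filling the gap with a placeholder, and dropping incomplete rows. The fill value of 0 is an arbitrary choice for illustration, not a recommendation for real data.

```python
import pandas as pd

data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NY'},
    {'Name': 'Bob', 'Age': 30, 'City': 'LA'},
    {'Name': 'Charlie', 'City': 'SF'},  # no 'Age' key
]
df = pd.DataFrame(data)

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Replace the missing age with a placeholder (0 is arbitrary),
# then convert the column back to integers
df_filled = df.fillna({'Age': 0}).astype({'Age': int})
print(df_filled)

# Alternatively, drop every row that has a missing value
df_complete = df.dropna()
print(df_complete)
```

Which option fits depends on the analysis: filling keeps every row but invents a value, while dropping keeps only rows that were complete to begin with.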
Conclusion
Creating a Pandas DataFrame from a list of dictionaries is a fundamental operation in data manipulation.
Whether your data is well-structured or contains missing values, Pandas provides the tools to convert your list of dictionaries into a structured DataFrame effortlessly.