10 Ways to Create Pandas DataFrame
One of the most amazing tools for data manipulation in Python is the Pandas library. But what makes Pandas so powerful?
It is the DataFrame, a highly versatile and robust data structure that serves as the backbone of managing and arranging data.
This article takes a close look at the Pandas DataFrame: it compares the DataFrame with other data structures, gives practical examples, and covers ten ways to construct one.
Table of Contents
- Understanding DataFrame
- DataFrame vs. Other Data Objects
- Example of a Pandas DataFrame
- Creating Pandas DataFrame
  - Creating Empty DataFrame
  - Creating DataFrame from Dictionary
  - Creating DataFrame from List of Lists
  - Creating DataFrame from List of Dictionaries
  - Creating DataFrame using zip()
  - Creating DataFrame from Dictionary of Series
  - Creating DataFrame with Custom Index
  - Creating DataFrame from CSV File
  - Creating DataFrame from Excel File
  - Creating DataFrame from JSON File
- Conclusion
Understanding DataFrame
A Pandas DataFrame is a two-dimensional data structure like a table with rows and columns.
You can compare it to a spreadsheet or SQL table, where data is organized in rows and columns, and each column can have a specific data type. This structure allows for easy manipulation, analysis, and cleaning of data.
DataFrame vs. Other Data Objects
- List of Lists: While lists of lists can represent tabular data, Pandas DataFrames provide more functionality, indexing options, and optimized operations for data analysis.
- NumPy Arrays: NumPy arrays are powerful for numerical operations, but DataFrames excel in handling heterogeneous data types and providing labeled axes for easy referencing.
- Dictionaries: Dictionaries can store tabular data, but DataFrames offer a more structured and efficient approach with built-in methods for data manipulation.
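As a rough illustration of the comparison above, the sketch below puts the same small, made-up dataset into a NumPy array and into a DataFrame. The values are hypothetical sample data; the point is only how each structure treats mixed types.

```python
import numpy as np
import pandas as pd

# A NumPy array has a single dtype for all elements, so mixing text and
# numbers turns every value into a string.
arr = np.array([['Alice', 25], ['Bob', 30]])
print(arr.dtype.kind)  # 'U' means a Unicode string dtype

# A DataFrame keeps a separate dtype per column and labels both axes.
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(people.dtypes)

# Because 'Age' stays numeric, arithmetic works on it directly.
mean_age = people['Age'].mean()
print(mean_age)
```

Columns are referenced by label (`people['Age']`) rather than by position, which is what the "labeled axes" point refers to.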
Example of a Pandas DataFrame
The pandas.DataFrame() constructor is used to create a DataFrame in Pandas.
Syntax:
pandas.DataFrame(data=None, index=None, columns=None)
Here,
- data is the data to be stored in the DataFrame.
- index is the row labels to be used for the DataFrame.
- columns is the column labels to be used for the DataFrame.
import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# Creating a DataFrame from a dictionary
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
Creating Pandas DataFrame
You can create a DataFrame from almost any data structure you have, be it a dictionary, a list, a NumPy array, or another DataFrame.
Whatever form your data takes, the sections below show how to create a DataFrame from it.
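NumPy arrays and existing DataFrames are mentioned above but do not get a section of their own, so here is a minimal sketch of both. The array values and the labels `A`, `B`, `x`, `y`, `z` are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array: each inner row becomes a DataFrame row.
# Column and row labels are supplied separately.
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=['A', 'B'], index=['x', 'y', 'z'])
print(df_from_array)

# From another DataFrame: copy=True requests an independent copy of the data.
df_copy = pd.DataFrame(df_from_array, copy=True)
df_copy.loc['x', 'A'] = 100

# The original is unaffected by the change made to the copy.
print(df_from_array.loc['x', 'A'])
```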
1. Creating Empty DataFrame
To create a DataFrame, call pd.DataFrame() and pass it the data.
To create an empty DataFrame with named columns, pass a list of column names to the columns parameter. If you pass nothing at all, you get an empty DataFrame with no columns.
import pandas as pd

# Creating an empty DataFrame
df = pd.DataFrame(columns=['Name', 'Age', 'City'])
print(df)
Output:
Empty DataFrame
Columns: [Name, Age, City]
Index: []
Also, learn how to check if a dataframe is empty.
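The sketch below covers the "pass nothing" case and shows one way to check for emptiness with the `empty` attribute. The last part, which builds an empty frame with fixed column types from empty Series, is an optional extra and only one possible approach.

```python
import pandas as pd

# No arguments: no rows and no columns.
empty_df = pd.DataFrame()
print(empty_df.shape)
print(empty_df.empty)

# An empty DataFrame whose columns already have a dtype,
# built from empty Series (illustrative column names).
typed = pd.DataFrame({
    'Name': pd.Series(dtype='object'),
    'Age': pd.Series(dtype='int64'),
})
print(typed.dtypes)
```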
2. Creating DataFrame from Dictionary
One of the most common ways to create a DataFrame is from a dictionary.
Each key of the dictionary represents a column name and the corresponding value is a list of column values.
import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# Creating a DataFrame from a dictionary
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
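If your dictionary is organized the other way around, with each key holding one row instead of one column, `pd.DataFrame.from_dict()` with `orient='index'` may be a better fit. This is a small sketch with made-up keys `a` and `b`, not part of the example above.

```python
import pandas as pd

# Each key is a row label; each value is that row's data.
rows = {'a': ['Alice', 25], 'b': ['Bob', 30]}

# orient='index' treats keys as rows; columns names the resulting columns.
df_rows = pd.DataFrame.from_dict(rows, orient='index', columns=['Name', 'Age'])
print(df_rows)
```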
3. Creating DataFrame from List of Lists
Another way to create a DataFrame is from a list of lists. Here each inner list represents a row and the outer list represents the whole DataFrame.
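The inner lists should normally all have the same length. If a row is shorter than the others, Pandas fills its missing cells with missing values (None or NaN).
A list of lists carries no column names, so pass them through the columns parameter; otherwise the columns are labeled 0, 1, 2, and so on.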
import pandas as pd

# sample data
data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'San Francisco'],
        ['Charlie', 35, 'Los Angeles']]

# Creating a DataFrame from a list of lists
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
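To see what happens at the edges of this approach, the sketch below leaves out the column names and makes one row shorter than the other. The data is a deliberately incomplete variant of the sample above.

```python
import pandas as pd

# The second row has no city.
ragged = [['Alice', 25, 'New York'],
          ['Bob', 30]]

# Without columns=, the columns get the integer labels 0, 1, 2.
df_ragged = pd.DataFrame(ragged)
print(df_ragged)

# The cell that had no value is treated as missing.
print(pd.isna(df_ragged.iloc[1, 2]))
```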
4. Creating DataFrame from List of Dictionaries
The data you have may be in the form of a list of dictionaries. Here each dictionary represents a row and the keys of the dictionary represent the column names.
import pandas as pd

# sample data
data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
        {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]

# Creating a DataFrame from a list of dictionaries
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
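The dictionaries do not all need the same keys. As a small sketch with deliberately incomplete sample records, the columns become the union of all keys, and any value a record lacks is marked as missing.

```python
import pandas as pd

# The first record has no 'City'; the second has no 'Age'.
records = [{'Name': 'Alice', 'Age': 25},
           {'Name': 'Bob', 'City': 'San Francisco'}]

df_records = pd.DataFrame(records)
print(df_records)

# Missing entries can be detected with isna().
print(df_records.isna())
```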
5. Creating DataFrame using zip()
Suppose you have two lists of equal length, one per column, and want to build a DataFrame from them.
Combine the lists with the zip() function, convert the result to a list of tuples, and pass it to pd.DataFrame().
import pandas as pd

# sample data
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

# list of tuples
data = list(zip(names, ages))

# Creating a DataFrame using zip()
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
Output:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
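One thing to watch for: zip() stops at the shortest input, so lists of unequal length silently lose rows. The sketch below shows that behavior and one possible alternative, `itertools.zip_longest` from the standard library, which pads the shorter list instead. The shortened `ages` list is made up for the demonstration.

```python
from itertools import zip_longest

import pandas as pd

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30]  # one value short on purpose

# zip() drops 'Charlie' because ages has only two items.
truncated = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])
print(len(truncated))

# zip_longest() keeps all three rows and pads the gap with None.
padded = pd.DataFrame(list(zip_longest(names, ages)), columns=['Name', 'Age'])
print(padded)
```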
6. Creating DataFrame from Dictionary of Series
A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, python objects, etc.).
Let's see how to create a DataFrame from a dictionary of Series.
import pandas as pd

name_series = pd.Series(['Alice', 'Bob', 'Charlie'])
age_series = pd.Series([25, 30, 35])
city_series = pd.Series(['New York', 'San Francisco', 'Los Angeles'])

# sample data
data = {
    'name': name_series,
    'age': age_series,
    'city': city_series
}

# Creating a DataFrame from a dictionary of Series
df = pd.DataFrame(data)
print(df)
Output:
      name  age           city
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
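Because a Series is labeled, Pandas lines the values up by index label rather than by position when it builds the DataFrame. The sketch below uses two Series with made-up, partially overlapping labels to show the effect.

```python
import pandas as pd

# The two Series share only the label 'b'.
names_by_label = pd.Series(['Alice', 'Bob'], index=['a', 'b'])
ages_by_label = pd.Series([30, 35], index=['b', 'c'])

aligned = pd.DataFrame({'name': names_by_label, 'age': ages_by_label})
print(aligned)

# The result has one row per label found in either Series;
# values without a matching label are missing.
print(aligned.loc['b'])
```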
7. Creating DataFrame with Custom Index
By default, a Pandas DataFrame has an integer index starting from 0. You can instead supply custom row labels, such as "a", "b", "c", through the index parameter.
Let's create a DataFrame with a custom index.
import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# custom index
my_index = ['a', 'b', 'c']

# Creating a DataFrame with custom index
df = pd.DataFrame(data, index=my_index)
print(df)
Output:
      Name  Age           City
a    Alice   25       New York
b      Bob   30  San Francisco
c  Charlie   35    Los Angeles
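A custom index is mostly useful for looking rows up by label. The following sketch reuses the same sample data and also shows `set_index()`, which promotes an existing column to the index, as one alternative to passing `index=` at creation time.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
labeled = pd.DataFrame(data, index=['a', 'b', 'c'])

# .loc selects a row by its label instead of its position.
row_b = labeled.loc['b']
print(row_b['Name'])

# set_index() returns a new DataFrame that uses the 'Name' column as row labels.
by_name = labeled.set_index('Name')
print(by_name.loc['Charlie', 'Age'])
```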
8. Creating DataFrame from CSV File
Large datasets are often stored as CSV files, so it is important to know how to create a DataFrame from one.
To do this, use the Pandas read_csv() function and pass it the path of the CSV file.
It reads the file and returns a DataFrame.
import pandas as pd

# Creating a DataFrame from a CSV file
df = pd.read_csv('./data.csv')
print(df)
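If you do not have a data.csv file at hand, you can try read_csv() on an in-memory string instead. The sketch below wraps made-up CSV text in `io.StringIO`, which read_csv() accepts in place of a path, and also shows two commonly used parameters, `usecols` and `index_col`.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,San Francisco\n"

# The first line is used as the header by default.
df_csv = pd.read_csv(StringIO(csv_text))
print(df_csv)

# Read only two columns and use 'Name' as the row index.
df_indexed = pd.read_csv(StringIO(csv_text),
                         usecols=['Name', 'Age'],
                         index_col='Name')
print(df_indexed)
```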
9. Creating DataFrame from Excel File
As with CSV files, you can create a DataFrame from an Excel file.
To do this, use the Pandas read_excel() function and pass it the path of the Excel file. Reading .xlsx files requires the openpyxl package to be installed.
For testing, you can convert your CSV file to an Excel file with an online converter.
import pandas as pd

# Creating a DataFrame from an Excel file
df = pd.read_excel('./data.xlsx')
print(df)
10. Creating DataFrame from JSON File
JSON files are also a popular way to store data. To load JSON data into a Pandas DataFrame, use the read_json() function.
import pandas as pd

# Creating a DataFrame from a JSON file
df = pd.read_json('./data.json')
print(df)
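As with CSV, you can experiment without a file by wrapping JSON text in `io.StringIO`. The sketch assumes the JSON is a list of records (one object per row), which is only one of several layouts read_json() can handle; the sample records are made up.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON content: a list of objects, one per row.
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

df_json = pd.read_json(StringIO(json_text))
print(df_json)
```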
Conclusion
By now you have learned what a DataFrame is and how to create one from the most common forms your data can take.
These ten techniques cover in-memory Python objects as well as CSV, Excel, and JSON files, which should be enough to get data into Pandas for most data-centric projects.
Happy Pythoning!