Read TSV File in Python
TSV (Tab-Separated Values) is a file format that stores data in tabular form. It is similar to the CSV (Comma-Separated Values) format, but it uses a tab instead of a comma as the delimiter.
In this tutorial, you will learn various ways to load and read TSV files in Python.
- Using csv module
- Using Pandas Library
- Using Built-in Functions
- Speed Comparison
For testing purposes, we will use the following data, saved as a tab-separated file named data.tsv.
Name Age Occupation
John 32 Engineer
Emily 28 Teacher
Michael 42 Doctor
Sarah 35 Lawyer
David 39 Architect
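To follow along, the sample data has to exist on disk with real tab characters between the columns. A minimal sketch for creating it, assuming the file name data.tsv used in the examples below (the variable name rows is illustrative, not part of any API):

```python
import csv

# Sample rows from the table above; `rows` is an illustrative name.
rows = [
    ["Name", "Age", "Occupation"],
    ["John", "32", "Engineer"],
    ["Emily", "28", "Teacher"],
    ["Michael", "42", "Doctor"],
    ["Sarah", "35", "Lawyer"],
    ["David", "39", "Architect"],
]

# newline='' lets the csv module control the line endings itself.
with open("data.tsv", "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
```

Writing the file with csv.writer rather than by hand avoids accidentally saving spaces instead of tabs, which is a common reason a TSV file later parses as a single column.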
1. Using csv module
The csv module in Python's standard library provides functionality for working with CSV and TSV files.
To read a TSV file using the csv module, open the file with the open() function and pass the file object to csv.reader() with '\t' as the delimiter.
Here is the code to read the above TSV file using the csv module.
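import csv

# Path to the TSV file
tsv_file = 'data.tsv'

# Open the file and wrap it in csv.reader with a tab delimiter
# (the csv documentation recommends newline='' for file objects)
with open(tsv_file, 'r', newline='') as file:
    tsv_reader = csv.reader(file, delimiter='\t')
    # Each row is a list of strings, one per column
    for row in tsv_reader:
        print(row)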
Output:
['Name', 'Age', 'Occupation']
['John', '32', 'Engineer']
['Emily', '28', 'Teacher']
['Michael', '42', 'Doctor']
['Sarah', '35', 'Lawyer']
['David', '39', 'Architect']
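When the file has a header row, it is often more convenient to access values by column name than by position. The following is a small sketch using csv.DictReader from the same module; the in-memory sample string stands in for data.tsv and is an assumption made to keep the example self-contained.

```python
import csv
import io

# In-memory stand-in for data.tsv (first rows of the sample data).
sample = "Name\tAge\tOccupation\nJohn\t32\tEngineer\nEmily\t28\tTeacher\n"

# DictReader treats the first row as the field names.
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
records = list(reader)

for record in records:
    # Every value is still a string; convert explicitly when needed.
    print(record["Name"], int(record["Age"]), record["Occupation"])
```

To read from disk instead, replace io.StringIO(sample) with a file object opened as in the example above.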
2. Using Pandas Library
Pandas is a Python library that provides high-performance, easy-to-use data structures and data analysis tools.
To read a TSV file using Pandas, call the read_csv() function with the file path and sep='\t' so that tabs are used as the delimiter.
Here is the code to read the above TSV file using Pandas.
import pandas as pd
# Path to the TSV file
tsv_file = 'data.tsv'
# Read the TSV file into a DataFrame using Pandas
data = pd.read_csv(tsv_file, sep='\t')
print(data)
Output:
      Name  Age Occupation
0     John   32   Engineer
1    Emily   28    Teacher
2  Michael   42     Doctor
3    Sarah   35     Lawyer
4    David   39  Architect
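read_csv() also accepts options that control which columns are loaded and how they are typed. The sketch below uses the usecols and dtype parameters on an in-memory sample; the sample string and the variable name older are assumptions for illustration, and the example needs Pandas to be installed.

```python
import io

import pandas as pd

# In-memory stand-in for data.tsv (first rows of the sample data).
sample = "Name\tAge\tOccupation\nJohn\t32\tEngineer\nEmily\t28\tTeacher\n"

# Load only two columns and make the Age column an explicit integer type.
data = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    usecols=["Name", "Age"],
    dtype={"Age": "int64"},
)

# Because Age is numeric, it can be filtered directly.
older = data[data["Age"] > 30]
print(older)
```

Restricting the columns this way can reduce memory use on wide files, since the unused columns are never stored in the DataFrame.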
3. Using Built-in Functions
Python's built-in open() function, together with the string methods rstrip() and split(), can also be used to read a TSV file.
Open the file using open() and call rstrip('\n') on each line to remove the trailing newline character. A bare strip() would also remove leading and trailing tabs, which silently drops empty fields at either end of a row.
Then split each line using split() with '\t' as the delimiter.
The following code shows how to do it.
# Path to the TSV file
tsv_file = 'data.tsv'

# Open the TSV file using the 'open' function
with open(tsv_file, 'r') as file:
    # Iterate over each line
    for line in file:
        # Remove only the trailing newline character
        line = line.rstrip('\n')
        # Split the line into a list of column values
        fields = line.split('\t')
        # Use the data
        print(fields)
Output:
['Name', 'Age', 'Occupation']
['John', '32', 'Engineer']
['Emily', '28', 'Teacher']
['Michael', '42', 'Doctor']
['Sarah', '35', 'Lawyer']
['David', '39', 'Architect']
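The same manual approach can be wrapped in a small helper that pairs each value with its column name. This is only a sketch: parse_tsv_lines is a hypothetical function name, and plain split('\t') does not handle quoted fields that contain tabs or newlines, which the csv module does.

```python
def parse_tsv_lines(lines):
    """Turn an iterable of tab-separated lines into a list of dicts."""
    iterator = iter(lines)
    # The first line is assumed to be the header row.
    header = next(iterator).rstrip("\n").split("\t")
    records = []
    for line in iterator:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        records.append(dict(zip(header, line.split("\t"))))
    return records


# A list of strings stands in for an open file object here.
sample_lines = [
    "Name\tAge\tOccupation\n",
    "John\t32\tEngineer\n",
    "Emily\t28\tTeacher\n",
]
people = parse_tsv_lines(sample_lines)

# Values are strings, so numeric columns need an explicit conversion.
ages = [int(person["Age"]) for person in people]
print(ages)
```

Because the helper accepts any iterable of lines, it can be called with an open file object as well, for example parse_tsv_lines(file) inside a with open(...) block.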
Speed Comparison
Reading TSV files in Python can be accomplished using various methods. We compared the speed of each method and plotted the results on a graph.
As the graph shows, the fastest method to read TSV files in Python is Method 1, i.e., using the csv module.
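A comparison like this can be reproduced with the standard timeit module. The sketch below is not the benchmark behind the graph: the synthetic data, the row count, and the function names are all assumptions, and it times only the two standard-library approaches on in-memory text. A Pandas variant can be added to the loop in the same way.

```python
import csv
import io
import timeit

# Synthetic data; the row count and contents are arbitrary assumptions.
text = "Name\tAge\tOccupation\n" + "John\t32\tEngineer\n" * 10_000


def read_with_csv():
    # Method 1: csv.reader with a tab delimiter.
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


def read_with_split():
    # Method 3: manual rstrip() and split() on each line.
    return [line.rstrip("\n").split("\t") for line in io.StringIO(text)]


for func in (read_with_csv, read_with_split):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Timings depend on the machine, the Python version, and the shape of the data, so it is worth measuring with a file that resembles the real workload.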
Happy coding! 😊