Python Loop Through Files in Directory
When working with file operations in Python, one of the most common tasks is accessing the files in a directory. Python provides several ways to do this.
In this article, we will explore multiple approaches to loop through files in a directory using Python.
Table of Contents
- Using os module
- Using glob module
- Using os.walk() method
- Using pathlib module
- Speed Comparison
For this article, we will be using a directory named test which contains 5 files.
test
├── file1.txt
├── file2.py
├── file3.js
├── file4.java
└── file5.cpp
1. Using os module
Python's os module provides a simple but effective way to access files in a directory. The os.listdir() method returns a list of the names of all the files and directories in the specified path.
From these entries, we can keep only the files by checking each one with the os.path.isfile() method, and then loop through them.
Here is how you can loop through files in a directory using the os module.
import os

# path to the directory (relative to the current working directory)
path = "test"

# iterate over every entry in the directory
for filename in os.listdir(path):
    # keep only regular files, skipping subdirectories
    if os.path.isfile(os.path.join(path, filename)):
        print(filename)
Output:
file1.txt
file2.py
file3.js
file4.java
file5.cpp
Here, we listed every entry in the directory using the os.listdir() method and kept only the files by passing each entry's full path to the os.path.isfile() method. The names returned by os.listdir() do not include the directory, which is why they are joined with path first.
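The same steps can be wrapped in a small reusable function. This is only a minimal sketch: the name list_files is made up for this illustration, and the result is sorted because os.listdir() returns entries in arbitrary order.

```python
import os


def list_files(path):
    """Return the sorted names of the regular files directly inside path."""
    names = []
    for filename in os.listdir(path):
        # os.listdir() returns bare names, so join each one with the
        # directory before asking whether the entry is a file
        if os.path.isfile(os.path.join(path, filename)):
            names.append(filename)
    # os.listdir() makes no promise about order, so sort for stable output
    return sorted(names)
```

Calling list_files("test") on the example directory would return the five file names as a list instead of printing them.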
2. Using glob module
The glob() function in Python's glob module returns a list of the paths that match a given pattern.
To select everything in a directory, you can append the * wildcard to the directory path. Note that * matches subdirectories as well as files, and it does not match names that start with a dot.
Here is an example.
import glob

# path to the directory (relative to the current working directory)
path = "test"

# iterate over every entry that matches the pattern
for filename in glob.glob(path + "/*"):
    print(filename)
Output:
test/file1.txt
test/file2.py
test/file3.js
test/file4.java
test/file5.cpp
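A pattern can also narrow the match, for example to a single file extension. The helper below is a minimal sketch of that idea. The name find_by_extension and its parameters are made up for this illustration and are not part of the glob module.

```python
import glob
import os


def find_by_extension(path, extension):
    """Return sorted paths of files in path whose names end with extension."""
    # Build a pattern such as "test/*.py"; os.path.join picks the separator
    pattern = os.path.join(path, "*" + extension)
    # A pattern can match directories too, so keep regular files only
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

With the example directory, find_by_extension("test", ".py") would be expected to return a list containing only the path to file2.py.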
3. Using os.walk() method
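The os.walk() method from the os module returns a generator that walks the directory tree rooted at the given path. For each directory it visits, including every subdirectory, it yields a tuple containing the directory's path, a list of its subdirectory names, and a list of its file names.
The following example shows the use of the os.walk() method.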
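import os

# path to the directory (relative to the current working directory)
path = "test"

# visit the directory and each of its subdirectories
for root, dirs, files in os.walk(path):
    # files holds the names of the files in the directory being visited
    for filename in files:
        print(filename)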
Output:
file1.txt
file2.py
file3.js
file4.java
file5.cpp
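Because os.walk() also descends into subdirectories, a bare file name is often not enough to locate a file. As a minimal sketch, the function below joins root with each name to build full paths. The name walk_files is made up for this illustration.

```python
import os


def walk_files(path):
    """Return sorted full paths of every file under path, at any depth."""
    found = []
    for root, dirs, files in os.walk(path):
        for filename in files:
            # root is the directory currently being visited, so joining it
            # with the name gives a path the caller can open directly
            found.append(os.path.join(root, filename))
    return sorted(found)
```

On the flat example directory this returns the same five files as the loop above, each prefixed with the directory path.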
4. Using pathlib module
The pathlib module, introduced in Python 3.4, provides an object-oriented approach to file system operations. The Path class can be used to iterate through files in a directory.
Let's take a look:
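from pathlib import Path

# path to the directory (relative to the current working directory)
directory = Path('test')

# iterate over every entry in the directory
for file_path in directory.iterdir():
    # keep only regular files, skipping subdirectories
    if file_path.is_file():
        print(file_path)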
Output:
test/file1.txt
test/file2.py
test/file3.js
test/file4.java
test/file5.cpp
In this method, we create a Path object representing the directory path. We then use the iterdir() method to iterate over all items (files and directories) in the directory. By checking is_file(), we can filter out directories and focus on files for further processing.
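Path objects also expose parts of the file name as attributes, which makes further filtering straightforward. The sketch below keeps only files with a given suffix. The name files_with_suffix is made up for this illustration.

```python
from pathlib import Path


def files_with_suffix(directory, suffix):
    """Return sorted Path objects for files in directory with the given suffix."""
    return sorted(
        p for p in Path(directory).iterdir()
        # p.suffix is the final extension including the dot, e.g. ".py"
        if p.is_file() and p.suffix == suffix
    )
```

With the example directory, files_with_suffix("test", ".py") would be expected to return a list containing only the Path for file2.py.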
Speed Comparison
After running each of the above methods 1000 times on a directory containing 5 files, we plotted the results in the following graph.
From the graph, we can clearly see that Method 3, i.e. the os.walk() method, was the fastest way to loop through files in a directory in this test.
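A comparison like this can be reproduced with the timeit module. The sketch below is one possible setup and not necessarily how the measurements above were taken. The function names and the benchmark helper are made up for this illustration, and the timings will vary with the machine, the file system, and the Python version.

```python
import glob
import os
import timeit
from pathlib import Path


def with_listdir(path):
    return [n for n in os.listdir(path) if os.path.isfile(os.path.join(path, n))]


def with_glob(path):
    return glob.glob(os.path.join(path, "*"))


def with_walk(path):
    # on a directory with subdirectories this also collects nested files
    return [n for _root, _dirs, files in os.walk(path) for n in files]


def with_pathlib(path):
    return [p for p in Path(path).iterdir() if p.is_file()]


def benchmark(path, runs=1000):
    """Return a dict mapping each method name to its total time in seconds."""
    methods = [with_listdir, with_glob, with_walk, with_pathlib]
    return {
        # f=f binds the current function so each lambda times its own method
        f.__name__: timeit.timeit(lambda f=f: f(path), number=runs)
        for f in methods
    }
```

Calling benchmark("test") returns the total time for each method, which can then be printed or plotted.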