Python Program to Find All Files with .txt Extension Present Inside a Directory

Finding every file with a .txt extension in a directory is a common task in file and directory manipulation, data analysis, and information retrieval. Below, we explore several ways to do this in Python, including searching nested directories, handling errors and exceptions, and keeping the process efficient on large directories.

Basic Method: Using os and fnmatch

Python’s os and fnmatch modules can be used to list the entries of a directory and match their names against a pattern.

import os
import fnmatch

directory = 'sample_directory'

try:
    txt_files = [f for f in os.listdir(directory) if fnmatch.fnmatch(f, '*.txt')]
except FileNotFoundError:
    print(f"{directory} not found!")
else:
    print(f".txt files in {directory}: {txt_files}")

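In this method, we use os.listdir to list every entry in the specified directory (files and subdirectories alike) and fnmatch.fnmatch to keep only the names that match *.txt. Note that fnmatch.fnmatch follows the operating system's case rules, so on Windows NOTES.TXT also matches, while on Linux it does not.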
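
Because os.listdir returns directory names too, a subdirectory that happens to be called archive.txt would slip through the filter above. The following is a minimal sketch of a stricter version, assuming you want regular files only and a case-insensitive match on every platform. The function name list_txt_files is made up for this illustration.

```python
import os


def list_txt_files(directory):
    """Return the names of regular files in `directory` ending in .txt (any case)."""
    matches = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip subdirectories that merely happen to have a name ending in .txt
        if os.path.isfile(full_path) and name.lower().endswith('.txt'):
            matches.append(name)
    return sorted(matches)
```

The extra os.path.isfile call costs one system call per entry, which is usually acceptable for a single directory.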

Dealing with Nested Directories: Using os.walk

If there are nested directories and we want to find .txt files in all of them, os.walk is a handy method.

import os

directory = 'sample_directory'

if not os.path.isdir(directory):
    print(f"{directory} not found!")
else:
    # os.walk ignores directories it cannot read instead of raising,
    # so the starting directory is checked explicitly.
    txt_files = [os.path.join(root, f) for root, dirs, files in os.walk(directory) for f in files if f.endswith('.txt')]
    print(f".txt files in {directory} and its subdirectories: {txt_files}")

This approach recursively traverses through all the subdirectories and fetches .txt files from each one.
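
By default, os.walk skips any directory it cannot list, including a missing starting directory, without raising an exception. If you would rather know about such problems, os.walk accepts an onerror callback that receives the OSError instance. Here is a small sketch of that idea, with walk_txt_files as a hypothetical helper name.

```python
import os


def walk_txt_files(directory):
    """Return (txt_paths, errors) for `directory` and all of its subdirectories."""
    errors = []
    txt_paths = []
    # Each OSError raised while listing a directory is appended to `errors`
    # instead of being silently ignored.
    for root, dirs, files in os.walk(directory, onerror=errors.append):
        for name in files:
            if name.endswith('.txt'):
                txt_paths.append(os.path.join(root, name))
    return txt_paths, errors
```

The caller can then decide whether a non-empty errors list should be logged, reported to the user, or treated as fatal.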

Using glob Module

Python’s glob module can also be used to find all .txt files within a directory, which makes the task simpler.

import glob

directory = 'sample_directory'

# glob.glob returns an empty list (it does not raise) if the directory is missing
txt_files = glob.glob(f"{directory}/**/*.txt", recursive=True)
print(f".txt files in {directory}: {txt_files}")

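With recursive=True, the ** pattern matches zero or more directory levels, so glob.glob returns .txt files from the top-level directory and from every nested directory. If the directory does not exist, the result is simply an empty list. By default, names that begin with a dot are not matched.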
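
One detail worth guarding against is a directory name that itself contains glob characters such as [ or *. The sketch below, under the assumption that the directory name should always be taken literally, uses glob.escape for that purpose. The helper name glob_txt_files is illustrative only.

```python
import glob
import os


def glob_txt_files(directory):
    """Recursively collect .txt paths under `directory` using glob."""
    # glob.escape protects directory names that contain *, ? or [ characters
    pattern = os.path.join(glob.escape(directory), '**', '*.txt')
    return sorted(glob.glob(pattern, recursive=True))
```

For very large trees, glob.iglob takes the same arguments and returns an iterator instead of a list.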

Handling Errors and Exceptions

When working with directories, it’s crucial to handle the exceptions that can arise, such as a missing directory or insufficient permissions.

import os

directory = 'non_existent_directory'

try:
    txt_files = [f for f in os.listdir(directory) if f.endswith('.txt')]
except FileNotFoundError:
    print(f"{directory} not found!")
except PermissionError:
    print(f"Permission denied to read {directory}!")
except Exception as e:
    print(f"An unexpected error occurred: {str(e)}")
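
Another case that comes up in practice is a path that exists but points to a file rather than a directory, which makes os.listdir raise NotADirectoryError. The following sketch, assuming you prefer to return a message instead of printing it, handles that case as well. The name safe_list_txt_files is hypothetical.

```python
import os


def safe_list_txt_files(directory):
    """Return (txt_files, message); message is None when the listing succeeded."""
    try:
        names = os.listdir(directory)
    except FileNotFoundError:
        return [], f"{directory} not found!"
    except NotADirectoryError:
        return [], f"{directory} is not a directory!"
    except PermissionError:
        return [], f"Permission denied to read {directory}!"
    except OSError as e:
        # Any other operating-system level failure
        return [], f"An unexpected error occurred: {e}"
    return [n for n in names if n.endswith('.txt')], None
```

The specific exceptions are all subclasses of OSError, so they must be listed before the general OSError handler.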

Creating a Function for Reusability

For more modular and cleaner code, you can wrap the search in a function that can be reused with different directories.

import os

def find_txt_files(directory: str):
    try:
        return [f for f in os.listdir(directory) if f.endswith('.txt')]
    except FileNotFoundError:
        print(f"{directory} not found!")
        return []
    except Exception as e:
        print(f"An unexpected error occurred: {str(e)}")
        return []

directory = 'sample_directory'
txt_files = find_txt_files(directory)
print(f".txt files in {directory}: {txt_files}")
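
The same idea can be taken a step further by turning the extension and the recursion into parameters. This is only a sketch under those assumptions. The function name find_files and its extension and recursive parameters are not part of any library.

```python
import os


def find_files(directory, extension='.txt', recursive=False):
    """Return sorted paths of files under `directory` whose names end with `extension`."""
    if not os.path.isdir(directory):
        return []
    if recursive:
        # Walk the whole tree below `directory`
        return sorted(
            os.path.join(root, name)
            for root, _dirs, files in os.walk(directory)
            for name in files
            if name.endswith(extension)
        )
    # Only look at the top-level entries, ignoring subdirectories
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(extension) and os.path.isfile(os.path.join(directory, name))
    )
```

A call such as find_files('sample_directory', '.csv', recursive=True) would then reuse the same logic for a different extension.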

Optimizing Performance for Large Directories

When dealing with large directories with numerous files and subdirectories, it’s crucial to optimize the code to avoid performance bottlenecks.

Using Generator Expressions:

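A generator expression yields the matching names one at a time instead of building a second list, which can be more memory-efficient than a list comprehension. Keep in mind that os.listdir still builds the complete list of names up front, so the saving applies only to the filtered result.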

import os

directory = 'large_directory'

try:
    txt_files = (f for f in os.listdir(directory) if f.endswith('.txt'))
except FileNotFoundError:
    print(f"{directory} not found!")
else:
    for file in txt_files:
        print(file)
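
If the directory listing itself is the memory concern, os.scandir is worth considering, since it returns a lazy iterator of entries rather than a full list. The generator below is a rough sketch of that approach, and iter_txt_files is a made-up name.

```python
import os


def iter_txt_files(directory):
    """Yield the names of .txt files in `directory` one at a time."""
    # os.scandir produces DirEntry objects lazily; entry.is_file() can often
    # answer from information gathered during the scan, without an extra system call.
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith('.txt'):
                yield entry.name
```

Because this is a generator function, a missing directory raises FileNotFoundError when iteration starts, not when iter_txt_files is called, so the try block has to wrap the loop that consumes it.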

Avoiding Unnecessary Operations:

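Keeping the work done per file small can enhance performance: use cheap string checks such as str.endswith before any system call, compile regular expressions once outside the loop, and skip subdirectories that do not need to be searched.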
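
One concrete way to avoid unnecessary work is to stop os.walk from descending into directories you already know are irrelevant. The sketch below assumes a set of directory names to skip. The names in SKIP_DIRS and the function walk_txt_files_pruned are examples only.

```python
import os

# Hypothetical directory names that are never worth searching
SKIP_DIRS = {'.git', '__pycache__', 'node_modules'}


def walk_txt_files_pruned(directory, skip_dirs=SKIP_DIRS):
    """Collect .txt paths under `directory`, skipping directories named in `skip_dirs`."""
    txt_paths = []
    for root, dirs, files in os.walk(directory):
        # Editing `dirs` in place (with the default topdown=True) tells os.walk
        # not to descend into the removed subdirectories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        txt_paths.extend(os.path.join(root, f) for f in files if f.endswith('.txt'))
    return txt_paths
```

The slice assignment dirs[:] matters here, because rebinding dirs to a new list would leave the list that os.walk uses unchanged.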

Advanced Usage: Incorporating Regular Expressions

For more complex file matching patterns, Python’s re module can be integrated.

import os
import re

directory = 'sample_directory'
pattern = re.compile(r'^[a-z]+\.txt$', re.IGNORECASE)

try:
    txt_files = [f for f in os.listdir(directory) if pattern.match(f)]
except FileNotFoundError:
    print(f"{directory} not found!")
else:
    print(f"Matching .txt files in {directory}: {txt_files}")

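Here, re.compile compiles the regular expression once, and pattern.match tests each file name against it. This particular pattern accepts only names made up entirely of letters followed by .txt, ignoring case, so notes.txt matches but notes_2.txt does not.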
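
As a further illustration, suppose the files follow a naming scheme such as report_2023.txt. That scheme is an assumption made purely for this example, as are the names REPORT_PATTERN and filter_reports. A sketch using fullmatch could look like this:

```python
import re

# Hypothetical naming scheme: "report_" + digits + ".txt", e.g. report_2023.txt
REPORT_PATTERN = re.compile(r'report_\d+\.txt', re.IGNORECASE)


def filter_reports(names):
    """Keep only the names that match REPORT_PATTERN in full."""
    # fullmatch requires the whole string to match, so no ^ or $ anchors are needed
    return [name for name in names if REPORT_PATTERN.fullmatch(name)]
```

The function takes a plain list of names, so it can be combined with os.listdir or any of the other listing approaches shown earlier.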

Conclusion

Finding all .txt files in a directory using Python can be accomplished with standard-library modules such as os, fnmatch, glob, and re, each suited to different needs and levels of complexity. Whether the directory structure is flat or nested, Python provides the tools to traverse it and match files efficiently.

Error handling is essential for coping with unexpected situations and keeping the program robust. Reusable functions and attention to performance also matter, especially when dealing with large and complex directory structures.
