6 pathlib: Manipulate File Paths
6.1 Why pathlib
Traditional string based file path manipulations can be error-prone and cumbersome.
import os
import pandas as pd
# Works on Windows, but breaks on Mac/Linux
path = "data\\raw\\report.csv"
df = pd.read_csv(path)For example, the above code uses backslashes (\) which are specific to Windows file paths. This code will break on Mac or Linux systems that use forward slashes (/) instead. You could use os.path.join() to make it platform-independent, but it can get messy with complex paths. You could also use forward slashes, which work on both Windows and Mac/Linux, but it can be less readable for those accustomed to their platform’s conventions. The bottom line is that string based paths can lead to bugs and maintenance challenges.
The pathlib module in Python provides an object-oriented approach to handle filesystem paths, making it easier to read, write, and manipulate paths in a platform-independent way.
6.2 Using pathlib
To get started with pathlib, you need to import the Path class from the pathlib module. The above code can be rewritten using pathlib as follows:
from pathlib import Path
import pandas as pd
path = Path("data") / "raw" / "report.csv"
df = pd.read_csv(path)In this example, we create a Path object representing the file path. The / operator is overloaded in the Path class to join path components in a platform-independent way. This makes the code more readable and less error-prone.
Here is another example. Suppose we have a notes folder with many .txt files and want to
- Change their extension to
.md - Prefix the filename with “archived_”
from pathlib import Path
folder = Path("notes")
for path in folder.glob("*.txt"):
new_path = path.with_name(f"archived_{path.stem}").with_suffix(".md")
path.replace(new_path)
print(f"Renamed {path} to {new_path}")The code above does the following:
folder.glob()finds all.txtfiles in thenotesdirectorypath.stemgives filename without the extension (reliable)with_name()safely constructs a new filenamewith_suffix()changes the file extensionpath.replace()moves/renames cleanly
The code is easy to read and works on any OS without changes.
6.3 Learning Resources
pathlib has many practical use cases. To learn more about its features and capabilities, the following resources may be helpful:
- Python’s pathlib Module: Taming the File System: A comprehensive tutorial on using
pathlibfor filesystem path manipulations. - Pathlib Documentation: The official documentation for the
pathlibmodule, providing detailed information on its classes and methods.