mooonpy.tools.file_utils module

class mooonpy.tools.file_utils.Path(string: str | Path)[source]

Bases: str

As computational scientists, half of our job is file management and manipulation. The Path class provides aliases for several os.path and glob.glob functions to make processing data easier. All mooonpy functions use this class internally for file and folder inputs; relevant strings are converted to Path when they enter a function.

Examples

A copy of the code used in these examples is available in root\mooonpy\examples\tools\path_utils\example_Path.py

Basic Path Operations
>>> project_path = Path('Project/Data/Analysis')
>>> filename = Path('results.txt')
>>> full_path = project_path / filename
>>> print(full_path)
Project\Data\Analysis\results.txt
>>> print(abs(full_path))
root\mooonpy\examples\tools\path_utils\Project\Data\Analysis\results.txt
Path Parsing
>>> sample_path = Path('experiments/run_001/data.csv.gz')
>>> print(sample_path.dir())
experiments\run_001
>>> print(sample_path.basename())
data.csv.gz
>>> print(sample_path.root())
data.csv
>>> print(sample_path.ext())
.gz
Extension Manipulation
>>> data_file = Path('analysis/results.txt')
>>> print(data_file.new_ext('.json'))
analysis\results.json
>>> print(data_file.new_ext('.txt.gz'))
analysis\results.txt.gz
File Existence
>>> current_file = Path(__file__)
>>> fake_file = Path('nonexistent.txt')
>>> print(bool(current_file))
True
>>> print(bool(fake_file))
False
Wildcard Matching
>>> txt_pattern = Path('temp_dir/*.txt')
>>> print(txt_pattern.matches())
['test1.txt', 'test2.txt']
>>> for file in Path('temp_dir/*'):
...     print(file.basename())
data.csv
readme.md
test1.txt
test2.txt
Recent File Finding
>>> pattern = Path('temp_dir/*.txt')
>>> print(pattern.recent())
newest_file.txt
>>> print(pattern.recent(oldest=True))
old_file.txt
Smart File Opening
>>> mypath = Path('data.txt')
>>> with mypath.open('w') as f:
...     f.write('Hello World')
# Creates regular file
>>> compressed_path = Path('data.txt.gz')
>>> # compressed_path.open() would use gzip automatically
# Would automatically handle gzip compression
Absolute Path Conversion
>>> rel_path = Path('data/file.txt')
>>> print(abs(rel_path))
root\mooonpy\examples\tools\path_utils\data\file.txt
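The operators used in the examples above (/, bool(), abs() and iteration) can be pictured as thin wrappers around os.path and glob. The following is a minimal sketch under that assumption, not the mooonpy source: SketchPath is a hypothetical name, and the real Path class may differ in details such as separator normalisation.

```python
import glob
import os


class SketchPath(str):
    """Hypothetical stand-in for Path; not the mooonpy implementation."""

    def __truediv__(self, other):
        # path / other joins the two pieces, like os.path.join.
        # Plain str copies are passed on because __bool__ is overridden below.
        return SketchPath(os.path.join(str(self), str(other)))

    def __bool__(self):
        # Truthiness reports whether the path exists on disk.
        return os.path.exists(str(self))

    def __abs__(self):
        # abs(path) resolves against the current working directory.
        return SketchPath(os.path.abspath(str(self)))

    def __iter__(self):
        # Iterating expands the wildcard pattern into matching paths.
        return iter([SketchPath(p) for p in sorted(glob.glob(str(self)))])
```

Converting to a plain str before calling into os.path is a deliberate choice here: once __bool__ means "exists on disk", library code that tests a path for emptiness would otherwise get a misleading answer.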

Todo

__truediv__ __bool__ __abs__ and __iter__ docstrings in config?

basename() Path[source]

Return the filename, including its extension, without the directory.

Alias for os.path.basename

Returns:

Path of file

Return type:

Path

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.basename())
DETDA.mol
dir() Path[source]

Split Path to directory.

Alias for os.path.dirname.

Returns:

Path to directory

Return type:

Path

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.dir())
Project\Monomers
ext() Path[source]

Return only the extension of the Path.

Alias for os.path.basename and splitext.

Returns:

Extension as Path

Return type:

Path

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.ext())
.mol
classmethod find_prefix(common=None, path=None, add=True) Path | None[source]

Extract the prefix of path up to and including common, and optionally append it to search_prefixes.

When path is omitted the caller’s __file__ is used automatically, so a script located inside the common directory tree only needs to supply the common directory name. When common is omitted the already-stored search_common is reused, allowing multiple calls for different drives without repeating the directory name.

Parameters:
  • common (str or None) – Shared directory name, e.g. 'research'. Omit to reuse search_common (must have been set earlier).

  • path (str, Path, or None) – Path containing common as a component. Omit to use the calling script’s __file__ automatically.

  • add (bool) – Append the extracted prefix to search_prefixes (default True). Duplicates are skipped.

Returns:

Extracted prefix Path, or None if common was not found in path.

Return type:

Path or None

Example:
>>> # In a script at C:/research/sims/run_001/analysis.py:
>>> Path.find_prefix('research')           # auto-detects C:\research
Path('C:\\research')
>>> Path.find_prefix(path='D:/backups/research/sims/run_001/analysis.py')  # reuses search_common='research'
Path('D:\\backups\\research')
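The prefix extraction described for find_prefix can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the mooonpy code: extract_prefix is a hypothetical helper, it matches common against whole path components, and it stops at the first occurrence. PureWindowsPath is used so the output is the same on any platform.

```python
from pathlib import PureWindowsPath


def extract_prefix(path, common):
    """Return the part of path up to and including the component common, or None."""
    parts = PureWindowsPath(path).parts
    if common not in parts:
        return None
    end = parts.index(common) + 1  # keep the common component itself
    return str(PureWindowsPath(*parts[:end]))


print(extract_prefix('C:/research/sims/run_001/analysis.py', 'research'))
# C:\research
print(extract_prefix('D:/backups/research/sims/run_001/analysis.py', 'research'))
# D:\backups\research
print(extract_prefix('C:/other/analysis.py', 'research'))
# None
```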
format(*args, **kwargs) str[source]

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).

locate(recent=None) Path | None[source]

Locate this path across search_prefixes.

Behaviour depends on recent and whether the path contains a * wildcard:

  • recent=None (default): prefix-order priority

    • No wildcard: return the first prefix for which the exact file exists.

    • Wildcard: return the glob pattern (* kept in place) for the first prefix that has any matches. Pass that result to matches() or iterate over it to expand the files.

  • recent=True — return the most recently modified file across all prefixes (wildcards expanded via locate_all()).

  • recent=False — return the oldest file across all prefixes.

Naming note: locate avoids shadowing the built-in str.find() method.

Parameters:

recent – None for prefix-order (default), True for newest, False for oldest.

Returns:

Matched Path, or None.

Return type:

Path or None

Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common   = 'research'
>>> Path('sims/run_001/output.log').locate()
Path('C:\\research\\sims\\run_001\\output.log')
>>> Path('sims/*/output.log').locate()          # returns the pattern
Path('C:\\research\\sims\\*\\output.log')
>>> Path('sims/*/output.log').locate(recent=True)
Path('D:\\backups\\research\\sims\\run_002\\output.log')
>>> Path('sims/*/output.log').locate(recent=False)
Path('C:\\research\\sims\\run_001\\output.log')
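The three modes of locate can be restated as a short function over plain strings. This is a sketch of the documented rules only, assuming that "most recent" is decided by file modification time; locate_sketch and its prefixes argument are hypothetical and stand in for the class-level search_prefixes.

```python
import glob
import os


def locate_sketch(relative, prefixes, recent=None):
    """Hypothetical restatement of the documented locate() rules."""
    if recent is None:
        # Prefix-order priority: the first prefix with any match wins.
        for prefix in prefixes:
            candidate = os.path.join(prefix, relative)
            if glob.glob(candidate):
                return candidate  # a wildcard, if present, stays unexpanded
        return None
    # Otherwise expand the pattern on every prefix and compare timestamps.
    found = [match
             for prefix in prefixes
             for match in glob.glob(os.path.join(prefix, relative))]
    if not found:
        return None
    pick = max if recent else min  # newest for True, oldest for False
    return pick(found, key=os.path.getmtime)
```

Returning the unexpanded pattern in the default mode mirrors the documented behaviour: the caller decides later whether to expand it.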
locate_all(whitelist_ext=None, blacklist_ext=None) List[Path][source]

Return all glob matches of this path across every prefix in search_prefixes.

Wildcards are expanded on each prefix independently, so Path('sims/*/output.log').locate_all() collects every matching file across all configured drives.

Parameters:
  • whitelist_ext – If given, only include paths with these extensions.

  • blacklist_ext – If given, exclude paths with these extensions.

Returns:

All matching Path objects across all prefixes.

Return type:

List[Path]

Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common   = 'research'
>>> Path('sims/*/output.log').locate_all()
[Path('C:\\research\\sims\\run_001\\output.log'),
 Path('D:\\backups\\research\\sims\\run_001\\output.log'),
 Path('D:\\backups\\research\\sims\\run_002\\output.log')]
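The whitelist_ext and blacklist_ext parameters are not illustrated above. One plausible reading, sketched here as an assumption, is that each is a collection of extensions with the leading dot (for example ['.log']) compared against the last suffix of every match. locate_all_sketch is a hypothetical function, not part of mooonpy.

```python
import glob
import os


def locate_all_sketch(relative, prefixes, whitelist_ext=None, blacklist_ext=None):
    """Collect glob matches on every prefix, with optional extension filters."""
    results = []
    for prefix in prefixes:  # each prefix is globbed independently
        for match in sorted(glob.glob(os.path.join(prefix, relative))):
            ext = os.path.splitext(match)[1]
            if whitelist_ext is not None and ext not in whitelist_ext:
                continue  # not on the allow list
            if blacklist_ext is not None and ext in blacklist_ext:
                continue  # explicitly excluded
            results.append(match)
    return results
```

Under this assumption, a call such as locate_all_sketch('sims/*/output.*', prefixes, whitelist_ext=['.log']) would keep only the .log files from every prefix.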
matches(whitelist_ext=None, blacklist_ext=None) List[Path][source]

Finds matching paths with a * (asterisk) wildcard character.

Returns:

List of matching Paths

Return type:

List[Path]

Example:
>>> from mooonpy import Path
>>> MyWildcard = Path('*.mol')
>>> print(MyWildcard.matches())
[Path('DETDA.mol'), Path('DEGBF.mol')]
new_ext(ext: str | Path) Path[source]

Replace extension on a Path with a new extension.

Parameters:

ext (str or Path) – new extension, including the leading delimiter (e.g. '.json').

Returns:

replaced Path

Return type:

Path

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.new_ext('.data'))
Project\Monomers\DETDA.data
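Extension replacement can be expressed with os.path.splitext. The sketch below assumes that only the last suffix is replaced, which is consistent with ext() returning '.gz' for 'data.csv.gz'; replace_ext is a hypothetical helper, and posixpath is used so the output does not depend on the platform.

```python
import posixpath  # os.path flavour with '/' separators, for reproducible output


def replace_ext(path, ext):
    """Swap the last suffix of path for ext (ext includes the leading dot)."""
    stem, _old_ext = posixpath.splitext(path)
    return stem + ext


print(replace_ext('analysis/results.txt', '.json'))    # analysis/results.json
print(replace_ext('analysis/results.txt', '.txt.gz'))  # analysis/results.txt.gz
print(replace_ext('data.csv.gz', '.zip'))              # data.csv.zip
```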
open(mode='r', encoding='utf-8')[source]

Open path with smart_open

Parameters:
  • mode (str) – Open mode, usually ‘r’ or ‘a’

  • encoding (str) – File encoding

Returns:

opened file as object

Return type:

File Object

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> MyFileObj = MyPath.open(mode='r')
recent(oldest: bool = False) Path | None[source]

Find wildcard matches and return the Path of the most recently modified file.

Parameters:

oldest (bool) – Reverses direction and finds least recently modified file.

Returns:

Path of most recently modified file

Return type:

Path or None

Example:
>>> from mooonpy import Path
>>> MyWildcard = Path('Template_*.lmpmol')
>>> print(MyWildcard.recent())
Template_1_v10_final_realthistime.lmpmol
>>> print(MyWildcard.recent(oldest=True))
Template_1.lmpmol
root() Path[source]

Return the filename without its extension.

Alias for os.path.basename and splitext.

Returns:

Path of filename

Return type:

Path

Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.root())
DETDA
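Since dir(), basename(), ext() and root() are documented as aliases for os.path functions, their results can be reproduced with the standard library. The snippet below uses posixpath (the '/' flavour of os.path) so that the output is the same on every platform; the variable names are illustrative only.

```python
import posixpath  # os.path flavour with '/' separators, for reproducible output

sample = 'experiments/run_001/data.csv.gz'

directory = posixpath.dirname(sample)      # counterpart of dir()
filename = posixpath.basename(sample)      # counterpart of basename()
root, ext = posixpath.splitext(filename)   # counterparts of root() and ext()

print(directory)  # experiments/run_001
print(filename)   # data.csv.gz
print(root)       # data.csv
print(ext)        # .gz
```

Only the last suffix counts as the extension, which is why a doubly suffixed name such as data.csv.gz keeps .csv in its root.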
search_common = None
search_prefixes = None
swap_prefix(target) Path | None[source]

Return a new path with this path’s prefix replaced by target.

Strips everything up to and including search_common from self (via _relpath_from_common()), then prepends the chosen prefix.

Parameters:

target (int, str, or Path) – Replacement prefix — either an int index into search_prefixes, or a str/Path value.

Returns:

Rewritten Path, or None if target is an out-of-range integer index.

Return type:

Path or None

Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common   = 'research'
>>> p = Path('D:/backups/research/sims/run_001/output.log')
>>> p.swap_prefix(0)
Path('C:\\research\\sims\\run_001\\output.log')
>>> p.swap_prefix('D:/backups/research')
Path('D:\\backups\\research\\sims\\run_001\\output.log')
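The two steps described for swap_prefix (strip everything up to and including the common directory, then prepend the target) can be sketched as follows. swap_prefix_sketch is hypothetical; passing prefixes and common as arguments, treating negative indices as out of range, and returning None when common is absent are all assumptions of this sketch.

```python
from pathlib import PureWindowsPath


def swap_prefix_sketch(path, target, prefixes, common):
    """Replace everything up to and including common with the chosen prefix."""
    parts = PureWindowsPath(path).parts
    if common not in parts:
        return None  # assumption: nothing to strip, so nothing to swap
    tail = parts[parts.index(common) + 1:]  # components after the common directory
    if isinstance(target, int):
        if not 0 <= target < len(prefixes):
            return None  # out-of-range index
        target = prefixes[target]
    return str(PureWindowsPath(target, *tail))


prefixes = ['C:/research', 'D:/backups/research']
p = 'D:/backups/research/sims/run_001/output.log'
print(swap_prefix_sketch(p, 0, prefixes, 'research'))
# C:\research\sims\run_001\output.log
print(swap_prefix_sketch(p, 5, prefixes, 'research'))
# None
```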
mooonpy.tools.file_utils.smart_open(filename, mode='r', encoding='utf-8')[source]

Open file with appropriate decompression based on extension

Supported extensions (detected as a substring of the filename):
  • .gz: Uses gzip module

  • .bz2: Uses bz2 module

  • .xz: Uses lzma module

  • .lzma: Uses lzma module

  • Other extensions use the builtin open function

Parameters:
  • filename (Path or str) – Path to file

  • mode (str) – Open mode, usually ‘r’, ‘w’ or ‘a’

  • encoding (str) – File encoding

Returns:

opened file as object

Return type:

File Object

Example:
>>> from mooonpy.tools.file_utils import smart_open
>>> MyFileObj = smart_open('Project/Monomers/DETDA.data.gz')
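Extension-based dispatch of this kind can be sketched with the standard gzip, bz2 and lzma modules. The code below is an illustration, not the mooonpy source: smart_open_sketch is a hypothetical name, and it checks the end of the filename with endswith, whereas the description above suggests a substring test. The compressed openers default to binary mode, so the sketch requests text mode explicitly.

```python
import bz2
import gzip
import lzma

# Map each supported suffix to the stdlib opener that handles it.
_OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open, '.lzma': lzma.open}


def smart_open_sketch(filename, mode='r', encoding='utf-8'):
    """Open filename, transparently (de)compressing based on its suffix."""
    name = str(filename)
    opener = open  # fall back to the builtin for any other extension
    for ext, candidate in _OPENERS.items():
        if name.endswith(ext):
            opener = candidate
            if 'b' not in mode and 't' not in mode:
                mode += 't'  # gzip/bz2/lzma would otherwise return bytes
            break
    if 'b' in mode:
        return opener(name, mode)  # binary mode takes no encoding
    return opener(name, mode, encoding=encoding)
```

With this sketch, writing to 'data.txt.gz' and reading it back both go through gzip, while 'data.txt' is handled by the builtin open.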