mooonpy.tools.file_utils module
- class mooonpy.tools.file_utils.Path(string: str | Path)[source]
Bases: str
As computational scientists, half our job is file management and manipulation. The Path class provides several aliases for os.path and glob.glob to make processing data easier. All mooonpy functions use this class internally for file and folder inputs; relevant strings are converted to Path when they enter a function.
Examples
A copy of the code used in these examples is available in root\mooonpy\examples\tools\path_utils\example_Path.py
- Basic Path Operations
>>> project_path = Path('Project/Data/Analysis')
>>> filename = Path('results.txt')
>>> full_path = project_path / filename
>>> print(full_path)
Project\Data\Analysis\results.txt
>>> print(abs(full_path))
root\mooonpy\examples\tools\path_utils\Project\Data\Analysis\results.txt
- Path Parsing
>>> sample_path = Path('experiments/run_001/data.csv.gz')
>>> print(sample_path.dir())
experiments\run_001
>>> print(sample_path.basename())
data.csv.gz
>>> print(sample_path.root())
data.csv
>>> print(sample_path.ext())
.gz
- Extension Manipulation
>>> data_file = Path('analysis/results.txt')
>>> print(data_file.new_ext('.json'))
analysis\results.json
>>> print(data_file.new_ext('.txt.gz'))
analysis\results.txt.gz
- File Existence
>>> current_file = Path(__file__)
>>> fake_file = Path('nonexistent.txt')
>>> print(bool(current_file))
True
>>> print(bool(fake_file))
False
- Wildcard Matching
>>> txt_pattern = Path('temp_dir/*.txt')
>>> print(txt_pattern.matches())
['test1.txt', 'test2.txt']
>>> for file in Path('temp_dir/*'):
...     print(file.basename())
data.csv
readme.md
test1.txt
test2.txt
- Recent File Finding
>>> pattern = Path('temp_dir/*.txt')
>>> print(pattern.recent())
newest_file.txt
>>> print(pattern.recent(oldest=True))
old_file.txt
- Smart File Opening
>>> mypath = Path('data.txt')
>>> with mypath.open('w') as f:
...     f.write('Hello World')  # Creates regular file
>>> compressed_path = Path('data.txt.gz')
>>> # compressed_path.open() would handle gzip compression automatically
- Absolute Path Conversion
>>> rel_path = Path('data/file.txt')
>>> print(abs(rel_path))
root\mooonpy\examples\tools\path_utils\data\file.txt
Todo
__truediv__, __bool__, __abs__, and __iter__ docstrings in config?
- basename() Path[source]
Split Path to filename including extension.
Alias for os.path.basename.
- Returns:
Path of file
- Return type:
Path
- Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.basename())
DETDA.mol
- dir() Path[source]
Split Path to directory.
Alias for os.path.dirname.
- Returns:
Path to directory
- Return type:
Path
- Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.dir())
Project\Monomers
- ext() Path[source]
Split Path to just the extension.
Alias for os.path.basename and splitext.
- Returns:
Extension as Path
- Return type:
Path
- Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.ext())
.mol
- classmethod find_prefix(common=None, path=None, add=True) Path | None[source]
Extract the prefix of path up to and including common, and optionally append it to search_prefixes.
When path is omitted, the caller’s __file__ is used automatically, so a script located inside the common directory tree only needs to supply the common directory name. When common is omitted, the already-stored search_common is reused, allowing multiple calls for different drives without repeating the directory name.
- Parameters:
common (str or None) – Shared directory name, e.g. 'research'. Omit to reuse search_common (must have been set earlier).
path (str, Path, or None) – Path containing common as a component. Omit to use the calling script’s __file__ automatically.
add (bool) – Append the extracted prefix to search_prefixes (default True). Duplicates are skipped.
- Returns:
Extracted prefix Path, or None if common was not found in path.
- Return type:
Path or None
- Example:
>>> # In a script at C:/research/sims/run_001/analysis.py:
>>> Path.find_prefix('research')  # auto-detects C:\research
Path('C:\\research')
>>> Path.find_prefix(path='D:/backups/research/sims/run_001/analysis.py')  # reuses search_common='research'
Path('D:\\backups\\research')
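The prefix extraction described above can be sketched with the standard library alone. The helper names extract_prefix, remember, and known_prefixes are hypothetical and exist only for this illustration; the sketch does not reproduce the automatic __file__ detection, and it may differ from how mooonpy implements find_prefix. PureWindowsPath is used so the Windows-style example paths parse the same way on any platform.

```python
from pathlib import PureWindowsPath


def extract_prefix(path, common):
    """Return the part of path up to and including the common component."""
    parts = PureWindowsPath(path).parts
    if common not in parts:
        return None  # mirrors the documented None result
    end = parts.index(common) + 1
    return str(PureWindowsPath(*parts[:end]))


# Hypothetical stand-in for the class-level search_prefixes list.
known_prefixes = []


def remember(prefix):
    """Append a prefix once; None and duplicates are skipped."""
    if prefix is not None and prefix not in known_prefixes:
        known_prefixes.append(prefix)


for script in ('C:/research/sims/run_001/analysis.py',
               'D:/backups/research/sims/run_001/analysis.py',
               'C:/research/sims/run_002/analysis.py'):
    remember(extract_prefix(script, 'research'))

print(known_prefixes)  # ['C:\\research', 'D:\\backups\\research']
```

The third script shares its prefix with the first, so only two entries are stored.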
- format(*args, **kwargs) str[source]
Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
- locate(recent=None) Path | None[source]
Locate this path across search_prefixes.
Behaviour depends on recent and whether the path contains a * wildcard:
- recent=None (default): prefix-order priority.
  - No wildcard: return the path under the first prefix for which the exact file exists.
  - Wildcard: return the glob pattern (* kept in place) for the first prefix that has any matches. Pass that result to matches() or iterate over it to expand the files.
- recent=True: return the most recently modified file across all prefixes (wildcards expanded via locate_all()).
- recent=False: return the oldest file across all prefixes.
Naming note: locate avoids shadowing the built-in str.find() method.
- Parameters:
recent – None for prefix-order (default), True for newest, False for oldest.
- Returns:
Matched Path, or None.
- Return type:
Path or None
- Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common = 'research'
>>> Path('sims/run_001/output.log').locate()
Path('C:\\research\\sims\\run_001\\output.log')
>>> Path('sims/*/output.log').locate()  # returns the pattern
Path('C:\\research\\sims\\*\\output.log')
>>> Path('sims/*/output.log').locate(recent=True)
Path('D:\\backups\\research\\sims\\run_002\\output.log')
>>> Path('sims/*/output.log').locate(recent=False)
Path('C:\\research\\sims\\run_001\\output.log')
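The prefix-order mode (recent=None) can be approximated with os.path and glob. The function locate_first and the temporary drive_a and drive_b folders are hypothetical, introduced only to make the sketch runnable; the real method reads its prefixes from search_prefixes and may be implemented differently.

```python
import glob
import os
import tempfile


def locate_first(relative, prefixes):
    """Return the first prefix-joined path (or pattern) that has a hit."""
    for prefix in prefixes:
        candidate = os.path.join(prefix, relative)
        if '*' in relative:
            # Wildcard: keep the pattern itself if this prefix has matches.
            if glob.glob(candidate):
                return candidate
        elif os.path.exists(candidate):
            # No wildcard: the exact file must exist under this prefix.
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, 'drive_a')
    second = os.path.join(tmp, 'drive_b')
    os.makedirs(os.path.join(first, 'sims'))
    os.makedirs(os.path.join(second, 'sims', 'run_002'))
    open(os.path.join(second, 'sims', 'run_002', 'output.log'), 'w').close()

    exact = locate_first(os.path.join('sims', 'run_002', 'output.log'), [first, second])
    pattern = locate_first(os.path.join('sims', '*', 'output.log'), [first, second])
    missing = locate_first('missing.txt', [first, second])
```

Here only the second folder contains the log file, so both the exact lookup and the wildcard lookup resolve under drive_b even though drive_a is listed first.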
- locate_all(whitelist_ext=None, blacklist_ext=None) List[Path][source]
Return all glob matches of this path across every prefix in search_prefixes.
Wildcards are expanded on each prefix independently, so Path('sims/*/output.log').locate_all() collects every matching file across all configured drives.
- Parameters:
whitelist_ext – If given, only include paths with these extensions.
blacklist_ext – If given, exclude paths with these extensions.
- Returns:
All matching Path objects across all prefixes.
- Return type:
List[Path]
- Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common = 'research'
>>> Path('sims/*/output.log').locate_all()
[Path('C:\\research\\sims\\run_001\\output.log'), Path('D:\\backups\\research\\sims\\run_001\\output.log'), Path('D:\\backups\\research\\sims\\run_002\\output.log')]
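A rough sketch of collecting matches across several prefixes with extension filters follows. The name collect_matches is hypothetical, and treating the filters as collections of extensions with a leading dot (such as '.log') is an assumption; the source does not state the expected format.

```python
import glob
import os
import tempfile


def collect_matches(relative, prefixes, whitelist_ext=None, blacklist_ext=None):
    """Expand the pattern under every prefix and filter by extension."""
    found = []
    for prefix in prefixes:
        # sorted() gives a stable order within each prefix.
        for match in sorted(glob.glob(os.path.join(prefix, relative))):
            ext = os.path.splitext(match)[1]
            if whitelist_ext is not None and ext not in whitelist_ext:
                continue
            if blacklist_ext is not None and ext in blacklist_ext:
                continue
            found.append(match)
    return found


with tempfile.TemporaryDirectory() as tmp:
    prefixes = [os.path.join(tmp, 'drive_a'), os.path.join(tmp, 'drive_b')]
    for prefix, names in zip(prefixes, (['out.log'], ['out.log', 'out.csv'])):
        os.makedirs(os.path.join(prefix, 'sims'))
        for name in names:
            open(os.path.join(prefix, 'sims', name), 'w').close()

    pattern = os.path.join('sims', 'out.*')
    everything = collect_matches(pattern, prefixes)
    logs_only = collect_matches(pattern, prefixes, whitelist_ext=['.log'])
    no_logs = collect_matches(pattern, prefixes, blacklist_ext=['.log'])
```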
- matches(whitelist_ext=None, blacklist_ext=None) List[Path][source]
Finds matching paths with a * (asterisk) wildcard character.
- Returns:
List of matching Paths
- Return type:
List[Path]
- Example:
>>> from mooonpy import Path
>>> MyWildcard = Path('*.mol')
>>> print(MyWildcard.matches())
[Path('DETDA.mol'), Path('DEGBF.mol')]
- open(mode='r', encoding='utf-8')[source]
Open path with smart_open
- Parameters:
mode (str) – Open mode, usually ‘r’ or ‘a’
encoding (str) – File encoding
- Returns:
opened file as object
- Return type:
File Object
- Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> MyFileObj = MyPath.open(mode='r')
- recent(oldest: bool = False) Path | None[source]
Find wildcard matches and return the Path of the most recently modified file.
- Parameters:
oldest (bool) – Reverses direction and finds least recently modified file.
- Returns:
Path of most recently modified file
- Return type:
Path or None
- Example:
>>> from mooonpy import Path
>>> MyWildcard = Path('Template_*.lmpmol')
>>> print(MyWildcard.recent())
Template_1_v10_final_realthistime.lmpmol
>>> print(MyWildcard.recent(oldest=True))
Template_1.lmpmol
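Selecting the newest or oldest wildcard match can be sketched with glob and file modification times. The function most_recent is hypothetical, and using os.path.getmtime as the ordering key is an assumption drawn from the phrase "most recently modified"; the modification times are set explicitly with os.utime so the result does not depend on creation order.

```python
import glob
import os
import tempfile


def most_recent(pattern, oldest=False):
    """Return the newest (or oldest) file matching a wildcard pattern."""
    files = glob.glob(pattern)
    if not files:
        return None  # no matches, mirroring the documented None result
    pick = min if oldest else max
    return pick(files, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as tmp:
    for offset, name in enumerate(['old_file.txt', 'newest_file.txt']):
        path = os.path.join(tmp, name)
        open(path, 'w').close()
        # Force distinct, known modification times.
        os.utime(path, (1_000_000 + offset, 1_000_000 + offset))

    newest = os.path.basename(most_recent(os.path.join(tmp, '*.txt')))
    oldest = os.path.basename(most_recent(os.path.join(tmp, '*.txt'), oldest=True))
    nothing = most_recent(os.path.join(tmp, '*.missing'))

print(newest)  # newest_file.txt
print(oldest)  # old_file.txt
```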
- root() Path[source]
Split Path to filename with no extension.
Alias for os.path.basename and splitext.
- Returns:
Path of filename
- Return type:
Path
- Example:
>>> from mooonpy import Path
>>> MyPath = Path('Project/Monomers/DETDA.mol')
>>> print(MyPath.root())
DETDA
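Since dir(), basename(), root(), and ext() are described as aliases for os.path functions, the same split can be reproduced with the standard library directly. This shows only the underlying os.path calls on a plain string; the Path methods additionally return Path objects and, judging by the examples, normalize separators.

```python
import os.path

sample = 'experiments/run_001/data.csv.gz'

directory = os.path.dirname(sample)      # everything before the last separator
filename = os.path.basename(sample)      # last path component
root, ext = os.path.splitext(filename)   # split off only the final extension

print(directory)  # experiments/run_001
print(filename)   # data.csv.gz
print(root)       # data.csv
print(ext)        # .gz
```

Note that splitext removes only the last suffix, which is why a double extension such as .csv.gz leaves data.csv as the root.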
- search_common = None
- search_prefixes = None
- swap_prefix(target) Path | None[source]
Return a new path with this path’s prefix replaced by target.
Strips everything up to and including search_common from self (via _relpath_from_common()), then prepends the chosen prefix.
- Parameters:
target (int, str, or Path) – Replacement prefix: either an int index into search_prefixes, or a str/Path value.
- Returns:
Rewritten Path, or None if target is an out-of-range integer index.
- Return type:
Path or None
- Example:
>>> Path.search_prefixes = [Path('C:/research'), Path('D:/backups/research')]
>>> Path.search_common = 'research'
>>> p = Path('D:/backups/research/sims/run_001/output.log')
>>> p.swap_prefix(0)
Path('C:\\research\\sims\\run_001\\output.log')
>>> p.swap_prefix('D:/backups/research')
Path('D:\\backups\\research\\sims\\run_001\\output.log')
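The strip-then-prepend logic can be sketched as a standalone function. Here swap_prefix_sketch is a hypothetical name, the prefixes and common arguments stand in for the class attributes search_prefixes and search_common, and raising ValueError when common is absent is a choice made for this sketch, since the source does not say what happens in that case.

```python
from pathlib import PureWindowsPath


def swap_prefix_sketch(path, common, target, prefixes=()):
    """Replace everything up to and including common with target."""
    if isinstance(target, int):
        try:
            target = prefixes[target]
        except IndexError:
            return None  # out-of-range index, as documented
    parts = PureWindowsPath(path).parts
    if common not in parts:
        # Sketch choice only; the documented behaviour is not stated.
        raise ValueError(f'{common!r} is not a component of {path!r}')
    tail = parts[parts.index(common) + 1:]
    return str(PureWindowsPath(target, *tail))


prefixes = ['C:/research', 'D:/backups/research']
source = 'D:/backups/research/sims/run_001/output.log'

print(swap_prefix_sketch(source, 'research', 0, prefixes))
# C:\research\sims\run_001\output.log
print(swap_prefix_sketch(source, 'research', 5, prefixes))
# None
```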
- mooonpy.tools.file_utils.smart_open(filename, mode='r', encoding='utf-8')[source]
Open file with appropriate decompression based on extension
- Supported extensions (matched as a substring of the filename):
.gz: Uses gzip module
.bz2: Uses bz2 module
.xz: Uses lzma module
.lzma: Uses lzma module
Other extensions use the builtin open function
- Parameters:
filename (Path or str) – Path to file
mode (str) – Open mode, usually ‘r’, ‘w’ or ‘a’
encoding (str) – File encoding
- Returns:
opened file as object
- Return type:
File Object
- Example:
>>> from mooonpy.tools.file_utils import smart_open
>>> MyFileObj = smart_open('Project/Monomers/DETDA.data.gz')
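Extension-based dispatch of this kind can be sketched with the standard gzip, bz2, and lzma modules. The names open_by_extension and OPENERS are hypothetical, and matching against the file name rather than the whole path is a choice made for this sketch; the actual smart_open may handle modes and matching differently.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Extension-to-opener table mirroring the documented list.
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open, '.xz': lzma.open, '.lzma': lzma.open}


def open_by_extension(filename, mode='r', encoding='utf-8'):
    """Hypothetical stand-in for smart_open; not the library's code."""
    name = str(filename)
    binary = 'b' in mode
    base = os.path.basename(name)
    for ext, opener in OPENERS.items():
        if ext in base:  # substring match on the file name (sketch choice)
            if binary:
                return opener(name, mode)
            # The compression modules default to binary, so ask for text mode.
            text_mode = mode if 't' in mode else mode + 't'
            return opener(name, text_mode, encoding=encoding)
    if binary:
        return open(name, mode)
    return open(name, mode, encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data.txt.gz')
    with open_by_extension(target, 'w') as handle:
        handle.write('Hello World')
    with open_by_extension(target) as handle:
        round_trip = handle.read()
    with open(target, 'rb') as handle:
        magic = handle.read(2)  # gzip files start with bytes 0x1f 0x8b

print(round_trip)  # Hello World
```

One caveat of this sketch: lzma.open writes the .xz container by default, so writing a true legacy .lzma file would need format=lzma.FORMAT_ALONE passed explicitly.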