simple_dvc.util_fsspec module
fsspec wrappers that should make working with S3 / the local file system seamless.
Todo
Someone must have already implemented this somewhere. Find that to either use directly or as a reference.
Note
While under development, this module needs to be kept in sync between
~/code/simple_dvc/simple_dvc/util_fsspec.py
and
~/code/watch/watch/utils/util_fsspec.py
It might move to kwutil later.
- class simple_dvc.util_fsspec.FSPath(path, *, fs=None)
Bases: str
Provide a pathlib.Path-like way of interacting with fsspec.
This has a few notable differences with pathlib.Path. We inherit from str because pathlib.Path semantics can break the protocol section of a URI. This means we have to use os.path functions to implement things like FSPath.relative_to() and FSPath.joinpath() (which behave differently than in pathlib).
Note
Not all of the fsspec / pathlib operations are currently implemented; add more as needed.
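The reason for subclassing str can be reproduced with the standard library alone. The following is a minimal sketch, not the FSPath implementation: `UriStr` is a hypothetical class introduced only for illustration, and the bucket name is made up.

```python
import posixpath
from pathlib import PurePosixPath

uri = 's3://bucket/data'

# pathlib collapses the double slash after the protocol, corrupting the URI.
mangled = str(PurePosixPath(uri) / 'file.txt')


class UriStr(str):
    """Hypothetical str subclass (illustration only, not FSPath itself)."""

    def joinpath(self, *others):
        # posixpath.join concatenates with "/" and leaves the "://" intact.
        return type(self)(posixpath.join(self, *others))

    def __truediv__(self, other):
        return self.joinpath(other)


joined = UriStr(uri) / 'file.txt'
print(mangled)  # s3:/bucket/data/file.txt
print(joined)   # s3://bucket/data/file.txt
```

Because the result is still a str, it can be handed directly to any API that expects a URI string.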
Example
>>> cwd = FSPath.coerce('.')
>>> print(cwd)
>>> print(cwd.fs)
- classmethod _current_fs(**kwargs)
The “default” FileSystem object. Get the most recent filesystem with this protocol, or create a new one with defaults.
- Returns:
AbstractFileSystem
- classmethod coerce(path)
Determine which backend to use automatically.
Example
>>> path2 = FSPath.coerce('/local/path')
>>> print(f'path2={path2}')
>>> assert path2.is_local()
>>> # xdoctest: +REQUIRES(module:s3fs)
>>> path1 = FSPath.coerce('s3://demo_bucket')
>>> print(f'path1={path1}')
>>> assert path1.is_remote()
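One plausible way to pick a backend is to look at the protocol prefix of the path. The sketch below is an assumption about how such a dispatch could work, not the actual logic of FSPath.coerce: `guess_backend` is a hypothetical helper, and mapping both `ssh` and `sftp` to SSHPath is a guess.

```python
def guess_backend(path):
    """Return the name of the class a coerce-style dispatcher might pick.

    Hypothetical helper: the real FSPath.coerce may use different rules.
    """
    text = str(path)
    # Everything before "://" is treated as the protocol; no marker means local.
    protocol, sep, _ = text.partition('://')
    if not sep or protocol == 'file':
        return 'LocalPath'
    if protocol == 's3':
        return 'S3Path'
    if protocol in {'ssh', 'sftp'}:
        return 'SSHPath'
    raise ValueError(f'unsupported protocol: {protocol!r}')


print(guess_backend('/local/path'))       # LocalPath
print(guess_backend('s3://demo_bucket'))  # S3Path
```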
- open(mode='rb', block_size=None, cache_options=None, compression=None)
Example
>>> dpath = LocalPath.appdir('simple_dvc/fsspec/tests/open').ensuredir()
>>> fpath = dpath / 'file.txt'
>>> file = fpath.open(mode='w')
>>> file.write('hello world')
>>> file.close()
>>> assert fpath.read_text() == fpath.open('r').read()
- delete(recursive='auto', maxdepth=True)
Deletes this file or this directory (and all of its contents).
Unlike fs.delete, this will not raise an error if the file does not exist. See FSPath.rm() if you want the standard error-raising behavior.
- rm(recursive='auto', maxdepth=True)
Deletes this file or this directory (and all of its contents)
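The difference between delete() and rm() can be illustrated on the local filesystem with the standard library. This is only a sketch of the described semantics: `rm_strict` and `delete_quiet` are hypothetical names, and the real methods go through fsspec rather than os / shutil.

```python
import os
import shutil
import tempfile


def rm_strict(path):
    """Remove a file or directory tree; raise FileNotFoundError if absent."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    else:
        os.remove(path)


def delete_quiet(path):
    """Like rm_strict, but a missing path is not an error."""
    try:
        rm_strict(path)
    except FileNotFoundError:
        pass


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'data')
    os.makedirs(os.path.join(target, 'sub'))
    delete_quiet(target)  # removes the directory and its contents
    delete_quiet(target)  # second call is a no-op
    try:
        rm_strict(target)
        raised = False
    except FileNotFoundError:
        raised = True

print(raised)  # True
```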
- walk(include_protocol='auto', **kwargs)
- Yields:
Tuple[Self, List[str], List[str]] - root, dir names, file names
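The yielded triple has the same shape as the one produced by os.walk. The sketch below uses os.walk on a temporary directory as a stand-in to show that shape; with FSPath.walk the root is documented to be a path object (Self) rather than a plain string, and the file names here are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ['a.txt', os.path.join('sub', 'b.txt')]:
        with open(os.path.join(tmp, rel), 'w') as file:
            file.write('data')

    # Each step yields (root, directory names, file names), top-down.
    steps = [
        (os.path.relpath(root, tmp), sorted(dnames), sorted(fnames))
        for root, dnames, fnames in os.walk(tmp)
    ]

print(steps)  # [('.', ['sub'], ['a.txt']), ('sub', [], ['b.txt'])]
```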
- property parent
- property name
- property stem
- property suffix
- property suffixes
- property parts
- copy(dst, recursive='auto', maxdepth=None, on_error=None, callback=None, verbose=1, idempotent=True, overwrite=False, **kwargs)
Copies this file or directory to dst.
Abstracts fsspec copy / put / get.
If dst ends with a “/”, it is assumed to be a directory, and target files will go within it.
Unlike fsspec, this attempts to be idempotent.
- Parameters:
dst (FSPath) – location to copy to
recursive (bool | str) – If ‘auto’ (the default), attempt to determine if this is a directory or a file. Set to True if it is a directory and False otherwise. If you know what this is beforehand, you can set it explicitly to be more efficient.
maxdepth (int | None) – only makes sense when recursive is True
callback (None | callable) – for put / get cases
on_error (str) – either “raise” or “ignore”. Only applicable in the “copy” case.
idempotent (bool) – if False, use standard fsspec behavior, otherwise attempt to be idempotent.
overwrite (bool) – if True, overwrite existing data instead of erroring. Defaults to False.
Note
Different fsspec functions are used depending on whether we are going from remote->remote (copy), local->remote (put), or remote->local (get).
References
https://filesystem-spec.readthedocs.io/en/latest/copying.html
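The choice between copy, put, and get, together with the trailing-slash rule, can be sketched in plain Python. This is a hedged illustration of the behavior described above, not the library's code: `is_remote`, `choose_transfer`, and `resolve_target` are hypothetical helpers, and treating local->local as "copy" is an assumption.

```python
import posixpath


def is_remote(path):
    # Assumption: anything with a non-file protocol prefix counts as remote.
    protocol, sep, _ = str(path).partition('://')
    return bool(sep) and protocol != 'file'


def choose_transfer(src, dst):
    """Name the fsspec-style operation for a source / destination pair."""
    if is_remote(src) and not is_remote(dst):
        return 'get'   # remote -> local
    if not is_remote(src) and is_remote(dst):
        return 'put'   # local -> remote
    return 'copy'      # remote -> remote (local -> local is assumed here too)


def resolve_target(src, dst):
    """Place src inside dst when dst ends with "/", else use dst as given."""
    if str(dst).endswith('/'):
        return posixpath.join(dst, posixpath.basename(str(src).rstrip('/')))
    return str(dst)


print(choose_transfer('s3://bucket/file.txt', '/tmp/out/'))  # get
print(resolve_target('s3://bucket/file.txt', '/tmp/out/'))   # /tmp/out/file.txt
```

Resolving the final target up front is one way to make a repeated copy land on the same path each time, which is a prerequisite for idempotent behavior.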
- tree(max_files=100, dirblocklist=None, show_nfiles='auto', return_text=False, return_tree=True, pathstyle='name', max_depth=None, with_type=False, abs_root_label=True, colors=False)
Filesystem tree representation
Like the Unix tree utility, but allows writing the number of files per directory when given the -d option.
Ported from xdev.misc.tree_repr
Todo
Instead of building the networkx structure and then waiting to display everything, build and display simultaneously. This will require using a modified version of write_network_text.
- Parameters:
max_files (int | None) – maximum number of files to print before suppressing a directory
pathstyle (str) – can be rel, name, or abs
return_tree (bool) – if True return the tree
return_text (bool) – if True return the text
max_depth (int | None) – maximum depth to descend
abs_root_label (bool) – if True force the root to always be absolute
colors (bool) – if True use rich
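The effect of max_files and max_depth can be sketched with a small recursive listing. This is a simplified stand-in written against the local filesystem, not the networkx-based implementation: `tree_lines` and its output format are hypothetical.

```python
import os
import tempfile


def tree_lines(root, max_files=100, max_depth=None, _depth=0):
    """Return indented lines for root; hypothetical stand-in for tree()."""
    indent = '    ' * _depth
    lines = [f'{indent}{os.path.basename(root)}/']
    if max_depth is not None and _depth >= max_depth:
        return lines
    entries = sorted(os.scandir(root), key=lambda e: e.name)
    dirs = [e for e in entries if e.is_dir(follow_symlinks=False)]
    files = [e for e in entries if not e.is_dir(follow_symlinks=False)]
    if max_files is not None and len(files) > max_files:
        # Too many files: report the count instead of listing each one.
        lines.append(f'{indent}    [{len(files)} files]')
    else:
        lines.extend(f'{indent}    {e.name}' for e in files)
    for entry in dirs:
        lines.extend(tree_lines(entry.path, max_files, max_depth, _depth + 1))
    return lines


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, 'demo')
    os.makedirs(os.path.join(root, 'many'))
    names = ['top.txt'] + [os.path.join('many', f'{i}.txt') for i in range(3)]
    for name in names:
        with open(os.path.join(root, name), 'w') as file:
            file.write('data')
    lines = tree_lines(root, max_files=2)

print('\n'.join(lines))
```

With max_files=2, the three files under `many/` are summarized as a count while the single file at the top level is listed by name.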
- class simple_dvc.util_fsspec.LocalPath(path, *, fs=None)
Bases: FSPath
The implementation for the local filesystem
Example
>>> dpath = ub.Path.appdir('simple_dvc/tests/util_fsspec/demo')
>>> dpath.delete().ensuredir()
>>> (dpath / 'file1.txt').write_text('data')
>>> (dpath / 'dpath').ensuredir()
>>> (dpath / 'dpath/file2.txt').write_text('data')
>>> self = LocalPath(dpath).absolute()
>>> print(f'self={self}')
>>> print(self.ls())
>>> info = self.tree()
>>> fsspec_dpath = (dpath / 'dpath')
>>> fsspec_fpath = (dpath / 'file1.txt')
>>> pathlib_dpath = ub.Path(dpath / 'pathlib_dpath')
>>> pathlib_fpath = ub.Path(dpath / 'pathlib_fpath')
>>> assert not pathlib_dpath.exists()
>>> assert not pathlib_fpath.exists()
>>> fsspec_dpath.copy(pathlib_dpath)
>>> fsspec_fpath.copy(pathlib_fpath)
>>> assert pathlib_dpath.exists()
>>> assert pathlib_fpath.exists()
- class simple_dvc.util_fsspec.RemotePath(path, *, fs=None)
Bases: FSPath
Abstract implementation for all remote filesystems
- class simple_dvc.util_fsspec.S3Path(path, *, fs=None)
Bases: RemotePath
The specific S3 remote filesystem.
Control credentials with the environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.
A single S3 filesystem is used by default, but you can work with multiple of them if you pass in the fs object. E.g.
fs = S3Path._new_fs(profile='iarpa')
self = S3Path('s3://kitware-smart-watch-data/', fs=fs)
self.ls()
To work with different S3 filesystems,
- Requirements:
s3fs>=2023.6.0
Example
>>> # xdoctest: +REQUIRES(module:s3fs)
>>> fs = S3Path._new_fs()
- class simple_dvc.util_fsspec.SSHPath(path, *, fs=None)
Bases: RemotePath
- property host