simple_dvc.util_fsspec module

fsspec wrappers that should make working with S3 / the local file system seamless.

Todo

Someone must have already implemented this somewhere. Find that to either use directly or as a reference.

Note

  • While under development, this module needs to be kept in sync between

~/code/simple_dvc/simple_dvc/util_fsspec.py

AND

~/code/watch/watch/utils/util_fsspec.py

Might move to kwutil later

Look into:

https://github.com/fsspec/universal_pathlib

https://pypi.org/project/pathlibfs/

class simple_dvc.util_fsspec.FSPath(path, *, fs=None)[source]

Bases: str

Provide a pathlib.Path-like way of interacting with fsspec.

This has a few notable differences from pathlib.Path. We inherit from str because pathlib.Path semantics can break the protocol section of a URI. This means we have to use os.path functions to implement things like FSPath.relative_to() and FSPath.joinpath(), which behave differently than their pathlib counterparts.
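The following sketch illustrates the problem described above using only the standard library. It does not use FSPath itself; the object key data/file.txt is made up for illustration, and posixpath stands in for the os.path functions mentioned in the text.

```python
import posixpath
from pathlib import PurePosixPath

uri = 's3://demo_bucket/data/file.txt'

# pathlib collapses the double slash, which damages the protocol prefix.
mangled = str(PurePosixPath(uri))

# Plain string handling via posixpath leaves the protocol intact.
joined = posixpath.join('s3://demo_bucket/data', 'file.txt')
relative = posixpath.relpath(uri, 's3://demo_bucket')

print(mangled)   # s3:/demo_bucket/data/file.txt
print(joined)    # s3://demo_bucket/data/file.txt
print(relative)  # data/file.txt
```

Because the path is an ordinary string, it can be handed to fsspec unchanged, at the cost of reimplementing the pathlib conveniences by hand.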

Note

Not all of the fsspec / pathlib operations are currently implemented; add them as needed.

Example

>>> from simple_dvc.util_fsspec import FSPath
>>> cwd = FSPath.coerce('.')
>>> print(cwd)
>>> print(cwd.fs)
classmethod _new_fs(**kwargs)[source]

Create a new filesystem instance based on __protocol__

classmethod _current_fs(**kwargs)[source]

The “default” FileSystem object. Get the most recent filesystem with this protocol, or create a new one with defaults.

Returns:

AbstractFileSystem
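The "most recent instance, or a new one with defaults" behavior can be pictured with a small registry. This is a hypothetical illustration of the pattern only: FakeFS and FSRegistry are invented names, and the real methods may rely on fsspec's own instance tracking instead.

```python
class FakeFS:
    """Hypothetical stand-in for a filesystem object."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class FSRegistry:
    """Hypothetical registry that tracks instances per protocol."""

    _instances = {}  # protocol -> list of instances, oldest first

    @classmethod
    def new_fs(cls, protocol, **kwargs):
        # Always build a fresh instance and remember it as the newest.
        fs = FakeFS(**kwargs)
        cls._instances.setdefault(protocol, []).append(fs)
        return fs

    @classmethod
    def current_fs(cls, protocol):
        # Reuse the newest instance, or create one with defaults.
        existing = cls._instances.get(protocol)
        if existing:
            return existing[-1]
        return cls.new_fs(protocol)


first = FSRegistry.current_fs('s3')    # nothing exists yet: created
again = FSRegistry.current_fs('s3')    # reused
custom = FSRegistry.new_fs('s3', profile='iarpa')
latest = FSRegistry.current_fs('s3')   # now the custom instance
```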

classmethod coerce(path)[source]

Determine which backend to use automatically

Example

>>> from simple_dvc.util_fsspec import FSPath
>>> path2 = FSPath.coerce('/local/path')
>>> print(f'path2={path2}')
>>> assert path2.is_local()
>>> # xdoctest: +REQUIRES(module:s3fs)
>>> path1 = FSPath.coerce('s3://demo_bucket')
>>> print(f'path1={path1}')
>>> assert path1.is_remote()
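One way to picture the automatic backend choice is a lookup on the protocol prefix. The sketch below is an assumption about the general approach, not the actual coerce logic: guess_backend is a hypothetical helper, and the protocol strings in the table are assumed to match what each class handles.

```python
def guess_backend(path):
    """Return the name of the class that would handle ``path``."""
    # Assumed protocol-to-class table; the real lookup may differ.
    backends = {
        'file': 'LocalPath',
        's3': 'S3Path',
        'ssh': 'SSHPath',
        'memory': 'MemoryPath',
    }
    protocol, sep, _ = str(path).partition('://')
    if not sep:
        # No protocol prefix: treat it as a local path.
        return 'LocalPath'
    if protocol not in backends:
        raise ValueError(f'unsupported protocol: {protocol!r}')
    return backends[protocol]


print(guess_backend('/local/path'))       # LocalPath
print(guess_backend('s3://demo_bucket'))  # S3Path
```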
relative_to(other)[source]
is_remote()[source]
is_local()[source]
open(mode='rb', block_size=None, cache_options=None, compression=None)[source]

Example

>>> from simple_dvc.util_fsspec import LocalPath
>>> dpath = LocalPath.appdir('simple_dvc/fsspec/tests/open').ensuredir()
>>> fpath = dpath / 'file.txt'
>>> file = fpath.open(mode='w')
>>> file.write('hello world')
>>> file.close()
>>> assert fpath.read_text() == fpath.open('r').read()
ls(detail=False, **kwargs)[source]
touch(truncate=False, **kwargs)[source]
move(path1, path2, recursive='auto', maxdepth=None, **kwargs)[source]
delete(recursive='auto', maxdepth=True)[source]

Deletes this file or this directory (and all of its contents)

Unlike fs.delete, this will not raise an error if the file does not exist. See FSPath.rm() if you want the standard error-raising behavior.
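The forgiving behavior can be sketched for the local filesystem with the standard library. This is an analogy under stated assumptions, not the FSPath implementation: delete_quietly is a hypothetical helper, and it works on pathlib paths rather than fsspec ones.

```python
import shutil
import tempfile
from pathlib import Path


def delete_quietly(path):
    """Remove a file or directory tree; do nothing if it is missing."""
    path = Path(path)
    if path.is_symlink() or path.is_file():
        path.unlink()
    elif path.is_dir():
        shutil.rmtree(path)
    # Otherwise the path does not exist and there is nothing to do.


root = Path(tempfile.mkdtemp())
fpath = root / 'file.txt'
fpath.write_text('data')

delete_quietly(fpath)  # removes the file
delete_quietly(fpath)  # already gone: no error
delete_quietly(root)   # removes the directory
```

Calling it twice is safe, which is convenient for cleanup code that may run after a partial failure.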

rm(recursive='auto', maxdepth=True)[source]

Deletes this file or this directory (and all of its contents)

mkdir(create_parents=True, **kwargs)[source]

Note

Does nothing on some filesystems (e.g. S3).

stat()[source]
is_dir()[source]
is_file()[source]
exists()[source]
write_text(value, **kwargs)[source]
read_text(**kwargs)[source]
walk(include_protocol='auto', **kwargs)[source]
Yields:

Tuple[Self, List[str], List[str]] - root, dir names, file names

property parent
property name
property stem
property suffix
property suffixes
property parts
copy(dst, recursive='auto', maxdepth=None, on_error=None, callback=None, verbose=1, idempotent=True, overwrite=False, **kwargs)[source]

Copies this file or directory to dst

Abstracts fsspec copy / put / get.

If dst ends with a “/”, it will be assumed to be a directory, and target files will go within.
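For a single source file, the trailing-slash rule can be sketched as follows. resolve_copy_target is a hypothetical helper written for illustration; it ignores the directory-to-directory cases, and the backup/ prefix is made up.

```python
import posixpath


def resolve_copy_target(src, dst):
    """Return the final location of one file copied from src to dst."""
    if dst.endswith('/'):
        # dst is a directory: the file keeps its name and goes inside it.
        return posixpath.join(dst, posixpath.basename(src))
    # Otherwise dst is taken to be the full target path.
    return dst


inside = resolve_copy_target('/data/file1.txt', 's3://demo_bucket/backup/')
exact = resolve_copy_target('/data/file1.txt', 's3://demo_bucket/backup')
print(inside)  # s3://demo_bucket/backup/file1.txt
print(exact)   # s3://demo_bucket/backup
```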

Unlike fsspec, this attempts to be idempotent.

Parameters:
  • dst (FSPath) – location to copy to

  • recursive (bool | str) – If ‘auto’ (the default), attempt to determine whether this path is a directory or a file, using True for a directory and False otherwise. If you already know which it is, set this explicitly, which is more efficient.

  • maxdepth (int | None) – only makes sense when recursive is True

  • callback (None | callable) – for put / get cases

  • on_error (str) – either “raise” or “ignore”. Only applicable in the “copy” case.

  • idempotent (bool) – if False, use standard fsspec behavior, otherwise attempt to be idempotent.

  • overwrite (bool) – if True, overwrite existing data instead of erroring. Defaults to False.

Note

Different fsspec functions are used depending on whether the transfer is remote->remote (copy), local->remote (put), or remote->local (get).

References

https://filesystem-spec.readthedocs.io/en/latest/copying.html
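The choice between the three operations named in the note can be sketched as a small dispatch function. choose_transfer is a hypothetical helper, and treating local->local as a plain copy is an assumption, since the note does not cover that case.

```python
def choose_transfer(src_is_remote, dst_is_remote):
    """Return the name of the fsspec operation for a transfer."""
    if src_is_remote and dst_is_remote:
        return 'copy'  # remote -> remote
    if dst_is_remote:
        return 'put'   # local -> remote
    if src_is_remote:
        return 'get'   # remote -> local
    return 'copy'      # local -> local (assumption, not stated in the note)


print(choose_transfer(False, True))  # put
print(choose_transfer(True, False))  # get
```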

joinpath(*others)[source]
tree(max_files=100, dirblocklist=None, show_nfiles='auto', return_text=False, return_tree=True, pathstyle='name', max_depth=None, with_type=False, abs_root_label=True, colors=False)[source]

Filesystem tree representation

Like the Unix tree utility, but allows writing the number of files per directory when given the -d option.

Ported from xdev.misc.tree_repr

Todo

Instead of building the networkx structure and then waiting to display everything, build and display simultaneously. This will require using a modified version of write_network_text.

Parameters:
  • max_files (int | None) – maximum files to print before suppressing a directory

  • pathstyle (str) – can be rel, name, or abs

  • return_tree (bool) – if True return the tree

  • return_text (bool) – if True return the text

  • max_depth (int | None) – maximum depth to descend

  • abs_root_label (bool) – if True force the root to always be absolute

  • colors (bool) – if True use rich
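The idea of a tree listing with per-directory file counts and a max_files cutoff can be sketched with os.walk. This is a simplified stand-in, not the ported implementation: tree_lines is a hypothetical helper, it does not use networkx, and its output format is invented for illustration.

```python
import os
import tempfile


def tree_lines(root, max_files=100):
    """Return an indented listing of ``root`` as a list of strings."""
    lines = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == '.' else rel.count(os.sep) + 1
        indent = '    ' * depth
        name = os.path.basename(dirpath)
        lines.append(f'{indent}{name}/ ({len(filenames)} files)')
        if max_files is not None and len(filenames) > max_files:
            continue  # too many files: show only the count
        for fname in sorted(filenames):
            lines.append(f'{indent}    {fname}')
    return lines


# Build a small demo directory: file1.txt and dpath/file2.txt.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'dpath'))
for relpath in ['file1.txt', os.path.join('dpath', 'file2.txt')]:
    with open(os.path.join(root, relpath), 'w') as file:
        file.write('data')

full = tree_lines(root)
counts_only = tree_lines(root, max_files=0)
print('\n'.join(full))
```

With max_files=0 every directory exceeds the limit, so only directory lines and their counts remain.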

class simple_dvc.util_fsspec.LocalPath(path, *, fs=None)[source]

Bases: FSPath

The implementation for the local filesystem

Example

>>> import ubelt as ub
>>> from simple_dvc.util_fsspec import LocalPath
>>> dpath = ub.Path.appdir('simple_dvc/tests/util_fsspec/demo')
>>> dpath.delete().ensuredir()
>>> (dpath / 'file1.txt').write_text('data')
>>> (dpath / 'dpath').ensuredir()
>>> (dpath / 'dpath/file2.txt').write_text('data')
>>> self = LocalPath(dpath).absolute()
>>> print(f'self={self}')
>>> print(self.ls())
>>> info = self.tree()
>>> fsspec_dpath = (self / 'dpath')
>>> fsspec_fpath = (self / 'file1.txt')
>>> pathlib_dpath = ub.Path(dpath / 'pathlib_dpath')
>>> pathlib_fpath = ub.Path(dpath / 'pathlib_fpath')
>>> assert not pathlib_dpath.exists()
>>> assert not pathlib_fpath.exists()
>>> fsspec_dpath.copy(pathlib_dpath)
>>> fsspec_fpath.copy(pathlib_fpath)
>>> assert pathlib_dpath.exists()
>>> assert pathlib_fpath.exists()
ensuredir(mode=511)[source]
absolute()[source]
classmethod appdir(*args, **kw)[source]
class simple_dvc.util_fsspec.RemotePath(path, *, fs=None)[source]

Bases: FSPath

Abstract implementation for all remote filesystems

class simple_dvc.util_fsspec.S3Path(path, *, fs=None)[source]

Bases: RemotePath

The specific S3 remote filesystem.

Control credentials with the environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN.

A single S3 filesystem is used by default, but you can work with several of them by passing in the fs object explicitly, e.g.:

fs = S3Path._new_fs(profile='iarpa')
self = S3Path('s3://kitware-smart-watch-data/', fs=fs)
self.ls()

Requirements:

s3fs>=2023.6.0

Example

>>> # xdoctest: +REQUIRES(module:s3fs)
>>> from simple_dvc.util_fsspec import S3Path
>>> fs = S3Path._new_fs()
_as_gdal_vsi()[source]
class simple_dvc.util_fsspec.SSHPath(path, *, fs=None)[source]

Bases: RemotePath

property host
classmethod coerce(path)[source]
class simple_dvc.util_fsspec.MemoryPath(path, *, fs=None)[source]

Bases: FSPath