simple_dvc package
Submodules
- simple_dvc.__main__ module
- simple_dvc.api module
  - SimpleDVC
    - SimpleDVC.list_remotes()
    - SimpleDVC.dpath
    - SimpleDVC.demo()
    - SimpleDVC.demo_dpath()
    - SimpleDVC.init()
    - SimpleDVC.cache_dir
    - SimpleDVC.coerce()
    - SimpleDVC.find_root()
    - SimpleDVC._ensure_root()
    - SimpleDVC._ensure_remote()
    - SimpleDVC._resolve_root_and_relative_paths()
    - SimpleDVC.add()
    - SimpleDVC.pathsremove()
    - SimpleDVC._dvc_path_op()
    - SimpleDVC.check_ignore()
    - SimpleDVC.git_pull()
    - SimpleDVC.git_push()
    - SimpleDVC.git_commit()
    - SimpleDVC.git_commitpush()
    - SimpleDVC._verbose_extra_args()
    - SimpleDVC._remote_extra_args()
    - SimpleDVC.push()
    - SimpleDVC.pull()
    - SimpleDVC.request()
    - SimpleDVC.unprotect()
    - SimpleDVC.is_tracked()
    - SimpleDVC.find_file_tracker()
    - SimpleDVC.find_dir_tracker()
    - SimpleDVC.read_dvc_sidecar()
    - SimpleDVC.resolve_cache_paths()
    - SimpleDVC._sidecar_references()
    - SimpleDVC.find_sidecar_paths_in_dpath()
    - SimpleDVC.find_sidecar_paths_associated_with()
    - SimpleDVC.sidecars()
    - SimpleDVC.sidecar_paths()
  - _ensure_iterable()
  - _import_dvc_main()
  - SDVC
- simple_dvc.cache_surgery module
- simple_dvc.cache_validate module
- simple_dvc.demo module
- simple_dvc.discover_ssh_remote module
- simple_dvc.main module
- simple_dvc.registery module
- simple_dvc.sidecar module
- simple_dvc.util_fsspec module
  - FSPath
    - FSPath._new_fs()
    - FSPath._current_fs()
    - FSPath.coerce()
    - FSPath.relative_to()
    - FSPath.is_remote()
    - FSPath.is_local()
    - FSPath.open()
    - FSPath.ls()
    - FSPath.touch()
    - FSPath.move()
    - FSPath.delete()
    - FSPath.rm()
    - FSPath.mkdir()
    - FSPath.stat()
    - FSPath.is_dir()
    - FSPath.is_file()
    - FSPath.is_link()
    - FSPath.exists()
    - FSPath.write_text()
    - FSPath.read_text()
    - FSPath.walk()
    - FSPath.parent
    - FSPath.name
    - FSPath.stem
    - FSPath.suffix
    - FSPath.suffixes
    - FSPath.parts
    - FSPath.copy()
    - FSPath.joinpath()
    - FSPath.tree()
  - LocalPath
  - RemotePath
  - S3Path
  - SSHPath
  - MemoryPath
Module contents
Simple DVC
- class simple_dvc.SimpleDVC(dvc_root=None, remote=None)
Bases: NiceRepr
A Simple DVC API
- Parameters:
dvc_root (str | PathLike) – path to DVC repo directory
remote (str) – dvc remote to sync to by default
CommandLine
xdoctest -m simple_dvc.api SimpleDVC
Example
>>> from simple_dvc import SimpleDVC
>>> self = SimpleDVC.demo()
>>> a_file_fpath = self.dpath / 'a_file.txt'
>>> if not a_file_fpath.exists():
>>>     a_file_fpath.write_text('hello')
>>> self.add(a_file_fpath)
- property dpath
- classmethod demo(dpath=None, reset=False, with_git=False)
Create a demo DVC repo for tests.
- Parameters:
dpath (str | PathLike) – If specified, force the repo to be created here. Otherwise a default location is chosen.
Example
>>> from simple_dvc.api import *  # NOQA
>>> cls = SimpleDVC
>>> self1 = cls.demo(with_git=0)
>>> self2 = cls.demo(with_git=1)
- classmethod init(dpath, no_scm=False, force=False, verbose=0)
Initialize a DVC repo in a path
- property cache_dir
- classmethod find_root(path=None)
Given a path, search its ancestors to find the root of a dvc repo.
- Returns:
Path | None
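The ancestor search described above can be sketched with pathlib. This is only an illustration, not the library's implementation: the function name find_dvc_root is hypothetical, and it assumes a repo root is recognized by a .dvc directory sitting directly inside it.

```python
from pathlib import Path


def find_dvc_root(path=None):
    """Return the nearest ancestor of ``path`` that holds a ``.dvc`` directory.

    Hypothetical helper for illustration; returns None when nothing is found.
    """
    start = Path.cwd() if path is None else Path(path)
    start = start.resolve()
    # Check the starting location first, then each parent up to the filesystem root.
    for candidate in [start, *start.parents]:
        if (candidate / '.dvc').is_dir():
            return candidate
    return None
```

Returning None instead of raising mirrors the documented "Path | None" return type and leaves the decision about a missing repo to the caller.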
- add(path, verbose=0)
- Parameters:
path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to add
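Because path may be a single path or an iterable of paths, a wrapper of this kind typically normalizes the argument first. The following is a minimal sketch of that normalization; ensure_path_list is a hypothetical name and is not taken from the library.

```python
import os


def ensure_path_list(path):
    """Coerce a single path or an iterable of paths into a list of paths.

    Hypothetical helper for illustration only.
    """
    # A str or os.PathLike counts as ONE path, even though str is iterable.
    if isinstance(path, (str, os.PathLike)):
        return [path]
    # Anything else is assumed to be an iterable of paths.
    return list(path)
```

The str check matters: without it, a call such as ensure_path_list('a.txt') would be split into individual characters.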
- pathsremove(path, verbose=0)
- Parameters:
path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to remove
- _dvc_path_op(op, path, verbose=0)
- Parameters:
path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to apply the operation to
- push(path, remote=None, recursive=False, jobs=None, verbose=0)
Push the content tracked by .dvc files to remote storage.
- Parameters:
path (Path | List[Path]) – one or more file paths that should have an associated .dvc sidecar file or if recursive is true, a directory containing multiple tracked files.
remote (str) – the name of the remote registered in the .dvc/config to push to
recursive (bool) – if True, then items in path can be a directory.
jobs (int) – number of parallel workers
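To make the parameters above concrete, here is a sketch of how they could map onto a dvc push command line. It is an assumption about the mapping, not the wrapper's actual code: build_push_args is a hypothetical name, and the sketch relies only on the --remote, --recursive, and --jobs options of the dvc push CLI.

```python
def build_push_args(paths, remote=None, recursive=False, jobs=None):
    """Build an argv list for ``dvc push`` (hypothetical helper, not library code)."""
    args = ['dvc', 'push']
    if remote is not None:
        # Name of a remote registered in .dvc/config
        args += ['--remote', remote]
    if recursive:
        # Allows targets to be directories containing tracked files
        args.append('--recursive')
    if jobs is not None:
        # Number of parallel workers
        args += ['--jobs', str(jobs)]
    # Targets go last, after all options.
    args += [str(p) for p in paths]
    return args
```

The resulting list can be handed to subprocess.run with cwd set to the repo root; options left as None are omitted so DVC falls back to its own defaults.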
- pull(path, remote=None, recursive=False, jobs=None, verbose=0, allow_missing=False, force=False)
Wrapper around DVC pull
CommandLine
xdoctest -m simple_dvc.api SimpleDVC.pull
Example
>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> remote = SimpleDVC.demo(with_git=1)
>>> dpath = ub.Path.appdir('simple_dvc/doctest/pull').delete().ensuredir()
>>> dvc_dpath = dpath / 'repo'
>>> # Clone a Demo DVC repo, setup the remote, and test
>>> _ = ub.cmd(f'git clone {remote.dpath / ".git"} {dvc_dpath}', verbose=3)
>>> _ = ub.cmd(f'dvc remote add local --default {remote.dpath / ".dvc/cache"}', verbose=3, cwd=dvc_dpath)
>>> # Setup our local API
>>> self = SimpleDVC(dvc_dpath)
>>> # Test a basic file pull
>>> assert not (dvc_dpath / 'root_file').exists()
>>> # Note: pull does accept the non-dvc file request
>>> self.pull(dvc_dpath / 'root_file.dvc')
>>> assert (dvc_dpath / 'root_file').exists()
>>> # Test a basic directory pull
>>> assert not (dvc_dpath / 'root_dir').exists()
>>> self.pull(dvc_dpath / 'root_dir.dvc')
>>> assert (dvc_dpath / 'root_dir').exists()
>>> # Test a recursive pull
>>> assert not (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert not (dvc_dpath / 'test-set1/assets/').exists()
>>> self.pull(dvc_dpath / 'test-set1', recursive=True)
>>> assert (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert (dvc_dpath / 'test-set1/assets/').exists()
>>> assert len((dvc_dpath / 'test-set1/assets/').ls()) > 1
- request(path, remote=None, verbose=0, pull=False)
Requests to ensure that a specific file from DVC exists.
For each requested file that does not exist, check whether there is an associated .dvc sidecar file. If any sidecar file is missing, an error is raised. Otherwise, attempt to pull the missing files.
Todo
Add argument to validate that the data was pulled correctly
(i.e. there are no dangling symlinks)
- Parameters:
path (Path | List[Path]) – one or more file paths that should have an associated .dvc sidecar file.
remote (str) – name of the DVC remote to pull from
verbose (int) – verbosity
pull (bool) – if True, pull instead of request (convenience option)
CommandLine
xdoctest -m simple_dvc.api SimpleDVC.request
Example
>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> remote = SimpleDVC.demo(with_git=1)
>>> dpath = ub.Path.appdir('simple_dvc/doctest/request').delete().ensuredir()
>>> dvc_dpath = dpath / 'repo'
>>> # Clone a Demo DVC repo, setup the remote, and test
>>> _ = ub.cmd(f'git clone {remote.dpath / ".git"} {dvc_dpath}', verbose=3)
>>> _ = ub.cmd(f'dvc remote add local --default {remote.dpath / ".dvc/cache"}', verbose=3, cwd=dvc_dpath)
>>> # Setup our local API
>>> self = SimpleDVC(dvc_dpath)
>>> # Test a recursive pull
>>> assert not (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert not (dvc_dpath / 'test-set1/assets/').exists()
>>> path = dvc_dpath / 'test-set1/assets/asset_004.data'
>>> self.request(path, verbose=0)
>>> assert (dvc_dpath / 'test-set1/assets/').exists()
>>> assert len((dvc_dpath / 'test-set1/assets/').ls()) > 1
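The decision logic described for request (skip files that exist, fail when a sidecar is missing, otherwise pull) can be sketched as follows. This is a deliberately simplified illustration: plan_request is a hypothetical name, the error type is an assumption, and the sketch only looks for a sidecar directly next to each file, so it does not cover requesting a file inside a tracked directory.

```python
from pathlib import Path


def plan_request(paths):
    """Return the sidecar files that would need pulling for the missing paths.

    Hypothetical helper; raises if a missing path has no adjacent sidecar.
    """
    to_pull = []
    for path in map(Path, paths):
        if path.exists():
            continue  # already present, nothing to do
        # Simplifying assumption: the sidecar is "<name>.dvc" in the same folder.
        sidecar = path.parent / (path.name + '.dvc')
        if not sidecar.exists():
            raise FileNotFoundError(f'no .dvc sidecar found for {path}')
        to_pull.append(sidecar)
    return to_pull
```

Collecting all sidecars before pulling lets the missing-sidecar error surface before any network transfer starts.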
- resolve_cache_paths(sidecar_fpath)
Given a .dvc file, enumerate the paths in the cache associated with it.
- Parameters:
sidecar_fpath (PathLike | str) – path to the .dvc file
Example
>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> self = SimpleDVC.demo()
>>> # on a file
>>> sidecar_fpath = self.dpath / 'test-set1/manifest.txt.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
>>> # on a simple directory
>>> sidecar_fpath = self.dpath / 'test-set1/assets.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
>>> # on a complex directory
>>> sidecar_fpath = self.dpath / 'root_dir.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
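As background for how a hash recorded in a .dvc file can correspond to a file in the cache, the sketch below maps a hash string to a path. It assumes the two-level layout commonly used by DVC caches, where the first two characters of the hash name a subdirectory and the remainder names the file. The function name is hypothetical, and the real method may differ, for example when the cache uses an extra prefix directory in newer DVC versions.

```python
from pathlib import Path


def cache_path_for_hash(cache_dir, md5):
    """Map a content hash to a path in a two-level cache (hypothetical helper).

    Assumed layout: <cache_dir>/<first two chars>/<remaining chars>.
    """
    # Any suffix carried by the hash string (e.g. for directory entries)
    # simply stays part of the file name.
    return Path(cache_dir) / md5[:2] / md5[2:]
```

Splitting on the first two characters keeps any single cache directory from accumulating an unmanageable number of entries.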
- _sidecar_references(sidecar_fpath)
- Parameters:
sidecar_fpath (str | PathLike) – path to a sidecar file.
- Yields:
Dict – Information about each sidecar file as they are read.
- find_sidecar_paths_in_dpath(dpath)
Find DVC sidecar files in a directory.
- Parameters:
dpath (Path | str) – directory in dvc repo to search
- Yields:
ub.Path – existing dvc sidecar files
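A directory search of this kind can be sketched with a recursive glob. The sketch is illustrative only: iter_sidecars is a hypothetical name, and whether the real method recurses into subdirectories is an assumption.

```python
from pathlib import Path


def iter_sidecars(dpath):
    """Yield every ``*.dvc`` file under ``dpath`` (hypothetical helper)."""
    for fpath in sorted(Path(dpath).rglob('*.dvc')):
        # The repo's own ".dvc" directory also matches the pattern,
        # so keep regular files only.
        if fpath.is_file():
            yield fpath
```

Like the documented method, the sketch is a generator, so callers can stop early or wrap it in list() when they need all results.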
- find_sidecar_paths_associated_with(dpath)
Deprecated: use sidecar_paths instead.
- Parameters:
dpath (Path | str) – directory in dvc repo to search
- Yields:
ub.Path – existing dvc sidecar files
- sidecar_paths(path)
Given a path in a DVC repo, resolve it to a sidecar file that it corresponds to.
- Cases:
Input is a .dvc file - return it
Input is a file with an associated .dvc - return the associated .dvc file
Input is inside a folder tracked by dvc - return the .dvc file of the folder
Input is a folder with multiple .dvc paths in it - return all .dvc files in the folder.
If the input is a .dvc file return it.
If it is inside a directory that corresponds to a dvc repo, search for that.
- Parameters:
path (Path | str) – directory or file in dvc repo to search
- Yields:
ub.Path – existing dvc sidecar files
Example
>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> self = SimpleDVC.demo()
>>> #
>>> # Calling on an untracked directory returns all sidecars in the
>>> # directory
>>> all_sidecars = list(self.sidecar_paths(self.dpath))
>>> assert len(all_sidecars) > 1
>>> #
>>> # Calling on a tracked file in a directory returns the sidecar for
>>> # that directory
>>> tracked_fpath = (self.dpath / 'test-set1/assets').ls()[0]
>>> dir_sidecar = list(self.sidecar_paths(tracked_fpath))
>>> assert len(dir_sidecar) == 1
>>> #
>>> # Calling on a dvc file returns the dvc file
>>> asset_dvc_fpath = (self.dpath / 'test-set1/assets.dvc')
>>> found_sidecars = list(self.sidecar_paths(asset_dvc_fpath))
>>> assert found_sidecars == [asset_dvc_fpath]
>>> #
>>> # Calling on a tracked file returns the dvc file
>>> manifest_dvc_fpath = (self.dpath / 'test-set1/manifest.txt.dvc')
>>> manifest_fpath = (self.dpath / 'test-set1/manifest.txt')
>>> found_sidecars = list(self.sidecar_paths(manifest_dvc_fpath))
>>> assert found_sidecars == [manifest_dvc_fpath]
>>> found_sidecars = list(self.sidecar_paths(manifest_fpath))
>>> assert found_sidecars == [manifest_dvc_fpath]
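The four cases listed for sidecar_paths can be sketched as a single fall-through function. This is a rough illustration under stated assumptions, not the library's code: resolve_sidecars and its root argument are hypothetical, a sidecar is assumed to be named "<name>.dvc" next to the thing it tracks, and the result is returned as a list rather than yielded.

```python
from pathlib import Path


def resolve_sidecars(path, root):
    """Resolve ``path`` to its sidecar file(s) inside the repo at ``root``.

    Hypothetical helper; returns a list (empty when nothing matches).
    """
    path, root = Path(path), Path(root)
    # Case 1: the input is itself a .dvc file.
    if path.suffix == '.dvc' and path.is_file():
        return [path]
    # Case 2: a sidecar sits next to the input file or directory.
    sibling = path.parent / (path.name + '.dvc')
    if sibling.is_file():
        return [sibling]
    # Case 3: the input lives inside a tracked directory, so check each
    # ancestor that is strictly inside the repo root.
    for parent in path.parents:
        if root not in parent.parents:
            break  # reached the repo root or left the repo
        candidate = parent.parent / (parent.name + '.dvc')
        if candidate.is_file():
            return [candidate]
    # Case 4: an untracked directory, so collect every sidecar beneath it.
    if path.is_dir():
        return sorted(p for p in path.rglob('*.dvc') if p.is_file())
    return []
```

Ordering the checks from most specific to least specific means a tracked file never falls through to the directory scan.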