simple_dvc.api module

A simplified Python DVC API

class simple_dvc.api.SimpleDVC(dvc_root=None, remote=None)[source]

Bases: NiceRepr

A Simple DVC API

Parameters:
  • dvc_root (str | PathLike) – path to DVC repo directory

  • remote (str) – dvc remote to sync to by default

CommandLine

xdoctest -m simple_dvc.api SimpleDVC

Example

>>> from simple_dvc import SimpleDVC
>>> self = SimpleDVC.demo()
>>> a_file_fpath = self.dpath / 'a_file.txt'
>>> if not a_file_fpath.exists():
>>>     a_file_fpath.write_text('hello')
>>> self.add(a_file_fpath)
list_remotes(name=None)[source]
property dpath
classmethod demo(dpath=None, reset=False, with_git=False)[source]

Create a demo DVC repo for tests.

Parameters:

dpath (str | PathLike) – If specified, force the repo to be made here. Otherwise, choose a default.

Example

>>> from simple_dvc.api import *  # NOQA
>>> cls = SimpleDVC
>>> self1 = cls.demo(with_git=0)
>>> self2 = cls.demo(with_git=1)
classmethod demo_dpath(reset=False)[source]
classmethod init(dpath, no_scm=False, force=False, verbose=0)[source]

Initialize a DVC repo in a path

property cache_dir
classmethod coerce(dvc_path, **kw)[source]

Given a path inside DVC, finds the root.

classmethod find_root(path=None)[source]

Given a path, search its ancestors to find the root of a dvc repo.

Returns:

Path | None
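
The ancestor search can be pictured with the standard library alone. The sketch below is not the library's implementation: it assumes a repo root is any directory that contains a `.dvc` subdirectory, and `find_dvc_root_sketch` is a hypothetical name used only for illustration.

```python
from pathlib import Path


def find_dvc_root_sketch(path=None):
    """Hypothetical helper: walk upward until a directory containing `.dvc` is found."""
    start = Path.cwd() if path is None else Path(path)
    start = start.resolve()
    # Check the starting path itself, then each ancestor in turn.
    for candidate in [start, *start.parents]:
        if (candidate / '.dvc').is_dir():
            return candidate
    # No ancestor looks like a DVC repo root (assumption: `.dvc` marks the root).
    return None
```

Returning None rather than raising mirrors the documented `Path | None` return type and leaves the choice of error handling to the caller.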

_ensure_root(paths)[source]
_ensure_remote(remote)[source]
_resolve_root_and_relative_paths(paths)[source]
add(path, verbose=0)[source]
Parameters:

path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to add
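
Accepting either one path or several usually means normalizing the argument before use. A minimal sketch of that idea follows; `coerce_paths_sketch` is a hypothetical helper and is not claimed to match what simple_dvc does internally.

```python
import os
from pathlib import Path


def coerce_paths_sketch(path):
    """Hypothetical helper: return a list of Path objects from one path or many."""
    # A str or PathLike is a single path, even though str is itself iterable.
    if isinstance(path, (str, os.PathLike)):
        return [Path(path)]
    # Anything else is assumed to be an iterable of paths.
    return [Path(p) for p in path]
```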

remove(path, verbose=0)[source]
Parameters:

path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to remove

_dvc_path_op(op, path, verbose=0)[source]
Parameters:

path (str | PathLike | Iterable[str | PathLike]) – a single or multiple paths to apply the operation to

check_ignore(path, details=0, verbose=0)[source]
git_pull()[source]
git_push()[source]
git_commit(message)[source]
git_commitpush(message='', pull_on_fail=True)[source]

TODO: better name here?

_verbose_extra_args(verbose)[source]
_remote_extra_args(remote, recursive, jobs, verbose)[source]
push(path, remote=None, recursive=False, jobs=None, verbose=0)[source]

Push the content tracked by .dvc files to remote storage.

Parameters:
  • path (Path | List[Path]) – one or more file paths that should have an associated .dvc sidecar file or, if recursive is True, a directory containing multiple tracked files.

  • remote (str) – the name of the remote registered in the .dvc/config to push to

  • recursive (bool) – if True, items in path can be directories.

  • jobs (int) – number of parallel workers
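
To illustrate how these parameters could translate into a `dvc push` invocation, the sketch below builds an argument list using the `-r`, `-j`, and `-R` options of the DVC command line. The name `build_push_args_sketch` is hypothetical, and this is not necessarily how SimpleDVC assembles its command.

```python
def build_push_args_sketch(paths, remote=None, recursive=False, jobs=None):
    """Hypothetical helper: assemble arguments for the `dvc push` CLI."""
    args = ['dvc', 'push']
    if remote is not None:
        args += ['-r', str(remote)]  # remote name registered in .dvc/config
    if jobs is not None:
        args += ['-j', str(jobs)]    # number of parallel workers
    if recursive:
        args += ['-R']               # allow directories as targets
    # Targets come last, converted to plain strings.
    args += [str(p) for p in paths]
    return args
```

Options left at their defaults are omitted from the list, so DVC falls back to its own configuration (for example, the default remote).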

pull(path, remote=None, recursive=False, jobs=None, verbose=0, allow_missing=False, force=False)[source]

Wrapper around DVC pull

CommandLine

xdoctest -m simple_dvc.api SimpleDVC.pull

Example

>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> remote = SimpleDVC.demo(with_git=1)
>>> dpath = ub.Path.appdir('simple_dvc/doctest/pull').delete().ensuredir()
>>> dvc_dpath = dpath / 'repo'
>>> # Clone a Demo DVC repo, setup the remote, and test
>>> _ = ub.cmd(f'git clone {remote.dpath / ".git"} {dvc_dpath}', verbose=3)
>>> _ = ub.cmd(f'dvc remote add local --default {remote.dpath / ".dvc/cache"}', verbose=3, cwd=dvc_dpath)
>>> # Setup our local API
>>> self = SimpleDVC(dvc_dpath)
>>> # Test a basic file pull
>>> assert not (dvc_dpath / 'root_file').exists()
>>> # Note: pull does accept the non-dvc file request
>>> self.pull(dvc_dpath / 'root_file.dvc')
>>> assert (dvc_dpath / 'root_file').exists()
>>> # Test a basic directory pull
>>> assert not (dvc_dpath / 'root_dir').exists()
>>> self.pull(dvc_dpath / 'root_dir.dvc')
>>> assert (dvc_dpath / 'root_dir').exists()
>>> # Test a recursive pull
>>> assert not (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert not (dvc_dpath / 'test-set1/assets/').exists()
>>> self.pull(dvc_dpath / 'test-set1', recursive=True)
>>> assert (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert (dvc_dpath / 'test-set1/assets/').exists()
>>> assert len((dvc_dpath / 'test-set1/assets/').ls()) > 1
request(path, remote=None, verbose=0, pull=False)[source]

Requests to ensure that a specific file from DVC exists.

For each file that does not exist, check whether there is an associated .dvc sidecar file. If any sidecar file is missing, an error is raised. Otherwise, attempt to pull the missing files.

Todo

  • Add an argument to validate that the data was pulled correctly (i.e. there are no dangling symlinks)

Parameters:
  • path (Path | List[Path]) – one or more file paths that should have an associated .dvc sidecar file.

  • remote (str) – the DVC remote to use

  • verbose (int) – verbosity

  • pull (bool) – if True, pull instead of request (convenience option)

CommandLine

xdoctest -m simple_dvc.api SimpleDVC.request

Example

>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> remote = SimpleDVC.demo(with_git=1)
>>> dpath = ub.Path.appdir('simple_dvc/doctest/request').delete().ensuredir()
>>> dvc_dpath = dpath / 'repo'
>>> # Clone a Demo DVC repo, setup the remote, and test
>>> _ = ub.cmd(f'git clone {remote.dpath / ".git"} {dvc_dpath}', verbose=3)
>>> _ = ub.cmd(f'dvc remote add local --default {remote.dpath / ".dvc/cache"}', verbose=3, cwd=dvc_dpath)
>>> # Setup our local API
>>> self = SimpleDVC(dvc_dpath)
>>> # Test requesting a single file inside a tracked directory
>>> assert not (dvc_dpath / 'test-set1/manifest.txt').exists()
>>> assert not (dvc_dpath / 'test-set1/assets/').exists()
>>> path = dvc_dpath / 'test-set1/assets/asset_004.data'
>>> self.request(path, verbose=0)
>>> assert (dvc_dpath / 'test-set1/assets/').exists()
>>> assert len((dvc_dpath / 'test-set1/assets/').ls()) > 1
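
The request logic described for this method (skip files that already exist, fail if a sidecar is missing, otherwise pull) can be sketched independently of DVC. This is an illustration under simplifying assumptions, not the library's code: it only looks for a sidecar named `<file>.dvc` next to each file, whereas the example for this method shows that files inside a tracked directory are also handled. `plan_request_sketch` is a hypothetical name.

```python
from pathlib import Path


def plan_request_sketch(paths):
    """Hypothetical helper: return the sidecar files that would need pulling."""
    to_pull = []
    for path in map(Path, paths):
        if path.exists():
            continue  # already present, nothing to do
        # Assumption: the sidecar sits beside the file as `<name>.dvc`.
        sidecar = path.with_name(path.name + '.dvc')
        if not sidecar.exists():
            raise FileNotFoundError(f'no .dvc sidecar found for {path}')
        to_pull.append(sidecar)
    return to_pull
```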
unprotect(path, verbose=0)[source]
is_tracked(path)[source]
classmethod find_file_tracker(path)[source]
find_dir_tracker(path)[source]
read_dvc_sidecar(sidecar_fpath)[source]
resolve_cache_paths(sidecar_fpath)[source]

Given a .dvc file, enumerate the paths in the cache associated with it.

Parameters:

sidecar_fpath (PathLike | str) – path to the .dvc file

Example

>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> import ubelt as ub
>>> self = SimpleDVC.demo()
>>> # on a file
>>> sidecar_fpath = self.dpath / 'test-set1/manifest.txt.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
>>> # on a simple directory
>>> sidecar_fpath = self.dpath / 'test-set1/assets.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
>>> # on a complex directory
>>> sidecar_fpath = self.dpath / 'root_dir.dvc'
>>> resolved_cache_paths = list(self.resolve_cache_paths(sidecar_fpath))
>>> print('resolved_cache_paths = {}'.format(ub.urepr(resolved_cache_paths, nl=1)))
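
For intuition about where such paths point, the sketch below maps a hash to a location in a content-addressed cache. It assumes a layout in which the first two characters of the hash name a subdirectory and the remainder names the file. The `files/md5` prefix is an assumption about recent DVC versions; older caches are assumed to omit it (pass `prefix=()`). `cache_path_sketch` is a hypothetical helper, not part of simple_dvc.

```python
from pathlib import Path


def cache_path_sketch(cache_dir, md5, prefix=('files', 'md5')):
    """Hypothetical helper: map a hash to its location in a DVC-style cache."""
    # Assumed layout: <cache_dir>/<prefix...>/<first two chars>/<remainder>.
    return Path(cache_dir).joinpath(*prefix, md5[:2], md5[2:])
```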
_sidecar_references(sidecar_fpath)[source]
Parameters:

sidecar_fpath (str | PathLike) – path to a sidecar file.

Yields:

Dict – Information about each sidecar file as it is read.

find_sidecar_paths_in_dpath(dpath)[source]

Find DVC sidecar files in a directory.

Parameters:

dpath (Path | str) – directory in dvc repo to search

Yields:

ub.Path – existing dvc sidecar files
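
A directory search of this kind can be approximated with `pathlib`. The sketch below is an assumption about the general approach rather than the library's code, and `iter_sidecars_sketch` is a hypothetical name.

```python
from pathlib import Path


def iter_sidecars_sketch(dpath):
    """Hypothetical helper: yield every `*.dvc` file under a directory."""
    for fpath in sorted(Path(dpath).rglob('*.dvc')):
        # The `.dvc` directory itself also matches the pattern, so keep only files.
        if fpath.is_file():
            yield fpath
```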

find_sidecar_paths_associated_with(dpath)[source]

Deprecated: use sidecar_paths instead.

Parameters:

dpath (Path | str) – directory in dvc repo to search

Yields:

ub.Path – existing dvc sidecar files

sidecars(path)[source]

Generates all sidecar objects associated with a path.

sidecar_paths(path)[source]

Given a path in a DVC repo, resolve it to a sidecar file that it corresponds to.

Cases:
  • Input is a .dvc file - return it

  • Input is a file with an associated .dvc - return the associated .dvc file

  • Input is inside a folder tracked by dvc - return the .dvc file of the folder

  • Input is a folder with multiple .dvc paths in it - return all .dvc files in the folder.

If the input is a .dvc file, return it.

If it is inside a directory that corresponds to a dvc repo, search for that.

Parameters:

path (Path | str) – directory or file in dvc repo to search

Yields:

ub.Path – existing dvc sidecar files

Example

>>> from simple_dvc.api import SimpleDVC  # NOQA
>>> self = SimpleDVC.demo()
>>> #
>>> # Calling on an untracked directory returns all sidecars in the
>>> # directory
>>> all_sidecars = list(self.sidecar_paths(self.dpath))
>>> assert len(all_sidecars) > 1
>>> #
>>> # Calling on a tracked file in a directory returns the sidecar for
>>> # that directory
>>> tracked_fpath = (self.dpath / 'test-set1/assets').ls()[0]
>>> dir_sidecar = list(self.sidecar_paths(tracked_fpath))
>>> assert len(dir_sidecar) == 1
>>> #
>>> # Calling on a dvc file returns the dvc file
>>> asset_dvc_fpath = (self.dpath / 'test-set1/assets.dvc')
>>> found_sidecars = list(self.sidecar_paths(asset_dvc_fpath))
>>> assert found_sidecars == [asset_dvc_fpath]
>>> #
>>> # Calling on a tracked file returns the dvc file
>>> manifest_dvc_fpath = (self.dpath / 'test-set1/manifest.txt.dvc')
>>> manifest_fpath = (self.dpath / 'test-set1/manifest.txt')
>>> found_sidecars = list(self.sidecar_paths(manifest_dvc_fpath))
>>> assert found_sidecars == [manifest_dvc_fpath]
>>> found_sidecars = list(self.sidecar_paths(manifest_fpath))
>>> assert found_sidecars == [manifest_dvc_fpath]
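
The four cases listed for sidecar_paths can be sketched with the standard library. This is a simplified illustration, not the library's implementation: it does not stop the upward search at the repo root, and `resolve_sidecars_sketch` is a hypothetical name.

```python
from pathlib import Path


def resolve_sidecars_sketch(path):
    """Hypothetical helper: yield the `.dvc` files that correspond to a path."""
    path = Path(path)
    if path.suffix == '.dvc':
        yield path  # case 1: the input is already a sidecar
        return
    direct = Path(str(path) + '.dvc')
    if direct.is_file():
        yield direct  # case 2: the input has its own sidecar
        return
    for parent in path.parents:
        tracker = Path(str(parent) + '.dvc')
        if tracker.is_file():
            yield tracker  # case 3: the input is inside a tracked folder
            return
    if path.is_dir():
        # case 4: untracked folder, report every sidecar file beneath it
        yield from sorted(p for p in path.rglob('*.dvc') if p.is_file())
```
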
simple_dvc.api._ensure_iterable(inputs)[source]
simple_dvc.api._import_dvc_main()[source]
simple_dvc.api.SDVC

alias of SimpleDVC