See src @ https://github.com/iterative/dvc/blob/master/dvc/api.py
dvc.api.get_url
dvc.api.open
dvc.api.read
Please add more insights here. More details in https://github.com/iterative/dvc.org/issues/463#issuecomment-514139792.
Also, please update the one mention of the API in the data registry (which will be merged with #818) per https://github.com/iterative/dvc.org/pull/818#issuecomment-565083934.
UPDATE:
dvc.api.summon and open another issue to complete it later.
@Suor can you describe it briefly, please :)
@jorgeorpinel @shcheklein can you please explain this a bit more? I would like to take this up.
@Suor ? can you give a summary or is there a link? we should probably put docstrings around APIs before we release it.
Sorry, I am unable to find dvc.api.get_content. Do you mean get_url or __getattr__?
dvc.api.open and other methods already have docstrings in iterative/dvc/dvc/api.py
@Naba7 it's read now, I believe. There are only three public methods in dvc.api. Those should be described. And let's create a ticket on iterative/dvc to update docstrings for those APIs.
They have some short docstrings, I will update them based on future docs or discussion here if we decide to do that.
So, what is this about? There are three public things in dvc.api now:
read(path, repo=None, rev=None, remote=None, mode="r", encoding=None) - returns the contents of an artifact as a bytes object or a string.
get_url(path, repo=None, rev=None, remote=None) - returns a URL of an artifact.
open(path, repo=None, rev=None, remote=None, mode="r", encoding=None) - opens an artifact as a file; may only be used as a context manager:
with dvc.api.open("path/to/data.csv", remote="my-s3", encoding="utf-8") as f:
    for line in f:
        process(line)
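To illustrate what "may only be used as a context manager" means in practice, here is a minimal sketch. It is not DVC's implementation: `open_artifact`, `read_lines`, and the in-memory data are hypothetical stand-ins. It only shows how a generator-based context manager hands out a file-like object and guarantees cleanup on exit.

```python
import io
from contextlib import contextmanager


@contextmanager
def open_artifact(text):
    """Hypothetical stand-in for a context-manager-only API (not DVC code)."""
    fobj = io.StringIO(text)  # pretend this is a fetched artifact
    try:
        yield fobj  # the body of the `with` block runs here
    finally:
        fobj.close()  # cleanup runs even if the body raises


def read_lines(text):
    # The file object is only valid inside the `with` block.
    with open_artifact(text) as f:
        return [line.rstrip("\n") for line in f]
```

Calling `read_lines("a\nb\n")` iterates over the object line by line, the same way the `for line in f` loop above does.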
Arguments always mean the same:
path - a path to an artifact, relative to repo root,
repo - a path or git url of a repo,
rev - revision, i.e. a branch, a tag, or a SHA. This only works when repo is a URL,
remote - a name of a remote to fetch artifact from/give url to
mode - a mode with which we open a file, the only sensible options are r/rt and rb
encoding - an encoding used to decode contents to a string
mode and encoding mirror the parameters of the same name in the builtin open().
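Since mode and encoding are said to behave like their builtin open() counterparts, the difference can be shown with the builtin alone. This is a sketch using a temporary local file, not a DVC call; the file contents are made up for illustration.

```python
import os
import tempfile

# Write a small UTF-8 file so it can be read back in both modes.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write("name,city\nZoë,Kraków\n".encode("utf-8"))
    tmp_path = tmp.name

try:
    with open(tmp_path, mode="r", encoding="utf-8") as f:
        as_text = f.read()  # str, decoded with the given encoding
    with open(tmp_path, mode="rb") as f:
        as_bytes = f.read()  # bytes, no decoding applied
finally:
    os.remove(tmp_path)  # clean up the temporary file
```

Going by the description above, read() and open() in dvc.api should return a string for "r"/"rt" and bytes for "rb" in the same way.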
k, thanks @Suor. @Naba7 now we need to come up with a good place and a format for it. Probably, we need a separate top-level section. API reference similar to command reference we have.
Shall we describe read, open, and get_url in different sections inside API Reference?
We can have another section for describing path, repo, rev, remote, mode, encoding.
@Naba7 yes, it should be a separate section per call. At least for now.
path, repo, rev, remote, mode, encoding - are not APIs. repo is, but it's not officially released yet.
Can you take a look at some good projects that have their APIs documented, to come up with a good page template for this?
@Suor are repo=None, rev=None, remote=None default values actually None? Or do they get turned into repo='.', rev='HEAD', remote=(read from config file)? Probably important to document (both in docstring and) in the API ref.
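One way to separate the two halves of this question: the declared defaults can be read from the signature, while any substitution done inside the function cannot. The sketch below uses a stub that copies the signature quoted earlier in this thread; it is not DVC's code. The assumption is that the same inspect.signature call would work on the real function, as it does for any plain Python function.

```python
import inspect


def read(path, repo=None, rev=None, remote=None, mode="r", encoding=None):
    """Stub copying the signature quoted in this thread; not DVC's code."""
    raise NotImplementedError


# Collect only the parameters that declare a default value.
declared = {
    name: param.default
    for name, param in inspect.signature(read).parameters.items()
    if param.default is not inspect.Parameter.empty
}
```

This only answers what the signature declares. Whether None is later resolved to the current directory, HEAD, or a configured remote has to come from the implementation or its docstring.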
@shcheklein re
Probably, we need a separate top-level section.
API reference similar to command reference we have.
Agree, perhaps in docs path /api-reference, and the index page for that section could explain what the API is and how to start using it. Actually it's not that obvious! I'm not 100% sure what we mean by "the DVC API", for example. Is it a Python library people can install separately?
$ pip install dvc
...
$ python
...
>>> from dvc import api as dvcapi
>>> dvcapi
<module 'dvc.api' from '/.../dvc/dvc/api.py'>
>>> # etc
that's right. I would say that it's not separate though, it's the same DVC package.
Sorry, I didn't understand it. I thought DVC-API is same as DVC package being installed by binary-files or by any other means and it is a separate package to run on top of git and we can implement it locally or on remote machines.
@Naba7 ?? I'm not sure I understand this. The DVC API is just a Python module defined in DVC itself. No need to install any extras to be able to use it.
Suggested intro. text for the new /api-reference section index page:
The DVC API is part of the dvc Python module installed along with DVC. It allows you to use some of the core functions of DVC in your Python scripts and applications. You may include it in your Python files with:
from dvc import api as dvcapi
Then list read, open, and get_url in a bullet list, linked to their own pages (which should also have left pane navigation items).
@shcheklein Okay. I get it now. I thought for getting DVC-api we need to download it separately.
@jorgeorpinel repo is also an api, we should include it in the intro.
And
1) ... some of the core functions of DVC such as add, push, pull, commit, checkout, etc., ...
2) Writing a one-liner for read, open, get_url, repo(?) such as:
3) Follow-up pages may contain a detailed description followed by examples. What more can we include here?
I think that's good enough to start a PR. Please let us know, thanks!
@jorgeorpinel
I would say simply use import dvc.api instead of from dvc import api as dvcapi, more straightforward and almost the same length:
import csv
import pickle
import dvc.api

# Loading from content (binary mode, since pickle.loads() needs bytes)
model = pickle.loads(dvc.api.read("some-model.pkl", repo="https://github.com/...", mode="rb"))

# Loading using file descriptor
with dvc.api.open("dataset.csv", repo=...) as fd:
    reader = csv.reader(fd)
    for row in reader:
        ...  # per-row processing goes here

# Obtaining a URL
resource_url = dvc.api.get_url("path/to/resource.ext", repo=..., remote="s3")
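A note on the pickle example: pickle.loads() accepts bytes, not str, so the mode used to read a pickled artifact matters. The sketch below shows this with the standard library only; the `model` dict is a made-up stand-in for a real model, and no DVC call is involved.

```python
import pickle

model = {"weights": [0.1, 0.2], "bias": 0.5}  # stand-in for a real model
payload = pickle.dumps(model)  # bytes, as a binary-mode read would return

restored = pickle.loads(payload)  # works: input is bytes

try:
    # A str is what a text-mode read would return; pickle rejects it.
    pickle.loads(payload.decode("latin-1"))
    text_mode_ok = True
except TypeError:
    text_mode_ok = False
```

Based on the mode description earlier in the thread, passing mode="rb" to read() is what should give pickle.loads() the bytes it expects.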
@Naba7 I would start with some Usage section, with short and most common examples, then continue with complete API listing.
Or another layout: Install, Usage, Methods sections. Then each method goes on its separate page linked from Method section, with full operation and params description, more examples. The point is making it glanceable and copy-pastable, while providing all the ins and outs too.
I think the layout of Install, Usage, Methods, with each method described on its own page, is better.
@shcheklein @jorgeorpinel If you agree to this, I will start working on the same.
@Suor I don't quite understand what you mean by "copy-pastable, while providing all the ins and outs too."
Since you don't need to install any other package, we can mention that in one line and link to the DVC installation instructions.
So, starting with Usage section for now.
@Naba7 yep, I like the idea. So, we can start with three levels:
Python API is the top most
it includes Install, Usage, Method Reference
Method Reference includes one page per method with a simple example. And we need to discuss the structure for it.
@jorgeorpinel any thoughts on this?
Agree. Looking forward to seeing a first version PR 🙂
I am sorry. I won't be able to work further on this PR.
@Naba7 np! thank you for all your contributions ;)
They have some short docstrings, I will update them based on future docs or discussion here if we decide to do that.
@Suor please see iterative/dvc/issues/3092
are repo=None, rev=None, remote=None default values actually None? Or do they get turned into repo='.', rev='HEAD', remote=(read from config file)?
Can you confirm about this Q ^ please? Thanks