DVC: Is there a UI?

Created on 30 Aug 2018 · 26 comments · Source: iterative/dvc

Hi, I am discovering dvc, so excuse me if I missed an obvious answer.

dvc looks great to use with the cli.
Now, I am considering to build a web ui in order to list dvc entities (and trigger experiments runs on cloud resources). Is there a convenient way (like an API) to access to the entities ?

Thanks

question


All 26 comments

Hi @seeb0h !

We do have a Python API (dvc/project.py), but it is not stable and not documented, and thus not yet ready to be relied on. We will definitely make it stable and release it to the public in the future. The only API that is stable, and for which we guarantee backward compatibility, is the CLI. If you only need to access .dvc files, you can read them easily using any YAML parser of your choice.
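For example, here is a minimal sketch of reading a .dvc file with PyYAML. The file content below is illustrative; real .dvc files may contain additional fields.

```python
# Sketch: reading stage info straight from a .dvc file with a YAML parser.
# The example content is illustrative, not a complete .dvc file.
import yaml

dvc_file_content = """\
cmd: python train.py
deps:
- md5: 1a2b3c...
  path: data/train.csv
outs:
- md5: 4d5e6f...
  path: model.pkl
"""

stage = yaml.safe_load(dvc_file_content)
print(stage["cmd"])                        # the command that produced the outputs
print([o["path"] for o in stage["outs"]])  # tracked output files
```

In a real repo you would `yaml.safe_load(open("model.pkl.dvc"))` instead of parsing an inline string.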

The web UI is a very interesting idea; could you please elaborate on it? We have been thinking about creating something like that and would be interested to hear more about what you would like it to look like.

Thanks,
Ruslan

By "web UI" @seeb0h probably means something like ModelDB's UI:
http://modeldb.csail.mit.edu:3000/projects

[Screenshot: ModelDB projects view, 2018-09-04]

To expand on this question, it would be really cool if there were a way to get specific datasets stored with dvc without initializing the respective git repo. Often it is required to just share a specific version of your data with a client. I am thinking it might be an interesting addition if we could write a little webserver/webservice for self-hosting that, given a specific data version, just provides the dataset as an HTTP download.

I would be interested in working on something like this (probably as a separate project).

My question is quite forward-looking.

We are currently updating an ML R&D workflow, and dvc is a good candidate for our versioning. We aim (in a more or less distant future) to automate part of the workflow through a web application. So the presence of an API in dvc will become an important point.

There are many functions a web UI could provide.
Tracking metrics, as @elgalu mentions, but also providing a convenient way to explore versioning and traceability of code, models and data.

Thank you all guys for the feedback!

@elgalu @seeb0h Thanks for the info! We definitely had similar thoughts on features that we wanted to see implemented in a web UI on top of core dvc. We are also actively discussing it within our team and think that we will begin working on it by the end of the year. Btw, there is a mailing list in which we sometimes (no spam, though) ask the community to answer questions that help us improve dvc, and we will definitely ask more detailed questions/polls there once we start working on the web UI, so please feel free to subscribe :slightly_smiling_face:

@AKuederle

To expand on this question, it would be really cool if there were a way to get specific datasets stored with dvc without initializing the respective git repo. Often it is required to just share a specific version of your data with a client. I am thinking it might be an interesting addition if we could write a little webserver/webservice for self-hosting that, given a specific data version, just provides the dataset as an HTTP download.

I would be interested in working on something like this (probably as a separate project).

Sorry, I'm not sure I follow. What you are describing sounds like any file sharing service (e.g. Dropbox, S3, etc.) to me. Could you elaborate, please?

Thanks,
Ruslan

Thank you for your response. What I basically meant was that it would be nice to have a way to get the data corresponding to a specific version of a project without installing git/dvc, similar to downloading code from GitHub as a .zip. Of course you could go straight to the specific storage location in your AWS bucket, SSH server, etc., but it would be nice to have a little webservice (with or without a GUI) that abstracts that away and, in simple terms, provides a file download given a specific commit hash of the corresponding git repository.
There would be some issues to solve, e.g. credentials for accessing the data store, but I think it would be a really nice feature that would also allow dvc to be used as a kind of "frontend" for an ML data archive.
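A hypothetical sketch of that service, using only the standard library: map a commit hash plus a repo path to the stored content. Here the DVC remote is faked with an in-memory dict; a real service would resolve the .dvc file at that revision and fetch the blob from S3/SSH/etc. All names and the URL scheme are illustrative.

```python
# Hypothetical self-hosted download service for the idea above:
# GET /<rev>/<path-in-repo> returns the file as stored at that revision.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for "resolve rev + path against the DVC remote".
FAKE_REMOTE = {
    ("abc1234", "data/train.csv"): b"feature,label\n1,0\n",
}

def resolve(rev, path):
    """Return file content for a given commit hash and repo path, or None."""
    return FAKE_REMOTE.get((rev, path))

class DownloadHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Split "/abc1234/data/train.csv" into rev and in-repo path.
        rev, _, path = self.path.lstrip("/").partition("/")
        content = resolve(rev, path)
        if content is None:
            self.send_error(404, "unknown revision or path")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()
        self.wfile.write(content)

# To serve: HTTPServer(("", 8000), DownloadHandler).serve_forever()
```

The interesting part would of course be implementing `resolve` on top of the real cache layout and remote credentials.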

I think this might still be a little confusing. I will try to expand on that and explain the big picture as soon as I have time.

@AKuederle Thank you for sharing your thoughts! This is indeed a very interesting idea! It would be a great addition to the WebUI mentioned above. I think it is definitely going to be implemented in the future.

Thanks,
Ruslan

You could use hug to simultaneously maintain a web and python API and a CLI. It seems to reduce the work of maintaining the three and keeping them in alignment, plus of course the bonus of being able to run dvc as a service. Just my 2c.

Hi @piccolbo !

Thank you for the tip! Hug looks very promising! We will be sure to take a closer look at it when developing WebUI.

Thanks,
Ruslan

Hi @efiop , since you closed this issue, does it mean Hug is the solution we should look at?

@elgalu I don't know, we haven't been looking at it. I just felt that the discussion is done, since webui is not yet on our radar these days. Looks like I was wrong :slightly_smiling_face: Reopening.

@AKuederle sounds a lot like https://dagshub.com - you can click on the file in the pipeline view and download it from its source

Interesting! I will have a closer look.

Thank you


Hi there! I just wanted to offer my feedback / needs that might possibly be feature requests.

I am working at a start-up and definitely see a need for a tool such as this. Especially the evaluation and comparison features are interesting. But similar to the OP, I envision having a dashboard that tracks our team's progress through a web app.

I guess we could make something like this ourselves if there is a Python client we can use to grab the latest results. From previous comments I gather that the Python client needs to become stable first, and I am sure that seems solvable!

But when that is said and done, I wonder if there are any plans for providing tooling for hosting and displaying the results in a web app or would you guys integrate with existing open-source tools to do this? Or would that be left to the user altogether?

@gerardsimons, in my team we use MLflow tracking. A great web UI, with nice experiment comparison features. Saving one's scores and artefacts can simply be done in a DVC stage.

@gerardsimons thanks! could you please summarize the requirements for the dashboard you have in mind? DVC provides the dvc metrics show command and it can cover some part of it I believe. I'm just wondering what else would be valuable to show via UI on top of DVC. Or should we include more stuff into DVC first to visualize it then. Your input can significantly help us prioritize this.

@PeterFogh Thanks for the suggestion, that looks very interesting indeed! Any resources you can share to set DVC up like that?

@shcheklein: Thank you for the help. I will try my best to elucidate a bit. If I understand correctly, I can create my own metric and output it with dvc metrics show in the terminal? That is already very useful. In my specific use case I want to compare object detections, so I would measure IoU at different confidence thresholds and have things like precision and recall visualised on the fly. Often AP is computed at set intervals, and we could look up in the metrics what matches the given settings (some kind of slider). That would mean the UI needs a way to filter the metrics by certain settings, I guess, but I am not sure. Basic precision-recall graphs are also useful when we have to decide, based on our business/clients, where on the curve we want to be.

Of course nothing beats a couple of good examples: in my case, showing a few bounding boxes on the image. I think this would be more difficult to do in the current system and is very specialised to the type of data (images in this case).

@gerardsimons thank you for the detailed explanation of your vision on the dashboard. We just started working on visualization in DVC that could be a part of the future dashboard.

I’d love to hear your opinions on DVC visualization:

  1. We need a tool to visualize dvc metrics. What tool or library would you prefer?
  2. We need to compare metrics (scalars like AUC as well as plots like ROC curves) between one commit/branch and another. What is the best way to generalize this?
  3. It would be great if you could provide some examples with the intervals.

Isn't there a Python API yet?
I can't see the dvc/project.py mentioned above.

@NyanSet The public API is in dvc/api.py. You can also potentially use the Repo class from dvc, with some risks. See #3278.

Unfortunately, no metrics or visualization API yet. This is what I'd like to discuss here 😃

Similarly to @PeterFogh I want to use a combination of MLflow and dvc in my future ML projects - the two just seem predestined to be used together. So, logging experiment metrics and parameters would take place primarily in MLflow (although of course logging the metrics in dvc could be done, too). Tracking data and experiment artifacts then happens through dvc.

The option to visually edit the pipeline, as @Amir-Abushanab mentioned is possible in DAGsHub's online tool, would be super cool. I just don't want to host my ML projects on DAGsHub, so something like the MLflow UI would be nice. Actually, I was thinking it might be possible to integrate this functionality into the MLflow UI. When selecting an experiment, have a section where you can edit the pipeline. Then, from within MLflow, you could launch a new experiment run: the run stage of dvc could expect a config file to which MLflow writes the parameters the user specifies before the run (with the config file stored in dvc being the default config).
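The config-file handoff described above could be sketched roughly as follows. Everything here is hypothetical: the file name, keys and JSON format are placeholders, not anything MLflow or dvc actually defines.

```python
# Hypothetical handoff: the UI writes user-chosen parameters to a config
# file before launching the run; the dvc stage reads it back at startup,
# falling back to defaults (the config tracked in dvc) for unset keys.
import json
from pathlib import Path

def write_run_config(params, path="run_config.json"):
    """What the UI side would do before triggering `dvc repro`."""
    Path(path).write_text(json.dumps(params))

def load_run_config(path="run_config.json", defaults=None):
    """What the training stage script would do at startup."""
    cfg = dict(defaults or {})
    p = Path(path)
    if p.exists():
        cfg.update(json.loads(p.read_text()))
    return cfg

write_run_config({"lr": 0.01, "epochs": 20})
cfg = load_run_config(defaults={"lr": 0.001, "epochs": 10, "batch_size": 32})
print(cfg)  # user values override defaults; batch_size falls back
```

The nice property is that the stage script stays runnable on its own, with or without the UI in front of it.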

I hope my comment is not offensive/off-topic because I'm suggesting to incorporate this functionality into another tool. I just had this idea and wanted to share my thoughts to see what others think about it.

@florianblume May I ask why you’d prefer not to host the project on DAGsHub? Is it because it’s not hosted anywhere (need a desktop client?) or because it’s hosted on GitHub and editing the pipeline is not an option in mirrors? Maybe we can solve the issue.

Disclaimer: I’m one of the founders of DAGsHub

@deanp70 actually I thought that you had to push data to your service (my misunderstanding, I'm new to dvc) and that a free user wouldn't be able to create a private repository (it sounded like that on the plans page). I'll give it a try - it definitely looks like a very powerful tool. I still think it would be nice to integrate dvc and MLflow, just because they seem to complement each other so well. There's also an issue on the MLflow page related to this.

@florianblume I see, thanks for the clarification, it's really helpful. Please let me know if you have any feedback. When we built the experiment tracking into the platform we tried to imagine an ideal combination between DVC and MLflow, but you can always use all of them (DVC, MLflow and DAGsHub)

@florianblume just out of curiosity... have you considered TensorBoard for graphs instead of MLflow?

@dmpetrov that's what I've been doing until now. I ran each experiment in a different folder, together with a copy of its config file (parameters, paths to data sets, etc.), its events, logs and network outputs (from training and prediction). I then started TensorBoard in a top-level folder to be able to compare experiments; this is referred to as the caveman way in this medium.com article. There are multiple drawbacks to this approach; the most annoying was that it was difficult to compare experiments properly. I had to create an Excel sheet where I entered the results of the runs (exporting by script into a csv was not an option because I needed a certain order in the table). TensorBoard probably has more visualization features than MLflow, but I feel MLflow would give me a much better overview of experiments and much better comparison features. That's why I'd go for a combination of DVC, MLflow and TensorBoard (which I have left out of my earlier comments for simplicity).

