Dvc: Visualize the DAG

Created on 10 Jul 2018 · 10 comments · Source: iterative/dvc

First of all: I really like DVC!

Often I use a simple Makefile to define my machine learning/data science pipelines. I also have a little script which uses graphviz to visualize the pipeline. Now I'm experimenting with DVC and testing if it can replace my Makefile. I'm missing the visualization feature.

TLDR: It would be great if DVC could visualize the DAG/pipeline that is created by dvc run ....

question

All 10 comments

Hi @sotte!

We previously had a dvc show pipeline command that produced a JPEG using graphviz, but we later decided to remove it from the core product because it had external dependencies (pygraphviz) that were not easy to install on every system. Since then we've received a number of requests to bring it back, and we decided to extract that feature into a separate package and hook it up as a dvc plugin (https://github.com/iterative/dvc/issues/852).

Another problem with graphviz was that it was distracting to open images outside of the terminal, and we really wanted something that could help visualize the DAG in the terminal. We currently have the dvc status command, which simply outputs which dependencies/outputs and dvc files have changed in your project, but that is clearly not enough. I've also stumbled upon a Perl tool that can output graphviz graphs in ASCII, which makes it somewhat possible to show them in the terminal, but it is clearly not suitable for inclusion in the core dvc unless it is rewritten in Python or something... If you have suggestions on how we could output the graph in the terminal in a nice way, please feel free to share them with us :)

For now, the plan is to implement CLI plugins and move the old dvc show code into a separate package, which we plan to tackle in the near future.

Thanks,
Ruslan
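The idea of showing the DAG directly in the terminal can be illustrated with a deliberately crude sketch. Assuming a hypothetical in-memory map from each stage to its upstream stages (UPSTREAM, layers and render are illustrative names, not DVC code), it places every stage one layer below its deepest upstream stage and prints one line per layer. This is a simple longest-path layering, not the layout algorithm used by graphviz or by any DVC command.

```python
# Hypothetical stage graph (not read from DVC): stage name -> upstream stage names.
UPSTREAM = {
    "prepare": [],
    "featurize": ["prepare"],
    "split": ["prepare"],
    "train": ["featurize", "split"],
    "evaluate": ["train"],
}


def layers(upstream):
    """Group stages by depth: a stage sits one layer below its deepest upstream stage.

    Assumes the graph is acyclic, as a pipeline DAG is; a cycle would raise RecursionError.
    """
    depth = {}

    def visit(name):
        if name not in depth:
            # Stages with no upstream stages get depth 0 (max default -1, plus 1).
            depth[name] = 1 + max((visit(u) for u in upstream[name]), default=-1)
        return depth[name]

    for name in upstream:
        visit(name)
    grouped = {}
    for name, d in depth.items():
        grouped.setdefault(d, []).append(name)
    return [sorted(grouped[d]) for d in sorted(grouped)]


def render(upstream):
    """Render one terminal line per layer, separated by a simple downward arrow."""
    rows = []
    for i, layer in enumerate(layers(upstream)):
        if i:
            rows.append("  v")
        rows.append("   ".join(layer))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(UPSTREAM))
```

The arrows only convey layer order, not which stage feeds which; drawing individual edges in ASCII is the harder layout problem that dedicated tools solve.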

Hey @efiop,

thanks for the quick response and for explaining your reasoning.
Yes, installing (py)graphviz can be tricky. Offering the visualization functionality as a separate package or plugin sounds reasonable.

That being said, maybe you could just spit out the dot file (which does not require the graphviz dependency) and let the user handle the transformation.
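Emitting a dot file indeed needs no graphviz dependency, because DOT is plain text. A minimal sketch, assuming a hypothetical in-memory description of the stages (STAGES and to_dot are illustrative names, not DVC's API), draws an edge from stage A to stage B whenever B depends on a file that A outputs:

```python
# Hypothetical pipeline description: each stage lists the files it reads
# (deps) and writes (outs), mirroring the dvc run -d ... -o ... flags.
STAGES = {
    "prepare": {"deps": ["data/raw/data.csv"], "outs": ["data/interim/data.pkl"]},
    "train_v1": {"deps": ["data/interim/data.pkl"], "outs": ["data/processed/models/v1.pkl"]},
}


def to_dot(stages: dict) -> str:
    """Return a Graphviz DOT description of the stage graph as plain text.

    An edge A -> B is emitted whenever stage B depends on a file that
    stage A outputs. No graphviz package is needed to produce this text.
    """
    # Map every output file to the stage that produces it.
    producer = {out: name for name, st in stages.items() for out in st["outs"]}
    lines = ["digraph pipeline {"]
    for name in sorted(stages):
        lines.append(f'    "{name}";')
    for name in sorted(stages):
        for dep in stages[name]["deps"]:
            if dep in producer:  # raw inputs have no producing stage
                lines.append(f'    "{producer[dep]}" -> "{name}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(STAGES))
```

Rendering is then left to the user, e.g. redirecting the output to pipeline.dot and running dot -Tpng pipeline.dot -o pipeline.png on a machine where Graphviz is installed.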

Also, maybe a quick workaround: show all the commands that had to be run to produce a certain artefact.

```
$ dvc show data/processed/models/v1.pkl
dvc run -d data/interim/data.pkl -o data/processed/models/v1.pkl python train_v1.py data/interim/data.pkl data/processed/models/v1.pkl
  dvc run -d data/raw/data.csv -o data/interim/data.pkl python prepare.py data/raw/data.csv data/interim/data.pkl
```
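The proposed workaround amounts to walking the pipeline upstream from an artifact. A minimal sketch, assuming hypothetical stage records that store each dvc run invocation's -d dependencies, -o outputs and command (STAGES, as_dvc_run and lineage are illustrative names, not an existing dvc command), reproduces the indented output of the example:

```python
# Hypothetical stage records, one per dvc run invocation from the example.
STAGES = [
    {
        "cmd": "python prepare.py data/raw/data.csv data/interim/data.pkl",
        "deps": ["data/raw/data.csv"],
        "outs": ["data/interim/data.pkl"],
    },
    {
        "cmd": "python train_v1.py data/interim/data.pkl data/processed/models/v1.pkl",
        "deps": ["data/interim/data.pkl"],
        "outs": ["data/processed/models/v1.pkl"],
    },
]


def as_dvc_run(stage):
    """Rebuild the dvc run command line that would have created this stage."""
    deps = " ".join(f"-d {d}" for d in stage["deps"])
    outs = " ".join(f"-o {o}" for o in stage["outs"])
    return f"dvc run {deps} {outs} {stage['cmd']}"


def lineage(target, stages, depth=0, seen=None):
    """Return the commands that produce target, indenting upstream steps.

    Files with no producing stage (e.g. raw inputs) end the recursion;
    seen keeps a stage shared by several branches from being listed twice.
    """
    if seen is None:
        seen = set()
    idx = next((i for i, s in enumerate(stages) if target in s["outs"]), None)
    if idx is None or idx in seen:
        return []
    seen.add(idx)
    stage = stages[idx]
    lines = ["  " * depth + as_dvc_run(stage)]
    for dep in stage["deps"]:
        lines.extend(lineage(dep, stages, depth + 1, seen))
    return lines


if __name__ == "__main__":
    for line in lineage("data/processed/models/v1.pkl", STAGES):
        print(line)
```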

Hi @sotte!

The previous dvc show also spat out a dot file, which, in my opinion, is not a complete enough feature to include in the core dvc, but we will be sure to include it in the plugin.

Showing all dvc run commands is pretty neat, but as you can see from your own example, it is pretty hard to read, especially when several steps are involved. It is much easier to read them in the YAML format that dvc files are written in. And, well, a graph image is even more informative :) But I agree, it could be very helpful when you want to match your original dvc run commands with what you've got in your project. For now, we will work on bringing dvc show (maybe renamed) back as a plugin and will go from there if any other tools turn out to be needed.

Thank you for your great suggestions!

--Ruslan

Hi @sotte!

One more thing. We are currently considering implementing something like a GitHub plugin to create an interactive visualization of the DAG, and we were wondering whether you would be interested in something like that. For example, what if dvc show output a link to a web page with an interactive DAG for your project?

Thanks,
Ruslan

Hi @efiop

Ya, the hypothetical dvc show is not super easy to read, but some readability is better than none :) That being said, I'm still exploring the best workflow with dvc, and maybe it already offers all I need. Compared to the previously mentioned "Makefile workflow", I'm just missing one place that tells me how an artifact was created (and ideally visualizes how it was created).

The plugin you mention sounds interesting! If dvc show would open a visualization of the DAG on a web page... great. How would that be related to GitHub plugins, though?

Thank you for the feedback!

How would that be related to GitHub plugins, though?

It would be a structural part of it. That is, the plugin would automatically run dvc show for your repository whenever it is pushed to and display the result on the site (plus a badge for your project's README, similar to codecov/codeclimate badges), the same way you would get a link if you ran dvc show manually in your repository.

@sotte We have thought a bit more about the problem of visualizing the DAG in the terminal and came up with the new dvc pipeline show command. Here is a short doc for it, with an example at the bottom (https://dvc.org/doc/commands-reference/pipeline). It was released in 0.12.0, so please feel free to upgrade and give it a try.

I just tested it and I like it (still, a graph would be amazing :)). I'll use it some more and give more feedback later. Thanks!

Hi @sotte!

Just wanted to let you know that we now have ASCII DAG visualization. Here is a simple example (https://dvc.org/doc/get-started/visualize). Please feel free to try it out :slightly_smiling_face:

Thanks,
Ruslan

@efiop yay, thanks! I'll give it a try and get back to you.
