dvc pipeline show --ascii - visual bug since version 0.82.9

Created on 26 Feb 2020 · 3 Comments · Source: iterative/dvc

DVC version: 0.82.9 and following
reproduced on several platforms, like:
4.15.0-55-generic #60-Ubuntu
macOS High Sierra 10.13.6
macOS Catalina 10.15.2

Hi,

the command "dvc pipeline show --ascii" seems to have a bug since version 0.82.9: it does not display the dependencies properly when there is a branch in a pipeline.

I created a simple pipeline, which looks like this:

Pipeline with Version 0.82.8

If I use the command "dvc pipeline show --ascii validation.dvc" with version 0.82.8, the whole pipeline is displayed properly.
If I upgrade to e.g. version 0.83.0, the connection between "val.dvc" and the previous stage is missing. If I run the command on val.dvc itself, the stages "train.dvc" and "validation.dvc" are missing (see screenshots).

Pipeline with Version 0.83.0

Pipeline with Version 0.83.0, when using val.dvc in the command

bug p0-critical

All 3 comments

Can confirm that issue still exists.
Reproduction script:

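#!/bin/bash

# Start from a clean repository
rm -rf repo
mkdir repo

pushd repo
git init --quiet
dvc init -q

# Track the raw input data
echo data >> data
dvc add data

# One preprocessing stage, then two stages that branch off its output
dvc run -d data -f dvc_run.dvc -o processed_data "cat data>>processed_data"
dvc run -d processed_data -o val "cat processed_data>>val"
dvc run -d processed_data -o train "cat processed_data>>train"
# Final stage that joins both branches
dvc run -d val -d train -o validation "echo validated >> validation"
# Expected: every stage and every connection between them is drawn
dvc pipeline show --ascii validation.dvc
popd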

Ok, so:

  1. @efiop is right that the cause lies in #3217
  2. Before this change, when building the graph for outs, we iterated over all graph edges and added each of them for display.
  3. After the change, we use the dfs_edges algorithm which, as stated in its source:
    Perform a depth-first-search over the nodes of G and yield the edges in order.
    This means that dfs_edges iterates over nodes using DFS and returns only the edges it follows during that traversal. An edge that leads to an already visited node is never followed, so it is not returned.
  4. In order to obtain all edges, we need to use edge_dfs, which focuses on visiting edges rather than nodes.
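
The difference between the two traversals can be reproduced with networkx alone. The following is a minimal sketch, not DVC's actual code: the graph is built by hand to mirror the stages from the reproduction script, and the edge direction (from a stage to the stage it depends on) and the name data.dvc are assumptions made for illustration.

```python
import networkx as nx

# Hand-built stand-in for the pipeline graph (an assumption, not DVC internals):
# each edge points from a stage to the stage it depends on.
G = nx.DiGraph()
G.add_edges_from([
    ("validation.dvc", "val.dvc"),
    ("validation.dvc", "train.dvc"),
    ("val.dvc", "dvc_run.dvc"),
    ("train.dvc", "dvc_run.dvc"),
    ("dvc_run.dvc", "data.dvc"),
])

# Node-oriented DFS: an edge is yielded only when it leads to an unvisited node.
node_based = list(nx.dfs_edges(G, "validation.dvc"))

# Edge-oriented DFS: every edge reachable from the source is yielded once.
edge_based = list(nx.edge_dfs(G, "validation.dvc"))

# Edges that the node-oriented traversal never reports.
missing = set(G.edges()) - set(node_based)

print("dfs_edges:", len(node_based), "edges")
print("edge_dfs: ", len(edge_based), "edges")
print("missing from dfs_edges:", missing)
```

Both val.dvc and train.dvc point at dvc_run.dvc, but the node-oriented traversal reaches dvc_run.dvc through only one of them, so the other connection is dropped from the result. That matches the missing link in the ASCII output.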