Dvc.org: dvc run step in tutorial does not give expected results

Created on 28 Jun 2020 · 17 Comments · Source: iterative/dvc.org

Bug Report

$ dvc version
DVC version: 1.0.1
Python version: 3.7.1
Platform: Linux-4.4.0-176-generic-x86_64-with-debian-stretch-sid
Binary: True
Package: deb
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('ext4', '/dev/mapper/qwerty--vg-root')
Repo: dvc, git
Filesystem type (workspace): ('ext4', '/dev/mapper/qwerty--vg-root')

Following https://dvc.org/doc/use-cases/versioning-data-and-model-files/tutorial, dvc run step returns an error instead of creating the Dvcfile/dvc.yaml file:

dvc run -f Dvcfile \
    -d train.py -d data \
    -M metrics.csv \
    -o model.h5 -o bottleneck_features_train.npy -o bottleneck_features_validation.npy \
    python train.py

ERROR: `-n|--name` is required

iterative/dvc#4077 states that the -f option is deprecated. Running with the -n option instead results in a different error:

dvc run \
    -d train.py \
    -d data \
    -n training \
    -M metrics.csv \
    -o model.h5 \
    -o bottleneck_features_train.npy \
    -o bottleneck_features_validation.npy \
    python train.py --verbose

ERROR: output 'model.h5' is already specified in stage: 'model.h5.dvc'.

Please update the tutorial to reflect changes in v1.0.

Thanks!

bug doc-content good first issue help wanted priority-p1

All 17 comments

@shachibista sorry about that. The tutorial is a little bit outdated (it was written for DVC versions <0.94). We changed the dvc run interface in DVC 1.0 and haven't had time to update some tutorials.

@shcheklein Would it require a lot of (conceptual) changes to update the tutorial? If it is only a single command, I could send a pull request if you could tell me the correct parameters.

@shachibista no huge conceptual changes! Just changing commands, altering text here and there (e.g. no Dvcfile anymore, but dvc.yaml).

To start, I think we can try removing the -f option (along with its argument) and adding -n train. That alone should probably fix the issue.

@shachibista we would greatly appreciate it if you're able to update the commands! I would encourage you to give it a shot: it may only be a matter of a few small changes here and there 🙂

@jorgeorpinel this one should be fixed now, I suppose?

Yes. Thanks for the reminder. About to push a small PR so I can include it there... ⏳

Oh, actually this was already done in https://github.com/iterative/dvc.org/pull/1526/files but we haven't double checked the instructions. @shachibista if you try again please let us know your results.

@sarthakforwet can you confirm whether you ran the whole tutorial after the update? Or if you are able to do so and share the results next? Thanks

Oh, actually this was already done in https://github.com/iterative/dvc.org/pull/1526/files

Yes @jorgeorpinel, the new changes are reflected in #1526. I also ran the tutorial and it's working fine.

ERROR: output 'model.h5' is already specified in stage: 'model.h5.dvc'

@shachibista Can you tell us whether you followed the complete tutorial sequentially and checked out master before moving on to the Automating Capturing section? Thanks.

use_cases_versioning_tutorial_output_part_4.txt

The file above contains the complete output we get from running the dvc run command under the Automating Capturing section of the tutorial.

Platform Used - Google Colab
Accelerator - None

Running tutorials and code examples on Google Colab would be a good choice. Is there a mention of this in the current documentation, @jorgeorpinel?

OK, this should be resolved then. Thanks!

ERROR: output 'model.h5' is already specified in stage: 'model.h5.dvc'.

Maybe Shachi ran dvc add model.h5 on their own. But anyway, the tutorial is now up to date for 1.0

@sarthakforwet Yes, I followed the tutorial sequentially as written.

@jorgeorpinel Yes, I did (as instructed in the tutorial). But I reviewed it and noticed this note: "We manually added the model output here, which isn't ideal. The preferred way of capturing command outputs is with dvc run. More on this later."

Removing the model.h5.dvc file and re-running the updated tutorial runs successfully! 🎊
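For anyone landing here with the same error, the workaround described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's official text. The function names untrack_added_output and rerun_training_stage are hypothetical, the stage name training is simply the one used in the original report, and the sketch assumes DVC 1.0 is on the PATH and that it is run from the project root.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): delete the standalone .dvc file
# created by an earlier `dvc add <output>`, so that a `dvc run` stage can
# declare the same output. The data file itself is left in place.
untrack_added_output() {
    target="$1"
    if [ -f "$target.dvc" ]; then
        rm -- "$target.dvc"
        echo "removed $target.dvc"
    else
        echo "no $target.dvc to remove"
    fi
}

# Hypothetical wrapper around the command from the original report.
# Assumes DVC 1.0 and that train.py and data/ exist in the current directory.
rerun_training_stage() {
    untrack_added_output model.h5
    dvc run \
        -n training \
        -d train.py \
        -d data \
        -M metrics.csv \
        -o model.h5 \
        -o bottleneck_features_train.npy \
        -o bottleneck_features_validation.npy \
        python train.py --verbose
}
```

Nothing runs when the snippet is sourced. Calling rerun_training_stage removes model.h5.dvc if it is present and then retries the dvc run command that failed in the report.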

Thank you all!

Perhaps it would be clearer if the documentation re-stated this before the Automating capturing section.

Perhaps it would be clearer if the documentation re-stated this before the Automating capturing section.

@shachibista thanks for the suggestion! Rather than have us guess exactly what you mean, would you like to submit a PR with your suggested change? You can do it easily from GitHub here: https://github.com/iterative/dvc.org/blob/master/content/docs/use-cases/versioning-data-and-model-files/tutorial.md

@jorgeorpinel - I can re-iterate the preceding point.

The tutorial _expressly_ tells you to run dvc add model.h5 in _two_ different places, prior to the dvc run command mentioned in the original post. This leads to the error reported.

I don't know the 'correct' way to deal with that. One way is to tear everything down and repeat the tutorial without running that command, which seems very counter-intuitive.

I presume there's a way to proceed through the tutorial without having to tear everything down, I just don't know how.

Either way, it would be helpful and appreciated if the tutorial could be updated. (Is this the correct place for this message?)

run dvc add model.h5 in two different places... This leads to the error reported

@MatBailie running dvc add again on a tracked file won't throw any errors: if the file is different now, it prints 100% Add ...; if it hasn't changed, it prints Stage is cached, skipping. That is, unless you're using an old version of DVC, in which case the command behavior may be different.

In any case, for the tutorial, the 2nd add is after python train.py so the model file should be different. Or maybe there's an error and the training doesn't update the model file? I haven't tried the full tutorial recently but will check...

ERROR: output 'model.h5' is already specified in stage: 'model.h5.dvc'.

UPDATE: I just followed the full tutorial in https://dvc.org/doc/use-cases/versioning-data-and-model-files/tutorial and I see the error comes out at dvc run in the last section, because dvc add was previously used on the same file.
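To spot this situation before calling dvc run, one could check whether any planned output already has a .dvc file next to it. This is a minimal sketch, assuming the default naming where dvc add model.h5 writes model.h5.dvc beside the file. list_conflicting_dvc_files is a hypothetical helper name, not a DVC command.

```shell
#!/bin/sh
# Hypothetical helper (not a DVC command): print the .dvc file of every
# given output that already has one, i.e. was tracked earlier with `dvc add`.
# Returns 0 if no such files were found and 1 otherwise.
list_conflicting_dvc_files() {
    conflicts=0
    for target in "$@"; do
        if [ -f "$target.dvc" ]; then
            echo "$target.dvc"
            conflicts=1
        fi
    done
    return "$conflicts"
}
```

For example, list_conflicting_dvc_files model.h5 bottleneck_features_train.npy bottleneck_features_validation.npy would print model.h5.dvc if only model.h5 had been added by hand, which matches the stage named in the error message.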

OK, I got y'all now... Will update it ⌛

Being fixed in #1658. Thanks!
