Dvc: "dvc run" flag --add-stdout

Created on 22 Aug 2019 · 3 Comments · Source: iterative/dvc

(see Discord context)

It would sometimes be quite handy to have a flag like --add-stdout foo to dvc run, e.g.

dvc run -d input.csv --add-stdout output.csv mycommand.sh

which would capture stdout to a file, and add that file as an output of the pipeline step. It would basically be equivalent to

dvc run -d input.csv --outs output.csv 'mycommand.sh > output.csv'

but the implementation would presumably just change dvc's subprocess.run(cmd) call to subprocess.run(cmd, stdout=file); you wouldn't literally change the cmd string to use shell redirection.
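
To illustrate the idea, here is a minimal sketch of capturing stdout by passing an open file to subprocess.run, with no shell redirection in the command itself. The helper name run_capturing_stdout and its parameters are made up for this example and are not part of dvc; how dvc actually invokes stage commands may differ.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_capturing_stdout(cmd, stdout_path):
    """Run cmd (a list of arguments) and write its stdout to stdout_path.

    Hypothetical helper for illustration only; not a dvc API.
    """
    # Open in binary mode so the child's output is stored byte-for-byte.
    with open(stdout_path, "wb") as out_file:
        # check=True raises CalledProcessError on a non-zero exit code.
        completed = subprocess.run(cmd, stdout=out_file, check=True)
    return completed.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        output = Path(tmp) / "output.csv"
        # A stand-in for mycommand.sh: a child Python process printing CSV rows.
        run_capturing_stdout(
            [sys.executable, "-c", "print('a,b'); print('1,2')"], output
        )
        print(output.read_text())
```

Because the command is passed as an argument list and the file handle is supplied by the caller, the output filename appears only once and no shell is involved.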

Name is negotiable of course.

Benefits include:

  • Avoid using shell-isms in simple cases
  • Avoid repeating the filename

Labels: feature request, good first issue, p3-nice-to-have

All 3 comments

Duplicating shell features would open a whole new universe for us to implement and a whole new UI for people to get used to, so I am strongly against adding this. We should instead encourage people to use shell redirection or adjust their commands.

I agree that I already find myself repeating the input/output filename in almost every dvc stage, e.g.:

md5: 33abd8c0a78648177c165c9a5b549ea7
cmd: ../scripts/sample.py raw/train.parquet train_sample.parquet
wdir: .
deps:
- md5: 65dd48acd71d88dec45949e0f8a17817
  path: raw/train.parquet
outs:
- md5: cd0ec3b2b25d5bdbdebe872dbfcf6576
  path: train_sample.parquet
  cache: true
  metric: false
  persist: false

My vote is we either tackle this problem more broadly (not _just_ for the case of simple redirection) or leave it as is?

If name duplication is the root of this then we should count this issue as +1 to providing some dedup methods.
