Dvc.org: add "Jupyter notebook" article

Created on 22 Oct 2018 · 5 Comments · Source: iterative/dvc.org

doc-content enhancement help wanted

efiop

Most helpful comment

@mlisovyi I’ve been using Jupyter in a pipeline with a command like:

jupyter nbconvert Train.ipynb --clear-output --inplace --execute --ExecutePreprocessor.timeout=-1

This executes the notebook and overwrites it in place, as if I had opened it in Jupyter, run the entire notebook manually, and saved it. I then commit the resulting notebook to git. I also specify some outputs which are cached: a directory for model checkpoints and a directory for logs. For dependencies, I specify the notebook itself as well as a directory of supporting modules.
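For anyone who prefers to drive this from a script, here is a minimal sketch of the same command assembled in Python. The function names are made up for illustration; the flags are exactly the ones in the command above. It only prints the command unless `run_notebook` is called explicitly.

```python
import shlex
import subprocess


def nbconvert_inplace_command(notebook):
    """Return the argument list that executes a notebook and saves it in place."""
    return [
        "jupyter", "nbconvert", notebook,
        "--clear-output",                    # drop stale cell outputs
        "--inplace",                         # write the result back to the same .ipynb
        "--execute",                         # run the cells from top to bottom
        "--ExecutePreprocessor.timeout=-1",  # no per-cell time limit
    ]


def run_notebook(notebook):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(nbconvert_inplace_command(notebook), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(nbconvert_inplace_command("Train.ipynb")))
```

Passing the arguments as a list avoids shell quoting problems if the notebook path contains spaces.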

I believe the initial command to set it up looked something like

dvc run -d Train.ipynb -d src/ -o checkpoints/ -o logs/ jupyter nbconvert Train.ipynb --clear-output --inplace --execute --ExecutePreprocessor.timeout=-1

You might also need to specify a name for the pipeline step somewhere in that command — I used train.dvc, which I can then execute using dvc repro train.dvc.
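A rough sketch of how the full `dvc run` invocation could be put together, including the stage file name. It assumes the `-f` option is what names the stage file (as in the DVC releases from around the time of this thread); newer DVC versions changed this interface, so check `dvc run --help` for your version. The helper name `dvc_run_command` is hypothetical.

```python
import shlex


def dvc_run_command(stage_file, deps, outs, command):
    """Build a `dvc run` argument list for one pipeline step.

    Assumption: `-f` sets the name of the generated stage file.
    """
    args = ["dvc", "run", "-f", stage_file]
    for dep in deps:
        args += ["-d", dep]   # dependencies that trigger a re-run when changed
    for out in outs:
        args += ["-o", out]   # outputs that DVC caches
    return args + list(command)


if __name__ == "__main__":
    notebook_cmd = [
        "jupyter", "nbconvert", "Train.ipynb",
        "--clear-output", "--inplace", "--execute",
        "--ExecutePreprocessor.timeout=-1",
    ]
    cmd = dvc_run_command(
        "train.dvc",
        deps=["Train.ipynb", "src/"],
        outs=["checkpoints/", "logs/"],
        command=notebook_cmd,
    )
    # Print only; paste the result into a shell inside a DVC project to run it.
    print(shlex.join(cmd))
```

Once the stage file exists, `dvc repro train.dvc` re-runs the step when a dependency changes, as described above.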

colllin on 9 Mar 2019

👍7

All 5 comments

@jurasan

efiop on 22 Oct 2018

If you have any quick tips here, I would appreciate them. I typically use notebooks for development and inline visualization, and I'm trying to migrate a project to dvc right now — my first dvc project 🎉. I'm thinking it might be best to develop and debug in the notebook as usual, then when I'm ready to run the notebook end-to-end, use e.g. dvc run -d train.ipynb -o training.html -o checkpoint.pt jupyter nbconvert --to html --execute train.ipynb.
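One detail to watch with the HTML variant: by default nbconvert names the converted file after the notebook (train.html here), so a declared output called training.html would not be produced unless the name is set explicitly. The sketch below, with a hypothetical helper name, passes `--output` so the generated file matches the declared output. It assumes the HTML file lives next to the notebook.

```python
import shlex
from pathlib import Path


def html_report_stage(notebook, html_out, extra_outs=()):
    """Build a `dvc run` argument list that renders an executed notebook to HTML.

    Assumption: html_out is a bare file name in the notebook's directory.
    """
    base_name = Path(html_out).stem  # "training.html" -> "training"
    convert = [
        "jupyter", "nbconvert", "--to", "html", "--execute",
        "--output", base_name,       # make the file name match the -o entry
        notebook,
    ]
    args = ["dvc", "run", "-d", notebook, "-o", html_out]
    for out in extra_outs:
        args += ["-o", out]          # e.g. a model checkpoint written by the notebook
    return args + convert


if __name__ == "__main__":
    cmd = html_report_stage("train.ipynb", "training.html", ["checkpoint.pt"])
    # Print only; nothing is executed here.
    print(shlex.join(cmd))
```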

colllin on 9 Jan 2019

👍1

Hi @colllin !

I see from the notebook's name, train.ipynb, that you are splitting your pipeline into separate steps that you then plan to run using dvc. That is precisely what we usually recommend! You should be all set 🎉 Please don't hesitate to share your experience, we would really appreciate it. 🙂

efiop on 9 Jan 2019

👍3

Any progress on such an example?

mlisovyi on 9 Mar 2019

