Dvc.org: user-guide: add "Best Practices"

Created on 14 Aug 2018 · 22 Comments · Source: iterative/dvc.org

UPDATE: Possibly as a How To guide (see #899)

Looks like we need a special section describing how to organize your projects:

  • [ ] how to use DVC with DB (see https://github.com/iterative/dvc.org/issues/594)
  • our default Dvcfile trick
  • [ ] manually editing dvc.yaml + dvc commit or dvc repro (see also https://github.com/iterative/dvc.org/issues/230#issuecomment-511769103)
    it's safe to edit DVC files, no need to touch or update md5, DVC will take care of it
  • specify meaningful stage names with -f
  • [ ] creating a pipeline in a 'debug' directory and then moving it to different data sets
  • [ ] creating a pipeline in a 'debug' directory and then modifying respective DVC files to set different data sets as an input
  • [ ] add "use meta to preserve your content" - #306
  • [ ] never store user credentials in the DVC project config

See also the latest relevant https://github.com/iterative/dvc.org/issues/72#issuecomment-682868683 and below.

doc-content enhancement epic priority-p2

All 22 comments

Also worth mentioning our default Dvcfile trick.

it's safe to edit dvc files, no need to touch or update md5, dvc will take care of it

Specify meaningful stage names

Also, we are not sure about branches anymore.

Never store user credentials in the DVC project config.
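
One way to back this advice with a check is sketched below. It is only an illustration, not part of DVC: the `find_secrets` helper, the `SECRET_KEYS` list, and the sample remote are hypothetical, and the key list is deliberately incomplete. It assumes the project config is the INI-style `.dvc/config` file that gets committed to Git, while options set with `dvc remote modify --local` go to `.dvc/config.local`, which stays out of Git.

```python
import configparser

# Illustrative, non-exhaustive set of option names that usually hold secrets.
# This list is an assumption for the sketch, not taken from DVC itself.
SECRET_KEYS = {"password", "access_key_id", "secret_access_key", "session_token"}


def find_secrets(config_text):
    """Return (section, option) pairs whose option name looks like a credential."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    hits = []
    for section in parser.sections():
        for option in parser.options(section):
            if option in SECRET_KEYS:
                hits.append((section, option))
    return hits


# Hypothetical project config with a secret that should not be committed.
SAMPLE = """
[core]
remote = myremote
['remote "myremote"']
url = s3://example-bucket/path
secret_access_key = not-a-real-key
"""

if __name__ == "__main__":
    for section, option in find_secrets(SAMPLE):
        print(f"{section}: move '{option}' to .dvc/config.local")
```

A check like this could run in a pre-commit hook or in CI against the contents of `.dvc/config`; any hit would then be moved to the local config instead.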

@shcheklein we should keep branches - it is a good practice. However, we should mention that for some cases, like hyperparameter tuning, branches are not very relevant.

@dmpetrov agreed, you are right. I probably just wanted to highlight that we should not be pushing branches as the single best option in all cases - there are tags, directories, and maybe we should even mention other tools for now?

@shcheklein also, we need to implement the experiment diroutput feature for the hyperparameter tuning use case (Stefan's use case).

add "use meta to preserve your content" - https://github.com/iterative/dvc.org/issues/306

Hi @shcheklein. I would like to work on this issue.

@Soumya0803 sure! Feel free to write a document for this. Please join our chat at dvc.org/chat; we have a separate #dev-docs channel if you have any questions.

Is not "Best Practices" the same as "Use Cases"?
Maybe we should rename "Use Cases" ==> "Best Practices"

@dashohoxha no, it's not the same. "Best Practices" are relatively small tricks and pieces of advice you should use to be efficient with DVC. They are usually general and do not depend on your specific use case.

Should this be merged with #230 and featured in the #899 epic? We're trying to avoid so many sections now.

Also, the Questions part of What is DVC? (currently in https://dvc.org/doc/user-guide/what-is-dvc/collaboration-issues#questions) probably overlaps with this.

@jorgeorpinel Yeah, that indeed seems suitable.

@jorgeorpinel what would it look like? Like a subsection in How To?

Just a single document under How To.

I updated the description of this issue and in fact I think #230 is already included here, in the "manually editing dvc.yaml + dvc commit or dvc repro" checkbox.

UPDATES:

> Just a single document under How To.

We are currently following this approach in #1705 but I'm not sure it will stick. Maybe Best Practices should be in the form of _Explanation_ (a regular user guide, or directly under Home, even) and not as a _How-to_ (problem-solution format). We'll see...

  • [ ] And another best practice to write about is "how to" track and version compressed archives, composite binaries, even video perhaps (see this support case) - overlaps with #682 though.

Another possible best practice (or anti-practice):

  • [ ] Avoid dynamic names (and other non-deterministic behavior — mentioned in dvc run ref). See this support case for context.
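
The support case itself is not reproduced here, so the following is only a sketch of what avoiding dynamic names and non-deterministic behavior might look like in a stage script. The file name `model.txt` and the `train` and `run` functions are hypothetical. The idea is that the output path is fixed (rather than built from a timestamp or random suffix) and randomness is seeded, so repeated runs write the same content to the same place.

```python
import random

# Fixed, hypothetical output name: it can be declared once as a stage output.
# A name such as f"model-{time.time()}.txt" would change on every run instead.
OUTPUT_PATH = "model.txt"


def train(seed=42):
    """Stand-in for a training step; a seeded generator keeps it repeatable."""
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(3)]


def run(path=OUTPUT_PATH, seed=42):
    """Write the 'model' to a fixed path and return that path."""
    weights = train(seed)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(str(w) for w in weights))
    return path


if __name__ == "__main__":
    print(run())
```

With a stable path and stable content, rerunning the script leaves the output unchanged, which is what lets a tool that hashes outputs recognize that nothing needs to be redone.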

@efiop do you think "how to: add a page for Managing Experiments" (#816) would be better as a best practice too, instead of a how-to as it's requested now? Thanks

It's definitely not a How To. It is on the same level as Managing data, etc. To my mind, a section like Managing Experiments should sit alongside Get started, Use Cases, and User Guide at the top level.

Another possible one:
