git itself doesn't have a configuration file to store tagging and related metadata, and since dvc.yaml is used to track the whole project, it makes sense to put this metadata into dvc.yaml.
An example of this is:
meta:
  name: DVC_Test_Project
  version: "0.1.0"
  author:
    - "Johnny Chen <[email protected]>"
    - "Jane Doe <[email protected]>"
  description: >-
    this project is a playground of dvc to see
    how it can be used in real world data science.
stages:
  build:
    cmd: echo "a test project"
This idea comes from Julia, where every package has a Project.toml file. For example, Pkg.jl/Project.toml.
It seems that the following patch works:
https://github.com/iterative/dvc/blob/a7a007e4f97ff08bc27af6be7af66262d17e1ae2/dvc/schema.py#L89-L92
 MULTI_STAGE_SCHEMA = {
+    StageParams.PARAM_META: object,
     STAGES: SINGLE_PIPELINE_STAGE_SCHEMA,
     VARS_KWD: VARS_SCHEMA,
 }
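To illustrate what the patch changes, here is a minimal standard-library sketch of a top-level key check. It is not DVC's code: DVC builds its schema with a validation library, and the function name `validate_top_level` and the two key sets below are assumptions made purely for illustration. The idea is simply that a top-level `meta` key is rejected until it is declared, and accepted with any value afterwards.

```python
# Hypothetical sketch, not DVC's implementation.
ALLOWED_BEFORE = {"stages", "vars"}
ALLOWED_AFTER = ALLOWED_BEFORE | {"meta"}  # the patch declares `meta`


def validate_top_level(data, allowed):
    """Raise ValueError if `data` has top-level keys outside `allowed`."""
    extra = sorted(set(data) - allowed)
    if extra:
        raise ValueError(f"extra keys not allowed: {extra}")
    return data


pipeline = {
    "meta": {"name": "DVC_Test_Project", "version": "0.1.0"},
    "stages": {"build": {"cmd": 'echo "a test project"'}},
}

try:
    validate_top_level(pipeline, ALLOWED_BEFORE)
except ValueError as err:
    print("before patch:", err)

validate_top_level(pipeline, ALLOWED_AFTER)
print("after patch: accepted")
```

Because the value of `meta` is not inspected, any structure is accepted there, which mirrors the `object` entry in the patch.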
Thanks for the suggestion @johnnychen94! We do support meta at the stage level, and adding support for it at the pipeline level is something we can consider.
One thing to note, though, is that dvc.yaml is not really used to track an entire project. It would be more accurate to say that it tracks a single pipeline within your project, and you can have multiple dvc.yaml pipeline files in a DVC repo (as in this example: https://dvc.org/doc/command-reference/run#example-separate-stages-in-a-subdirectory).
To get something equivalent to Julia's Project.toml, it may make more sense to just define your own top-level metadata file for your projects and track it with git.
DVC does not mandate any particular project structure and can be used for a wide range of use cases (in the same way that git can be used to version almost anything). So it doesn't make as much sense for DVC to define a root "project" level schema, other than the types of configuration info that go into .dvc (again, similar to how git configuration works).
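As a rough sketch of the "own metadata file" approach mentioned above, the snippet below assumes a hypothetical `project_meta.json` at the repository root, tracked with git. The file name, the required fields, and the `load_project_meta` helper are all assumptions for illustration; none of them is a DVC convention. JSON is used only because Python's standard library can parse it; a YAML or TOML file would work the same way with a suitable parser.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("name", "version")  # hypothetical minimum, not a DVC rule


def load_project_meta(path):
    """Load a project metadata file and check the assumed required fields."""
    meta = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in meta]
    if missing:
        raise KeyError(f"missing metadata fields: {missing}")
    return meta


# Demo: write a sample file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "project_meta.json"
    meta_path.write_text(
        json.dumps({
            "name": "DVC_Test_Project",
            "version": "0.1.0",
            "description": "a playground for trying out dvc",
        }),
        encoding="utf-8",
    )
    meta = load_project_meta(meta_path)
    print(meta["name"], meta["version"])
```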
So it doesn't make as much sense for DVC to define a root "project" level schema, other than the types of configuration info that goes into .dvc (again, similar to how git configuration works).
To be clear, the same is true of Julia's Project.toml file, and of git's submodule functionality.
As an example, here is a project I'm working on. As you can see, there is a Project.toml file in each subfolder, and dvc.yaml defines a stage for each subfolder/subproject.
.
├── DnCNN.zip.dvc
├── dvc.lock
├── dvc.yaml
├── evaluate
│   ├── main.jl
│   ├── Manifest.toml
│   └── Project.toml
├── LICENSE
├── params.yaml
├── prepare
│   ├── generate_data.jl
│   ├── main.jl
│   ├── Manifest.toml
│   └── Project.toml
├── README.md
└── train
    ├── compat.jl
    ├── config.json
    ├── main.jl
    ├── Manifest.toml
    ├── model.jl
    ├── Project.toml
    └── train_network.jl
Each Project.toml only defines which packages are used, just like Python's requirements.txt, and the metadata fields are optional:
[deps]
ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
Augmentor = "02898b10-1f73-11ea-317c-6393d7073e15"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
ImageCore = "a09fc81d-aa75-5fe9-8630-4744c3626534"
ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
JLD2 = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
[compat]
Augmentor = "0.6"
FileIO = "1"
ImageMagick = "1"
JLD2 = "0.3"
ProgressMeter = "1"
julia = "1"
So yeah, it doesn't make much sense to define a "root" project, since we always interpret this term in a relative way.
For this specific example, it's the dvc.yaml file that unifies the entire project, and naturally I want to add some project info to dvc.yaml. This is exactly why I propose adding some root meta info here.
That said, it's totally fine to use any file to record this information, a meta.yaml, for example. It would just be nice if DVC had some recommendations on this.
Is p3-nice-to-have an accepted proposal that needs community effort to work on? If so, I could give it a try this weekend by adding some tests based on https://github.com/iterative/dvc/issues/4960#issuecomment-733171910, and also some docs to the dvc.org repo.
@johnnychen94 If you'd like to work on this issue feel free to take it!
p3 issues are generally ones that the core team may get to eventually if there's enough user interest or need. But we have limited bandwidth, and there are higher-priority features and issues that we need to address in the meantime.