List version works:
stages:
build:
foreach:
- 15
- 25
- 35
do:
cmd: echo ${item}
jc@mathai3:~/Downloads/dvc_test$ dvc repro "build@15"
Running callback stage 'build@15' with command:
echo 15
15
However, the dict version doesn't
stages:
build:
foreach:
- level: 15
- level: 25
- level: 35
do:
cmd: echo ${item.level}
jc@mathai3:~/Downloads/dvc_test$ dvc repro --dry
Running callback stage 'build@{'level': 15}' with command:
echo 15
Running callback stage 'build@{'level': 25}' with command:
echo 25
Running callback stage 'build@{'level': 35}' with command:
echo 35
Use `dvc push` to send your updates to remote storage.
jc@mathai3:~/Downloads/dvc_test$ dvc repro -f "build@{'level': 15}"
ERROR: 'build@{'level'' does not exist.
According to the error output, I guess the stage name is incorrectly parsed somewhere. I tried to look at the source code but failed to find a clue.
Output of dvc version:
DVC version: 1.10.1+abb78c
---------------------------------
Platform: Python 3.8.3 on Linux-5.4.0-42-generic-x86_64-with-glibc2.10
Supports: All remotes
Cache types: symlink
Cache directory: nfs4 on storage:/home
Caches: local
Remotes: None
Workspace directory: nfs4 on storage:/home
Repo: dvc, git
@johnnychen94, this is not supported. List in foreach data is not supported to have dict or any complex object inside it.
Can you refactor it to use dict than a list:
stages:
build:
foreach:
level1:
level: 15
level2:
level: 35
do:
cmd: echo ${item.level}
Good to know the alternative and yes it works.
# dvc.yaml
stages:
build:
foreach: ${models}
do:
cmd: "echo type: ${item.noise_type} level: ${item.noise_level}"
# params.yml
models:
model_1:
noise_type: "gaussian"
noise_level: 15
model_2:
noise_type: "gaussian"
noise_level: 25
jc@mathai3:~/Downloads/dvc_test$ dvc repro --dry -f
Running stage 'build@model_1' with command:
echo type: gaussian level: 15
Running stage 'build@model_2' with command:
echo type: gaussian level: 25
Use `dvc push` to send your updates to remote storage.
Should I update the wiki? I guess this is something that people will occasionally hit because it's simpler to write
# params.yml
models:
- noise_type: "gaussian"
noise_level: 15
- noise_type: "gaussian"
noise_level: 25
Hi, @johnnychen94. I created #4961 to fix this issue, but I still recommend using dictionary for better experience.
Should I update the wiki? I guess this is something that people will occasionally hit because it's simpler to write
I don't think you have access to the wiki, let me add a note there.
I'd still like to keep this issue open for future considerations, so please don't consider it as _officially supported_ yet.
# params.yml models: - noise_type: "gaussian" noise_level: 15 - noise_type: "gaussian" noise_level: 25
@johnnychen94 , Coming to this again, would generating stages as build@0, build@1 ... work for you? This will be incompatible change, however.
Or, would that be confusing? And, will you need the key/index of those lists (lists don't have key right now :), only dicts do).
Thanks.
sounds good to me. Is the id order guaranteed, I guess not?
@johnnychen94, good question. I think it's good to assume list as ordered and generate on the basis of that index, what do you think?
We could also do that on the basis of its content, but it'll complicate internals (which might require hashing?) and making it that way, DVC won't be able to work if you change/update any item inside. I think this is more suitable for a dictionary.
Another question, should it be 0-indexed or 1-indexed? 馃槃
I believe our current parametrization workflow does not (and should not) rely on the index order, so perhaps some careful documentation on this would be sufficient, e.g., "the index order is undefined and depends on internal implementation".
I can't foresee what extra benefits hash could give us compared to the simple list order. So might be a good enough reason to stick to the simple version.
Another question, should it be 0-indexed or 1-indexed?
As far as it's well documented, it doesn't matter much for me. I know that there are people who dislike another language because it uses 1-indexed but that makes no sense to me.
I prefer this to be 0-indexed because it might be better for future job scheduling tasks. For example, suppose that we have 20 jobs, then with the glob feature #4976 we could split it into two balanced batches: dvc run build@? --glob and dvc run build@1? --glob. (Does ? work as a single character match here?)
Another example that 0-indexed might help, is that nvidia devices are ordered by 0-indexed (pcie bus order), so it can be convenient to schedule the first job build@0 to the first gpu card CUDA_VISIBLE_DEVICES=0, and so on.
Most helpful comment
Hi, @johnnychen94. I created #4961 to fix this issue, but I still recommend using dictionary for better experience.
I'd still like to keep this issue open for future considerations, so please don't consider it as _officially supported_ yet.