Dvc: Parameter Diff while using Experimental Parameterization feature

Created on 16 Dec 2020 · 4 comments · Source: iterative/dvc

Bug Report

params diff: using parameterization

Description

Reproduce

On the master branch I have a params.yaml like so:

datasets:
  - A
  - B
task: classification
seed: 42
feature_type: SparseDataset
training:
  alpha: 0.01

On the exp branch I have a params.yaml like so:

datasets:
  - A
  - B
task: classification
seed: 42
feature_type: SparseDataset
training:
  alpha: 10.0

I have a training stage in dvc.yaml like so:

train_model:
  foreach: ${datasets}
  do:
    cmd: >-
      python scripts/train.py
      --dataset ${item}
    params:
      - training.alpha
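For readers unfamiliar with the experimental foreach/do syntax, the stage above generates one concrete stage per entry in datasets. The sketch below is a simplified, hypothetical model of that expansion: expand_foreach, lookup and interpolate are illustrative names, not DVC functions, it only handles ${...} substitution, and it assumes generated stages are named name@item.

```python
import re
from typing import Any, Dict

# Matches ${name} placeholders; a simplified stand-in for DVC's interpolation.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(context: Dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted key such as 'training.alpha' against a nested dict."""
    value: Any = context
    for part in dotted.split("."):
        value = value[part]
    return value


def interpolate(text: str, context: Dict[str, Any]) -> str:
    """Replace every ${key} in text with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), text)


def expand_foreach(name: str, stage: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Any]:
    """Expand a hypothetical foreach/do stage into one stage per item."""
    foreach = stage["foreach"]
    if isinstance(foreach, str):
        # foreach given as "${datasets}": look the list up in params.
        items = lookup(params, PLACEHOLDER.fullmatch(foreach).group(1))
    else:
        # foreach given as a literal list.
        items = foreach
    expanded: Dict[str, Any] = {}
    for item in items:
        context = {**params, "item": item}
        do = stage["do"]
        expanded[f"{name}@{item}"] = {
            "cmd": interpolate(do["cmd"], context),
            "params": list(do.get("params", [])),
        }
    return expanded
```

With datasets set to ["A", "B"], this yields train_model@A and train_model@B, whose commands end in --dataset A and --dataset B, and both stages depend on training.alpha.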

When I run dvc params diff on my exp branch, I get:
ERROR: unexpected error - duplicated key 'training.alpha'

Example:

  1. dvc init
  2. Copy dataset.zip to the directory
  3. dvc add dataset.zip
  4. dvc run -d dataset.zip -o model ./train.sh
  5. modify dataset.zip
  6. dvc repro

Expected

Path         Param           Old    New
params.yaml  training.alpha  0.01   10.0
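The expected table boils down to flattening each revision's params.yaml into dotted keys and listing the ones whose values differ. A minimal sketch of that comparison, assuming both revisions have already been loaded into dicts (flatten and params_diff are hypothetical helpers, not DVC's own code):

```python
from typing import Any, Dict, List, Optional, Tuple

# (path, param, old value, new value)
Row = Tuple[str, str, Optional[Any], Optional[Any]]


def flatten(d: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {'training': {'alpha': 1}} -> {'training.alpha': 1}."""
    flat: Dict[str, Any] = {}
    for key, value in d.items():
        full = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full))
        else:
            flat[full] = value
    return flat


def params_diff(path: str, old: Dict[str, Any], new: Dict[str, Any]) -> List[Row]:
    """Return a row for every parameter whose value differs between old and new."""
    old_flat, new_flat = flatten(old), flatten(new)
    rows: List[Row] = []
    for param in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(param), new_flat.get(param)
        if before != after:
            rows.append((path, param, before, after))
    return rows
```

Feeding it the master and exp params.yaml contents from this report gives a single row, ("params.yaml", "training.alpha", 0.01, 10.0), matching the expected output.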

Environment information

macOS Catalina 10.15.7
Python 3.7.9

Output of dvc version:

$ dvc version

I tried several DVC versions (1.11.8, master, and 1.10.2); it only works on 1.10.2.

Additional Information (if any):
Output of dvc params diff -v:

2020-12-16 13:07:56,623 DEBUG: Check for update is enabled.
2020-12-16 13:07:56,639 DEBUG: fetched: [(3,)]
2020-12-16 13:07:56,703 DEBUG:    37.35 ms in resolving values
2020-12-16 13:07:56,824 DEBUG:    34.89 ms in resolving values
2020-12-16 13:07:56,851 DEBUG: fetched: [(5,)]
2020-12-16 13:07:56,858 DEBUG: fetched: [(3,)]
2020-12-16 13:07:56,911 DEBUG:    35.37 ms in resolving values
2020-12-16 13:07:56,946 DEBUG: fetched: [(5,)]
2020-12-16 13:07:56,949 ERROR: unexpected error - duplicated key 'training.alpha'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/main.py", line 90, in main
    ret = cmd.run()
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/command/params.py", line 38, in run
    all=self.args.all,
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/repo/params/__init__.py", line 13, in diff
    return diff(self.repo, *args, **kwargs)
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/repo/params/diff.py", line 25, in diff
    format_dict(old), format_dict(new), with_unchanged=with_unchanged
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/utils/diff.py", line 82, in diff
    path_diff = _diff(old.get(path), new.get(path), with_unchanged)
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/utils/diff.py", line 67, in _diff
    return _diff_dicts(old, new, with_unchanged)
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/utils/diff.py", line 46, in _diff_dicts
    new = _flatten(new_dict)
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/utils/diff.py", line 40, in _flatten
    return defaultdict(lambda: None, flatten(d))
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/dvc/utils/flatten.py", line 5, in flatten
    return flatten_dict.flatten(d, reducer="dot")
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 91, in flatten
    _flatten(d)
  File "/Users/bharathc/miniconda3/envs/dev/lib/python3.7/site-packages/flatten_dict/flatten_dict.py", line 88, in _flatten
    raise ValueError("duplicated key '{}'".format(flat_key))
ValueError: duplicated key 'training.alpha'
------------------------------------------------------------
2020-12-16 13:07:57,103 DEBUG: Version info for developers:
DVC version: 1.11.8 (pip)
---------------------------------
Platform: Python 3.7.9 on Darwin-19.6.0-x86_64-i386-64bit
Supports: gdrive, http, https, s3
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2020-12-16 13:07:57,104 DEBUG: Analytics is enabled.
2020-12-16 13:07:57,208 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/r0/_1zb4w596xvdc5gydc9xxc9r0000gn/T/tmpkpzmnb5t']'
2020-12-16 13:07:57,210 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/r0/_1zb4w596xvdc5gydc9xxc9r0000gn/T/tmpkpzmnb5t']'
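The traceback ends in flatten_dict raising ValueError when two paths flatten to the same dotted key. A simplified re-implementation of that check (flatten_strict is hypothetical, not the flatten_dict source) shows one way it can happen, assuming the collected params for params.yaml contained both the nested training section and a literal "training.alpha" key:

```python
from typing import Any, Dict, Optional


def flatten_strict(
    d: Dict[str, Any], prefix: str = "", out: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Dot-flatten a nested dict, raising if two paths collapse to the same key.

    Simplified illustration of the check behind the "duplicated key" error;
    not the actual flatten_dict implementation.
    """
    if out is None:
        out = {}
    for key, value in d.items():
        flat_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flatten_strict(value, flat_key, out)
        elif flat_key in out:
            raise ValueError(f"duplicated key '{flat_key}'")
        else:
            out[flat_key] = value
    return out
```

Calling flatten_strict({"training": {"alpha": 10.0}, "training.alpha": 10.0}) raises ValueError: duplicated key 'training.alpha', the same message as in the log, while a purely nested dict flattens normally.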
Labels: bug, p1-important, parametrization

All 4 comments

I am not able to reproduce this with the steps you described. Looking at the code, the issue seems to be that we are not skipping files that are already a params dependency through parameterization (those are always shown by default).
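A minimal sketch of the idea in the comment above: when a params file is already collected in full (as params.yaml is when used for parameterization), key-level stage deps on that same file are skipped instead of being added as extra dotted keys. The function name, the (path, key, value) tuples and the data layout are hypothetical; this is not DVC's actual fix.

```python
from typing import Any, Dict, Iterable, Tuple

# (file path, dotted key, value), e.g. ("params.yaml", "training.alpha", 10.0)
ParamDep = Tuple[str, str, Any]


def collect_params(
    full_files: Dict[str, Dict[str, Any]], stage_deps: Iterable[ParamDep]
) -> Dict[str, Dict[str, Any]]:
    """Merge whole-file params with per-stage key deps without duplicating keys."""
    collected = {path: dict(content) for path, content in full_files.items()}
    for path, key, value in stage_deps:
        if path in full_files:
            # The whole file is already tracked, so its nested values
            # already cover this key; adding it again would duplicate it.
            continue
        collected.setdefault(path, {})[key] = value
    return collected
```

Here a training.alpha dep on params.yaml is dropped because the nested training section is already present, while deps on files not tracked in full are still recorded.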


From your log, it is clear that we are going through 3 different revisions in params (and it looks like this happens in metrics too) when there should have been just 2. I will try to fix these; please check later whether that fixes your issue. Thanks
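If the extra pass comes from two revision names that resolve to the same commit (an assumption; the log does not show which revisions were visited), deduplicating by resolved SHA before collecting params would cut it back to two. A generic sketch, where unique_revisions and the resolve callback are hypothetical:

```python
from typing import Callable, Iterable, List


def unique_revisions(revs: Iterable[str], resolve: Callable[[str], str]) -> List[str]:
    """Keep the first name for each distinct resolved revision, preserving order."""
    seen = set()
    result: List[str] = []
    for rev in revs:
        sha = resolve(rev)
        if sha not in seen:
            seen.add(sha)
            result.append(rev)
    return result
```

For example, if "HEAD" and "master" both resolve to the same commit, unique_revisions(["workspace", "HEAD", "master"], resolve) keeps only "workspace" and "HEAD".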

@bharathc346, could you please try the 1.11 branch of DVC or master, whichever you prefer, and check whether the bug is fixed? I was not able to reproduce it, so a confirmation would be good. Thanks.

Running on master successfully by doing pip install git+https://github.com/iterative/dvc, going to close this issue. Thanks for resolving.
