Hello !
When running dvc gc -a -v
, I have this error:
2020-05-27 12:41:57,820 ERROR: 'dvc.yaml' format error: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 113, in validate
cls.SCHEMA(d)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 272, in __call__
return self._compiled([], data)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 594, in validate_dict
return base_validate(path, iteritems(data), out)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/voluptuous/schema_builder.py", line 432, in validate_mapping
raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/main.py", line 53, in main
ret = cmd.run()
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/command/gc.py", line 57, in run
workspace=self.args.workspace,
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 25, in wrapper
ret = f(repo, *args, **kwargs)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/gc.py", line 73, in gc
jobs=jobs,
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 297, in used_cache
for stage, filter_info in pairs:
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 293, in <genexpr>
for target in targets
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 235, in collect_granular
return [(stage, None) for stage in self.stages]
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/funcy/objects.py", line 28, in __get__
res = instance.__dict__[self.fget.__name__] = self.fget(instance)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 437, in stages
return self._collect_stages()
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/repo/__init__.py", line 454, in _collect_stages
stage_loader = Dvcfile(self, path).stages
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 222, in stages
data, _ = self._load()
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 106, in _load
self.validate(d, self.relpath)
File "/Users/***/.local/share/virtualenvs/speech-XlnoNrSS/lib/python3.6/site-packages/dvc/dvcfile.py", line 115, in validate
raise StageFileFormatError(f"'{fname}' format error: {exc}")
dvc.stage.exceptions.StageFileFormatError: 'dvc.yaml' format error: extra keys not allowed @ data['stages']['eval_classifier_fr']['metrics_no_cache']
------------------------------------------------------------
I figured out that on some older branches that I have on my local computer, my dvc_yaml
has the key metrics_no_cache
which is wrong.
I would expect the dvc gc -a
to be resilient to formatting errors in the dvc.yaml
format in older branches, or at least having a clearer error message (I need to run my debugger to find on which branch the dvc.yaml
is wrong).
Here is my config:
DVC version: 1.0.0a5
Python version: 3.6.8
Platform: Darwin-19.3.0-x86_64-i386-64bit
Binary: False
Package: pip
Supported remotes: http, https, s3
Cache: reflink - supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('apfs', '/dev/disk1s2')
Repo: dvc, git
Filesystem type (workspace): ('apfs', '/dev/disk1s2')
Hi @courentin. Thanks for the bug report. dvc.yaml
file was introduced in v1.0
(specifically v1.0.0a0
) and existed on v0.94
as a hidden feature.
We did break dvc.yaml
format between alpha releases as expected. This should not happen when v1.0
-stable is released.
I would expect the dvc gc -a to be resilient to formatting errors in the dvc.yaml format in older branches
I agree that, gc
and friends that collect stages should be resilient but not if file format is broken as
they cannot collect anything in such condition and should make such noise.
And, I think the error message was to the point: 'dvc.yaml' format error
.
Again, on alpha releases, we could not guarantee the backward compatibility for file formats, as it's expected to be unstable and/or change.
EDIT: Oops, missed that this happened on a branches. Definitely, we can improve on this.
@courentin You could be able to convert it by replacing
metrics_no_cache:
- something
with
metrics:
- something:
cache: False
If it still doesn't fix it, we might consider supporting metrics_no_cache
again, since it corresponds to dvc run
options, even though it is not our current default way of writing dvc.yaml
.
But regardless, we definitely need to look into handling this error more gracefully.
You could be able to convert it by replacing metrics_no_cache:
@efiop, that won't work as dvc.yaml
will be the same old format on the history.
Yes, I deleted unused branches and converted metrics_no_cache
to metrics
on used ones (the metrics_no_cache
was not necessary).
I agree that, gc and friends that collect stages should be resilient but not if file format is broken as they cannot collect anything in such condition and should make such noise.
@skshetry I understand, my point was more to say that I struggled to understand the error message as it was something old and I needed to run my debugger on dvc gc
to figure out which branch was wrong.
Would it makes sense to add an information about the branch in the error ?
@courentin, Sorry that I misread it at first. The error message definitely needs to improve with whatever magic dvc
is doing with the history. As more and more dvc
commands work with history, it's important that we try to provide context around such.
Most helpful comment
@courentin You could be able to convert it by replacing
with
If it still doesn't fix it, we might consider supporting
metrics_no_cache
again, since it corresponds todvc run
options, even though it is not our current default way of writingdvc.yaml
.But regardless, we definitely need to look into handling this error more gracefully.