Hi guys, can we have a machine-readable output format option for dvc status? For example:
dvc status --json
{
  "stages": [
    {
      "name": "path/to/a.txt.dvc",
      "deps_changed": ["path/to/b.txt.dvc"],
      "outs_changed": [],
      "callback": false,
      "checksum_changed": true
    },
    {
      "name": "path/to/b.txt.dvc",
      "locked": true
    }
  ]
}
I don't think it will be needed for other commands, at least for now - we can show dvc console output for user actions such as dvc repro.
Great point! Our internal API function dvc/project.py:Project.status() actually returns a dict, which then gets printed in dvc/command/status.py, so it should be pretty easy to implement.
We just need to add a --json option in dvc/cli.py for dvc status, then handle it in dvc/command/status.py and json.dump() the result instead of printing it as usual.
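A minimal sketch of that wiring, under stated assumptions: get_status() is a hypothetical stand-in for Project.status() with a made-up dict layout, and the parser is plain argparse rather than the actual setup in dvc/cli.py. Only the --json flag name comes from the proposal above.

```python
import argparse
import json
import sys


def get_status():
    # Hypothetical stand-in for Project.status(); the real dict layout may differ.
    return {"path/to/a.txt.dvc": {"deps_changed": ["path/to/b.txt.dvc"]}}


def build_parser():
    # Illustrative parser only; not how dvc/cli.py is actually organized.
    parser = argparse.ArgumentParser(prog="status-sketch")
    parser.add_argument(
        "--json",
        action="store_true",
        default=False,
        help="print the status dict as JSON instead of plain text",
    )
    return parser


def run(argv, out=sys.stdout):
    args = build_parser().parse_args(argv)
    status = get_status()
    if args.json:
        # Machine-readable path: dump the dict as-is.
        json.dump(status, out, sort_keys=True)
        out.write("\n")
    else:
        # Human-readable path: one line per stage.
        for stage, info in status.items():
            out.write(f"{stage}: {info}\n")
    return 0
```

Calling run(["--json"]) writes a single JSON document to stdout, so a caller can parse it with json.loads() without scraping console text.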
Yeah, I had Project.status() in mind. I'm starting to dive into the code.
I would additionally include the callback, locked and checksum_changed boolean values explicitly so that the status can be explained. This would also be useful in the human readable output IMO.
And another thing, I'm thinking it would be useful to have separate outs_changed and outs_missing since the consequences are a bit different and should probably be reflected in the status icon - I would go for red with changed outputs and dark grey with missing outputs. I'm guessing users can pull someone else's repository and work with most DVC files without the outputs and I don't want the icons to scream in red. But when there's a mismatch in the output file's checksum, we should take that as a warning so red color makes sense.
Also, showing a status icon means that I have to turn these stage status properties into a single distinct status icon. Since it would be too much to have an icon for all the combinations, the way I see it is to process the properties by severity to produce something like this:
If locked -> locked (yellow lock icon overlay?)
Else if any outputs changed -> outs_changed (red)
Else if any outputs missing -> outs_missing (grey)
Else if md5 changed -> checksum_changed (blue)
Else if any dependencies (--with-deps) changed or missing -> deps_changed (orange)
Else -> ok (original DVC colored icon)
We could also independently show --always-reproduce using some overlay, e.g. a yellow dot in bottom left.
Maybe that logic should actually be done internally and shown in an additional field like "status": "deps_changed". There could even be a --simple option that would show just this field in human readable / machine readable format.
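The severity-ordered checks listed above could be sketched roughly as follows. The dict keys follow the field names proposed in this thread (outs_missing included); resolve_status is a hypothetical helper, not an existing DVC function, and deps_changed is assumed to cover both changed and missing dependencies.

```python
def resolve_status(stage):
    """Collapse a stage's status properties into one label, most severe first."""
    if stage.get("locked"):
        return "locked"
    if stage.get("outs_changed"):
        return "outs_changed"
    if stage.get("outs_missing"):
        return "outs_missing"
    if stage.get("checksum_changed"):
        return "checksum_changed"
    # Assumption: this list holds both changed and missing dependencies.
    if stage.get("deps_changed"):
        return "deps_changed"
    return "ok"


def with_status(stage):
    # Return a copy of the stage dict with the extra "status" field added.
    return {**stage, "status": resolve_status(stage)}
```

For example, resolve_status({"outs_changed": ["a.txt"], "deps_changed": ["b.txt.dvc"]}) returns "outs_changed", because changed outputs are checked before dependencies.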
@prihoda Sounds amazing! But how would we handle a bunch of changed things? E.g. deps and outs changed. Black icon?
Personally I wouldn't go for those combinations, I would just process it from most severe to least severe and show the first one using the if-else approach. When the file is open, we can show a detailed description. If you consider all combinations of outputs (changed, missing, ok), dependencies (changed, ok) and checksum (changed, ok), it's 3x2x2 = 12 combinations.
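To make the count concrete, here is a small sketch that enumerates the 3x2x2 combinations and shows how a first-match rule collapses them. The state names and the first_match helper are illustrative assumptions, and locking is left out because it is not part of the 12.

```python
from itertools import product

OUTPUT_STATES = ("changed", "missing", "ok")
DEPENDENCY_STATES = ("changed", "ok")
CHECKSUM_STATES = ("changed", "ok")


def first_match(outs, deps, checksum):
    # Same severity order as the if-else approach: outputs, checksum, deps.
    if outs == "changed":
        return "outs_changed"
    if outs == "missing":
        return "outs_missing"
    if checksum == "changed":
        return "checksum_changed"
    if deps == "changed":
        return "deps_changed"
    return "ok"


combinations = list(product(OUTPUT_STATES, DEPENDENCY_STATES, CHECKSUM_STATES))
labels = {first_match(*combo) for combo in combinations}

print(len(combinations))  # 12
print(len(labels))        # 5
```

Under these assumptions the 12 combinations map onto 5 labels, so each label can get its own icon without drawing every combination.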
But I'm definitely open for discussion on this. We could enumerate those combinations that might make sense to be shown explicitly and decide on each one based on the different use-cases. In my view, for a file with invalid outputs, I already know that it should be reproduced or pulled so I don't care that dependencies were changed as well.
We could mark each property with a different marker, like circles in the bottom left and right corners, but that might start to become quite busy. I will try to experiment in Photoshop; it will be much easier to decide visually.
Moving the discussion here: https://github.com/iterative/intellij-dvc-support-poc/issues/1
Closing this in favor of https://github.com/iterative/intellij-dvc/issues/1 .
Oops, sorry. This is still very relevant, reopening.
Some implementation details here: https://github.com/iterative/dvc/issues/3975#issuecomment-640815774
Hello! I think this is a very useful feature to implement, because it makes integrations with IDEs possible, which is my case. Since I'm developing a plugin that uses dvc, it would be very useful to be able to run dvc status and write its output to a file, as you do when creating dvc pipelines. I also think you should use YAML (as you do in .dvc files) or JSON so that it can be easily parsed in different languages (for example, I am writing my plugin in Java).
Hi @hdcasti !
Created https://github.com/iterative/dvc/pull/3998 that introduces --show-json for dvc status. Should get us going. This issue was also initially asking for things like locked, and we could do that on top if anyone requests it directly (just not too keen on adding stuff that there are no current users for).