Our default plot is not very informative when the number of data points is huge (1k):
We need to introduce some smoothing into the default plot or create a new smoothing template. The result should look like:
PS: I've done the last plot by adding a loess transformation to the template, which breaks the revision. We need to fix the revision or find another solution.
"transform": [{
"loess": "<DVC_METRIC_Y>",
"on": "<DVC_METRIC_X>"
}
],
So, I see a few ways we can approach that:
1. Create a new template, default_smoot or something like that - easy, but if we decide to go that way, soon we will have many plot templates, which is probably not desired.
2. Since we already support plot definition in the pipelines file, maybe we should consider introducing an update_dict parameter, which would be a way to pass any desired modification that the user would like to make to an existing plot. So in our case, the entry for the plot would probably look something like:

    copy3:
      cmd: cat data >> result3
      deps:
      - data
      plots:
      - result3:
          update_dict:
          - transform:
            - loess: <DVC_METRIC_Y>
            - on: <DVC_METRIC_X>
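A minimal sketch of how such an update_dict could be applied to a template, assuming the template is loaded as a Python dict and the modification is a plain nested mapping. The helper name apply_update_dict and the merge rules (dicts merged recursively, lists appended, everything else overwritten) are assumptions for illustration, not existing DVC behavior.

```python
import copy


def apply_update_dict(template, update):
    """Return a copy of `template` with `update` merged in (hypothetical helper)."""
    result = copy.deepcopy(template)
    for key, value in update.items():
        current = result.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            # Nested mappings are merged key by key.
            result[key] = apply_update_dict(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Lists (e.g. existing transforms) are extended, not replaced.
            result[key] = current + copy.deepcopy(value)
        else:
            # Missing keys and scalars are simply set.
            result[key] = copy.deepcopy(value)
    return result


# Illustrative template fragment, not the real DVC default template.
template = {"mark": {"type": "line"}, "encoding": {"x": {"field": "<DVC_METRIC_X>"}}}
update = {"transform": [{"loess": "<DVC_METRIC_Y>", "on": "<DVC_METRIC_X>"}]}
smoothed = apply_update_dict(template, update)
```

The original template is left untouched, so the user could keep the original plot and a modified one side by side.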
The cons: … plots command that would make this easier. diff already supports plotting multiple targets, and we have the --show-vega option, I am not sure we need them anymore.
The pros: … vega.js knowledge, of course.

The 2nd approach makes sense for many different transformations. However, we should not use it as a default approach for smoothing since smoothing is a super common use case and we don't want to complicate the pipeline every time.
If it is possible, I'd add the smoothing to the default one. If not possible - create a new template.
@dmpetrov Tried adding smoothing by default, but "normal" use cases can get ugly. So I guess we should go with a new template.
@pared could you please share the result with code? Were you able to solve the revision issue?
@dmpetrov sure!
So, the thing that we had to do here was add groupby to our transform:
"transform": [
{
"loess": "<DVC_METRIC_Y>",
"on": "<DVC_METRIC_X>",
"groupby": ["rev"]
}
]
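For illustration, the same transform could be attached to a template programmatically. This is only a sketch: add_loess is a hypothetical helper, the spec is assumed to be a Vega-Lite specification loaded as a dict, and the optional bandwidth key is the Vega-Lite loess parameter discussed further down.

```python
import copy


def add_loess(spec, x_field, y_field, groupby=("rev",), bandwidth=None):
    """Return a copy of a Vega-Lite spec with a loess transform appended.

    Hypothetical helper; grouping by "rev" keeps one smoothed line per revision.
    """
    new_spec = copy.deepcopy(spec)
    transform = {"loess": y_field, "on": x_field, "groupby": list(groupby)}
    if bandwidth is not None:
        transform["bandwidth"] = bandwidth
    new_spec.setdefault("transform", []).append(transform)
    return new_spec


spec = {"mark": {"type": "line"}}
smoothed_spec = add_loess(spec, "<DVC_METRIC_X>", "<DVC_METRIC_Y>", bandwidth=0.1)
```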
And the results for the "normal" use case look as follows (the example has 100 points, evenly spaced):
and with loess applied:
@dmpetrov also, regarding the update_params: I agree we should implement it at some point. I could totally imagine a user preserving the original plot AND applying some transformation, just like in this example.
@dmpetrov
I played around with smoothing values and here are results for different bandwidth values:
0.05
0.1
0.2
It seems to me that setting the default smoothing to a very small value is a reasonable decision for the default plot: the 0.01 case seems to affect only plots with a really big number of points. It does not require too much additional work from us, and we can deal with update_dict later. If we agree on that, I would mention the smoothing in docs.
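To make the role of the bandwidth more concrete, here is a rough pure-Python sketch of loess-style smoothing (locally weighted linear regression with tricube weights). It is a simplified illustration, not Vega's implementation, which may differ in details; the function name loess_smooth is made up for this example. The bandwidth is the fraction of points used in each local fit, which is why a very small value barely changes a plot with few points.

```python
import math


def loess_smooth(xs, ys, bandwidth):
    """Locally weighted linear regression evaluated at every x in `xs`."""
    n = len(xs)
    # Number of nearest neighbours that take part in each local fit.
    k = min(n, max(2, math.ceil(bandwidth * n)))
    fitted = []
    for x0 in xs:
        distances = [abs(x - x0) for x in xs]
        # Window radius: distance to the k-th nearest point (guard against 0).
        h = sorted(distances)[k - 1] or 1e-12
        # Tricube weights; points at or beyond the radius get weight 0.
        weights = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in distances]
        sw = sum(weights)
        swx = sum(w * x for w, x in zip(weights, xs))
        swy = sum(w * y for w, y in zip(weights, ys))
        swxx = sum(w * x * x for w, x in zip(weights, xs))
        swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            # Degenerate window (effectively one point): use the weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


xs = [float(i) for i in range(50)]
ys = [2 * x + 1 for x in xs]
smooth = loess_smooth(xs, ys, bandwidth=0.2)
```

With 50 points, a bandwidth of 0.01 gives a window of only two points, so the output is the input unchanged, while the same bandwidth on 12500 points averages over 125 neighbours.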
To make the comparison easier I created new plots out of the 12500-point case by extracting a particular number of evenly spaced points from the dataset. The numbers of points were 1250, 125, 25:
0.01
0.05
0.1
0.2
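The evenly spaced extraction used for the comparison above could be sketched like this. The exact method used for the plots is not shown in the thread, so the helper evenly_spaced_sample and the placeholder data are assumptions.

```python
def evenly_spaced_sample(points, count):
    """Pick `count` items from `points`, evenly spaced, keeping first and last."""
    if count < 1:
        raise ValueError("count must be at least 1")
    n = len(points)
    if count >= n:
        return list(points)
    if count == 1:
        return [points[0]]
    # Fractional stride between picked indices; always > 1 here.
    step = (n - 1) / (count - 1)
    return [points[round(i * step)] for i in range(count)]


data = list(range(12500))  # placeholder for the 12500-point dataset
samples = {count: evenly_spaced_sample(data, count) for count in (1250, 125, 25)}
```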
- Create new template, default_smoot or something like that - easy, but if we decide to go that way, soon we will have many plot templates
A smooth template is not a bad idea TBH. Quick and easy, we just document the template (e.g. link here) and emphasize that it can be customized to change the parameters.
- Since we already support plot definition in pipelines file, maybe we should consider introducing update_dict parameter, that would be a way to pass any desired modification
I don't understand why update_dict is needed in the YAML structure but anyway, if this means dvc plots modify will work more like dvc config, accepting key:value pairs to set the display props (instead of having a specific -- flag per supported prop, which is a little confusing and limiting), I support this route.
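As a rough sketch of the key:value idea, display props could be stored in a nested dict addressed by dotted keys, similar in spirit to config sections and options. Everything here is hypothetical: set_prop and the key names are made up for illustration and do not reflect actual dvc plots modify options.

```python
def set_prop(props, dotted_key, value):
    """Set a (possibly nested) display prop addressed by a dotted key."""
    keys = dotted_key.split(".")
    node = props
    for key in keys[:-1]:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return props


props = {}
set_prop(props, "title", "Loss")                # hypothetical prop name
set_prop(props, "smoothing.bandwidth", 0.1)     # hypothetical prop name
```

A single generic setter like this avoids adding one dedicated flag per supported prop.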
Agree with @jorgeorpinel about making plots modify more like config. It was a small experiment with plots modify in an attempt to possibly make it better than the metrics modify that we used to have, but it feels more confusing indeed.
Per https://github.com/iterative/dvc/issues/3906#issuecomment-639735956 sounds like we're going with a template for this one, so should we extract the part about making plots modify more like config? Pawel suggested I comment here but I'd be happy to create a separate ticket.
@jorgeorpinel & @efiop yes, it has to be configurable (otherwise we need dozens of the templates :) ).
We already support dvc plots modify. The smoothing option should be part of this.