Dvc: plots smoothing

Created on 29 May 2020 · 12 comments · Source: iterative/dvc

Our default plot is not very informative when the number of data points is large (~1k):
[Figure: default plot with ~1k data points]

We need to introduce some smoothing into the default plot or create a new smoothing template. The result should look like:
[Figure: desired smoothed plot]

PS: I made the last plot by adding a loess transformation to the template, which breaks the revision (rev) handling. We need to fix that or find another solution.

    "transform": [
        {
            "loess": "<DVC_METRIC_Y>",
            "on": "<DVC_METRIC_X>"
        }
    ],
Label: feature request
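To make the anchor mechanics concrete, here is a minimal Python sketch of filling <DVC_METRIC_X>/<DVC_METRIC_Y> in a template fragment like the transform above. The helper fill_anchors and the field names loss and step are hypothetical; DVC's own template code may substitute anchors differently (for example as plain text replacement over the template file).

```python
import json


def fill_anchors(node, anchors):
    """Recursively replace whole-string anchors such as "<DVC_METRIC_Y>".

    Hypothetical helper for illustration; not DVC's actual implementation.
    """
    if isinstance(node, dict):
        return {key: fill_anchors(value, anchors) for key, value in node.items()}
    if isinstance(node, list):
        return [fill_anchors(item, anchors) for item in node]
    if isinstance(node, str):
        # Only strings that are exactly an anchor get replaced.
        return anchors.get(node, node)
    return node


template_fragment = {
    "transform": [
        {"loess": "<DVC_METRIC_Y>", "on": "<DVC_METRIC_X>"}
    ]
}

# "loss" and "step" are made-up metric field names.
filled = fill_anchors(
    template_fragment,
    {"<DVC_METRIC_Y>": "loss", "<DVC_METRIC_X>": "step"},
)
print(json.dumps(filled))
# {"transform": [{"loess": "loss", "on": "step"}]}
```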


So, I see a few ways we can approach this:

  1. Create a new template, default_smooth or something like that. Easy, but if we go that way we will soon have many plot templates, which is probably not desirable.
  2. Since we already support plot definitions in the pipelines file, maybe we should consider introducing an update_dict parameter: a way to pass any modification the user would like to make to an existing plot. In our case, the plot entry would probably look something like:

    copy3:
        cmd: cat data >> result3
        deps:
        - data
        plots:
        - result3:
            update_dict:
              transform:
              - loess: <DVC_METRIC_Y>
                on: <DVC_METRIC_X>
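A minimal sketch of how such an update_dict might be applied, assuming it is a plain dict that gets deep-merged into the default template. deep_update, the simplified default_template, and the merge rule (nested dicts merged key by key, lists and scalars replaced) are all assumptions for illustration, not an existing DVC API.

```python
import copy


def deep_update(base, update):
    """Return a copy of `base` with `update` merged in (hypothetical helper).

    Nested dicts are merged key by key; any other value in `update`
    (including lists) replaces the corresponding value in `base`.
    """
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Simplified stand-in for a default template, not DVC's real one.
default_template = {
    "title": "",
    "width": 300,
    "encoding": {
        "x": {"field": "<DVC_METRIC_X>"},
        "y": {"field": "<DVC_METRIC_Y>"},
    },
}

update_dict = {
    "width": 500,
    "transform": [{"loess": "<DVC_METRIC_Y>", "on": "<DVC_METRIC_X>"}],
}

merged = deep_update(default_template, update_dict)
```

Replacing lists is the simplest rule, but it means an update_dict transform would overwrite any transform already present in the template; appending instead would suit the case of keeping the original plot and adding a transformation on top.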

The cons:

  • In that case the user needs to come up with a JSON update and convert it to YAML (since we work with Vega, I would assume the default workflow would be: find a solution in Vega -> apply it in DVC to preserve it). That is not very convenient. Maybe we could add another plots command to make this easier.
  • If we decide to go this way, I would reconsider using anchors (DVC_METRIC_Y, DVC_METRIC_X, and others). With the proposed solution the user needs to learn Vega -> learn how transforms work -> create the transformation -> convert it to YAML -> learn what anchors are -> replace x/y with anchors -> write it to the YAML file.
    Anchors were introduced to let users create their own HTML templates, but since diff already supports plotting multiple targets and we have the --show-vega option, I am not sure we need them anymore.

The pros:

  • This approach lets users modify the template in any way they want: starting from the default template, they can change the title, width, height, or whatever comes to mind. Assuming a certain level of Vega knowledge, of course.

The 2nd approach makes sense for many different transformations. However, we should not use it as a default approach for smoothing since smoothing is a super common use case and we don't want to complicate the pipeline every time.

If possible, I'd add smoothing to the default template. If not, create a new template.

@dmpetrov I tried adding smoothing by default, but "normal" use cases can get ugly, so I guess we should go with a new template.

@pared could you please share the result with code? Were you able to solve the revision issue?

@dmpetrov Sure!
What we had to do here was add groupby to our transform:

    "transform": [
        {
            "loess": "<DVC_METRIC_Y>",
            "on": "<DVC_METRIC_X>",
            "groupby": ["rev"]
        }
    ]
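Why groupby fixes the revision problem: as far as I understand Vega-Lite's loess transform, its output rows keep only the on field, the loess field, and any groupby fields, so without groupby the rev field used to color each revision is dropped and all revisions are smoothed as a single series. The Python sketch below mimics that grouped behavior with a simplified LOESS (a local linear fit with tricube weights over the nearest max(2, floor(bandwidth * n)) points, without robustness iterations). It is an illustration under those assumptions, not Vega's implementation; loess and loess_by_group are hypothetical names, and groupby takes a single field here for simplicity.

```python
from collections import defaultdict


def loess(xs, ys, bandwidth=0.3):
    """Simplified LOESS: one weighted linear fit per point, no robustness passes.

    bandwidth is the fraction of points used in each local window.
    """
    n = len(xs)
    k = min(n, max(2, int(bandwidth * n)))  # neighbours per local fit
    smoothed = []
    for x0 in xs:
        # k nearest neighbours of x0 along the x axis (x0 itself included)
        idx = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:k]
        dmax = max(abs(xs[j] - x0) for j in idx)
        # Tricube weights; the farthest neighbour gets weight 0.
        if dmax == 0:
            w = [1.0] * k
        else:
            w = [(1 - (abs(xs[j] - x0) / dmax) ** 3) ** 3 for j in idx]
        sw = sum(w)
        mx = sum(wi * xs[j] for wi, j in zip(w, idx)) / sw
        my = sum(wi * ys[j] for wi, j in zip(w, idx)) / sw
        sxx = sum(wi * (xs[j] - mx) ** 2 for wi, j in zip(w, idx))
        sxy = sum(wi * (xs[j] - mx) * (ys[j] - my) for wi, j in zip(w, idx))
        # Fall back to the weighted mean when the window has no x spread.
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


def loess_by_group(rows, x, y, groupby, bandwidth=0.3):
    """Smooth each group separately and keep the group field in output rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[groupby]].append(row)
    out = []
    for key, members in groups.items():
        members.sort(key=lambda r: r[x])
        xs = [r[x] for r in members]
        ys = [r[y] for r in members]
        for xv, yv in zip(xs, loess(xs, ys, bandwidth)):
            out.append({groupby: key, x: xv, y: yv})
    return out


# Usage with two made-up revisions of a "loss" curve:
rows = [{"rev": "HEAD", "step": s, "loss": 1.0 / (s + 1)} for s in range(100)]
rows += [{"rev": "workspace", "step": s, "loss": 0.5 / (s + 1)} for s in range(100)]
smoothed_rows = loess_by_group(rows, "step", "loss", "rev", bandwidth=0.2)
```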

And here is what the result looks like for a "normal" use case (the example has 100 evenly spaced points):

[Figure: 100-point plot without smoothing]

And with loess applied:
[Figure: the same plot with loess applied]

@dmpetrov Also, regarding update_dict: I agree we should implement it at some point. I can totally imagine a user preserving the original plot AND applying some transformation, just like in this example.

@dmpetrov
I played around with the smoothing settings; here are the results for different bandwidth values:

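  • 0.01
    [Figure: smoothed plots, bandwidth 0.01]
  • 0.05
    [Figure: smoothed plots, bandwidth 0.05]
  • 0.1
    [Figure: smoothed plots, bandwidth 0.1]
  • 0.2
    [Figure: smoothed plots, bandwidth 0.2]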

It seems to me that setting the default smoothing to a very small value is a reasonable choice for the default plot: the 0.01 case seems to noticeably affect only plots with a really large number of points. It does not require much additional work from us, and we can deal with update_dict later. If we agree on that, I would mention the smoothing in the docs.
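A quick way to see why 0.01 only matters for large datasets is to count how many neighbours each local fit uses. This sketch assumes the window is max(2, floor(bandwidth * n)) points, which is my reading of how Vega's loess picks neighbours (treat it as an assumption). With only 2 neighbours and tricube weights, the farther neighbour gets weight zero, so the local fit is roughly the point itself, i.e. almost no smoothing.

```python
def loess_window(bandwidth, n_points):
    """Points per local fit, assuming max(2, floor(bandwidth * n))."""
    return max(2, int(bandwidth * n_points))


for n in (100, 1250, 12500):
    windows = [loess_window(bw, n) for bw in (0.01, 0.05, 0.1, 0.2)]
    print(n, windows)
# 100 [2, 5, 10, 20]
# 1250 [12, 62, 125, 250]
# 12500 [125, 625, 1250, 2500]
```

Under this assumption, bandwidth 0.01 leaves a 100-point plot practically untouched, while on 12500 points each fit averages over 125 neighbours.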

To make the comparison easier, I created new plots from the 12500-point case by extracting evenly spaced subsets of points from the dataset. The subset sizes were 1250, 125, and 25:

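  • 0.01
    [Figure: subsampled plots, bandwidth 0.01]
  • 0.05
    [Figure: subsampled plots, bandwidth 0.05]
  • 0.1
    [Figure: subsampled plots, bandwidth 0.1]
  • 0.2
    [Figure: subsampled plots, bandwidth 0.2]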
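The comment doesn't spell out how the subsets were extracted; here is a minimal Python sketch of one way to take k evenly spaced points from the 12500-point series. evenly_spaced is a hypothetical helper, and the range stands in for the real data.

```python
def evenly_spaced(points, k):
    """Pick k points at evenly spaced indices, starting at index 0."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    # Integer arithmetic keeps indices exact: i * n // k for i in 0..k-1.
    return [points[i * n // k] for i in range(k)]


full = list(range(12500))  # stand-in for the 12500-point dataset
subsets = {k: evenly_spaced(full, k) for k in (1250, 125, 25)}
```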

> 1. Create new template, default_smooth or something like that - easy, but if we decide to go that way, soon we will have many plot templates

A smooth template is not a bad idea TBH. Quick and easy: we just document the template (e.g. link here) and emphasize that it can be customized to change the parameters.

> 2. Since we already support plot definition in pipelines file, maybe we should consider introducing update_dict parameter, that would be a way to pass any desired modification

I don't understand why update_dict is needed in the YAML structure, but anyway: if this means dvc plots modify will work more like dvc config, accepting key:value pairs to set the display properties (instead of having a dedicated flag per supported property, which is a little confusing and limiting), I support this route.
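To make the config-like idea concrete, here is a minimal Python sketch of turning key=value pairs into nested plot properties, with dotted keys in the spirit of dvc config's section.option names. This is hypothetical: the "=" separator, the parse_props helper, and property names such as smooth.bandwidth are assumptions, not an existing dvc plots modify interface. Values are kept as strings; type coercion is left out.

```python
def parse_props(pairs):
    """Turn ["title=Loss", "smooth.bandwidth=0.05"] into a nested dict.

    Hypothetical helper: dotted keys create nested dicts, values stay strings.
    """
    props = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        *parents, leaf = key.split(".")
        node = props
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{part!r} is already set to a plain value")
        node[leaf] = value
    return props


props = parse_props(["title=Loss", "smooth.bandwidth=0.05", "smooth.groupby=rev"])
```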

Agree with @jorgeorpinel about making plots modify more like config. plots modify was a small experiment, an attempt to make it better than the metrics modify we used to have, but it does indeed feel more confusing.

Per https://github.com/iterative/dvc/issues/3906#issuecomment-639735956 it sounds like we're going with a template for this one, so should we split out the part about making plots modify more like config? Pawel suggested I comment here, but I'd be happy to create a separate ticket.

@jorgeorpinel & @efiop yes, it has to be configurable (otherwise we would need dozens of templates :) ).

We already support dvc plots modify. The smoothing option should be part of this.
