Vega-Lite: A way to use argmin/argmax inside encoding

Created on 12 Mar 2018  ·  15 Comments  ·  Source: vega/vega-lite

argmin/argmax don't work in an encoding because they involve two fields. We need a better syntax for this.

Area - Data & Transform Enhancement

All 15 comments

Let's fix nested data instead and make it work?

It's not just that. You normally want to calculate argmin / argmax of one field, but refer to another field.

https://vega.github.io/vega/docs/transforms/flatten/ could be useful here. If we see that a field comes from argmin/argmax, we can flatten it.

That's insufficient. For argmin and argmax to be used in encoding, we need two fields -- one is the field you are optimizing, the other is the field that you want the value from.

For example, we might need a syntax like this:

x: {"aggregate": {"argmin": "a"}, "field": "b", ...} 

to encode the b value of the data point that has the min of a.
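
To make the intended semantics concrete, here is a minimal plain-JavaScript sketch of "the b value of the data point that has the min of a". It is not Vega-Lite code: the helper name `valueAtArgmin` and the sample rows are made up for illustration.

```javascript
// Hypothetical helper: return row[valueField] for the row that minimizes row[optField].
function valueAtArgmin(rows, optField, valueField) {
  if (rows.length === 0) return undefined;
  let best = rows[0];
  for (const row of rows) {
    // Strict comparison keeps the first row when several rows tie for the minimum.
    if (row[optField] < best[optField]) best = row;
  }
  return best[valueField];
}

// Made-up sample data.
const rows = [
  {a: 3, b: 'x'},
  {a: 1, b: 'y'},
  {a: 2, b: 'z'},
];

console.log(valueAtArgmin(rows, 'a', 'b')); // prints "y"
```

The point is that two field names are needed: one to pick the record, one to read from it.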

To give a more concrete example, this spec:

{
  "data": {"url": "data/cars.json"},
  "transform": [{
    "bin": true, "field": "Miles_per_Gallon", "as": "bin_mpg"
  },{
    "aggregate": [
      {"op": "argmax", "field": "Acceleration", "as": "argmax_acc"}
    ],
    "groupby": ["bin_mpg", "bin_mpg_end"]
  }],
  "mark": "point",
  "encoding": {
    "y": {"field": "argmax_acc.Horsepower","type": "quantitative"},
    "x": {"bin": "binned", "field": "bin_mpg","type": "quantitative"},
    "x2": {"field": "bin_mpg_end"}
  }
}

should become just:

{
  "data": {"url": "data/cars.json"},
  "mark": "point",
  "encoding": {
    "y": {"aggregate": {"argmax": "Acceleration"}, "field": "Horsepower", "type": "quantitative"},
    "x": {"bin": true, "field": "Miles_per_Gallon", "type": "quantitative"}
  }
}
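
One way to picture the relationship between the two specs is as a desugaring step. The sketch below is only an illustration under stated assumptions: `desugarArgAggregate` is a hypothetical function, not Vega-Lite's compiler, and the `argmax_Acceleration` output name is an assumed convention (the longhand spec above uses `argmax_acc`). It covers only the argmin/argmax part and leaves out the bin and groupby handling.

```javascript
// Hypothetical desugaring of the proposed shorthand; not Vega-Lite's actual compiler code.
function desugarArgAggregate(channelDef) {
  const agg = channelDef.aggregate;
  // Only handle the proposed object form: {"argmin": field} or {"argmax": field}.
  const op = agg && typeof agg === 'object'
    ? ['argmin', 'argmax'].find((k) => k in agg)
    : undefined;
  if (!op) return {aggregate: null, channelDef};

  const as = `${op}_${agg[op]}`; // assumed naming convention for the output field
  const {aggregate, ...rest} = channelDef; // drop "aggregate" from the channel
  return {
    // Entry for the "aggregate" array of an aggregate transform.
    aggregate: {op, field: agg[op], as},
    // The channel now reads the wanted field off the stored record.
    channelDef: {...rest, field: `${as}.${channelDef.field}`},
  };
}

const result = desugarArgAggregate({
  aggregate: {argmax: 'Acceleration'},
  field: 'Horsepower',
  type: 'quantitative',
});
console.log(JSON.stringify(result, null, 2));
```

Here `result.aggregate` corresponds to the entry in the longhand "aggregate" array, and `result.channelDef.field` to the nested `argmax_acc.Horsepower` reference.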

@jheer @arvind @domoritz -- any opinion?

I think the suggestion makes a lot of sense.

This makes sense from a pure JSON perspective, but could be awkward from an Altair or Vega-Lite API perspective. In addition to some potential implementation headaches, it's pretty confusing what vl.argmin('Horsepower', 'Acceleration') means... any thoughts on what a proper API should look like, given that we already have op(encoding_field) as a standard representation?

I guess one idea would be to have argmin and argmax exist as operations with different semantics, such that vl.argmin('Acceleration').field('Horsepower') makes sense in a way that vl.sum('foo').field('foo') does not. I think that a separate argmin/argmax design would be easy enough to add to the Vega-Lite API, too.
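
As a rough illustration of that idea, here is a tiny mock of the chained form. This is not the real Vega-Lite API: `vlMock`, `argOp`, and the `toObject` method are hypothetical names used only to show how `argmin(...).field(...)` could map onto the proposed JSON.

```javascript
// Hypothetical mock of the chained API discussed above; this is NOT the real vega-lite-api.
function argOp(op, optField) {
  // The optimized field lives under "aggregate"; the value field is set later via .field().
  const def = {aggregate: {[op]: optField}};
  const builder = {
    field(name) { def.field = name; return builder; },
    type(t) { def.type = t; return builder; },
    toObject() { return {...def, aggregate: {...def.aggregate}}; },
  };
  return builder;
}

const vlMock = {
  argmin: (f) => argOp('argmin', f),
  argmax: (f) => argOp('argmax', f),
};

const yDef = vlMock.argmin('Acceleration').field('Horsepower').type('quantitative').toObject();
console.log(JSON.stringify(yDef));
// prints {"aggregate":{"argmin":"Acceleration"},"field":"Horsepower","type":"quantitative"}
```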

lol you beat me to it. :)

I had vl.argmin('Acceleration').field('Horsepower') typed in the comment box right here. :p

Another question is what an appropriate default title would be.

Currently I have "Horsepower of argmin(Acceleration)" for

vl.argmin('Acceleration').field('Horsepower') /
{"aggregate": {"argmin": "Acceleration"}, "field": "Horsepower", ...}

(But it's a bit awkward, since we no longer use the f(a) form in default titles anywhere else.)

"Horsepower of the record with the smallest Acceleration" or something slightly shorter.

"Horsepower for minimum Acceleration"?

Something like "X corresponding to minimum/maximum Y" is perhaps clearer but likely too long.

"Horsepower for min Acceleration" vs "Horsepower for minimum Acceleration"?

"Min" is shorter and somewhat consistent with the "min" in the "argmin" op. We also say "Min of X" for min(x). But "minimum" is obviously clearer.

Right now I'm leaning towards just "min" because titles shouldn't be long by default.
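
For reference, the short wording could be generated along these lines. This is only a sketch of the format being discussed: `argTitle` is a hypothetical helper, and the "max" wording for argmax is assumed by symmetry with "min".

```javascript
// Hypothetical title helper following the "<field> for <min|max> <optimized field>" wording.
function argTitle(channelDef) {
  const agg = channelDef.aggregate;
  if (agg && typeof agg === 'object') {
    if ('argmin' in agg) return `${channelDef.field} for min ${agg.argmin}`;
    if ('argmax' in agg) return `${channelDef.field} for max ${agg.argmax}`;
  }
  return channelDef.field; // fall back to the bare field name
}

console.log(argTitle({aggregate: {argmin: 'Acceleration'}, field: 'Horsepower'}));
// prints "Horsepower for min Acceleration"
```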

We now have a PR (https://github.com/vega/vega-lite/pull/4794).

Let's discuss the title format further in https://github.com/vega/vega-lite/pull/4794.
