Altair: Adding a bin shorthand

Created on 18 Jun 2019 · 15Comments · Source: altair-viz/altair

Would it be possible to add a shorthand syntax for binning? The exact proposal would be to to allow to replace

alt.Chart(cars).mark_bar().encode(
    x=alt.X("Horsepower", bin=True), 
    y="count()"
)

alt.Chart(cars).mark_bar().encode(
    x="bin(Horsepower)", 
    y="count()"
)

After reading through #763 I understand that changes in this direction are not trivial, but I would suppose that just allowing this simple substitution would not break the conceptual model, as one would still have to use alt.X to customize the binning. However, for the default case this substitution should correspond to what is already being done for aggregations.

enhancement

Source

JarnoRFB

Most helpful comment

How often do people use this?

Not much, if at all. We don't do any direct property access in Altair's docs or examples, so I think it's a pretty minor incompatibility all told. And my take is it's well worth the benefit of being able to do this:

chart.encode(
  x = alt.X('name:Q').bin(maxbins=100).axis(title=None).scale(None)
)

instead of this:

chart.encode(
  x = alt.X('name:Q',
    bin=alt.Bin(maxbins=100),
    axis=alt.Axis(title=None),
    scale=None
  )
)

jakevdp on 28 Jun 2019

👍7 ❤2

All 15 comments

It's an interesting idea, and I've thought about adding something like this in the past...

One thing I'm not excited about with this approach is that it makes the transition from default binning to configured binning a rather jarring one; you first do

x = 'bin(column)'

and then when you want to control the binnings, it has to be changed to

x=alt.X('column', bin=alt.Bin(maxbins=10))

Not a particularly seamless transition.

One alternative I've thought about is to use an approach similar to the vega-lite Javascript API and allows things like

alt.Chart().encode(
   alt.X('column').bin()
)

which could then be customized using

alt.X('column').bin(maxbins=10)

which would be much more natural. It would also remove the bin=alt.Bin boilerplate. We could do similar things with scale, axis, etc.

The problem with that idea is that currently alt.X.bin is a class attribute that is either Undefined, True, or an alt.Bin() object. Turning it into a callable method could break existing code. But it may be worth the breaking change to make encoding attributes more expressive.

jakevdp on 18 Jun 2019

I think the JS API looks very fluent. It is also nice that it takes the builder pattern that is already present at the Chart level, to the more lower levels of the API. I believe adding this kind of API to altair would greatly enhance the user experience.

The problem with that idea is that currently alt.X.bin is a class attribute that is either Undefined, True, or an alt.Bin() object. Turning it into a callable method could break existing code.

What about sticking even closer to the JS API and offering lower-case alternatives for the channel builders? So one could write

alt.Chart().encode(
   alt.x('column').bin(...).scale(...)
)

The, let's call them channel functions, and their methods would then always return a channel builder object.
Then just before the chart is rendered one would have to traverse the chart object and call build() on every builder, so the user does not really have to care about it.
This way one could put the new fluent channel API next to the existing one, without having to worry to much about breaking existing code.

JarnoRFB on 27 Jun 2019

I've been playing with this a bit... we might be able to fit this kind of new approach on top of the old one. Something like this (we'd need to adjust method signatures & docstrings, but that won't be too hard):

def propertysetter(name, cls):
  def setter(self, *args, **kwargs):
    if len(args) == 1 and len(kwargs) == 0:
      setattr(self, name, args[0])
    else:
      setattr(self, name, cls(*args, **kwargs))
    return self
  return setter

class X(alt.X):
  bin = propertysetter('bin', alt.Bin)
  axis = propertysetter('axis', alt.Axis)
  scale = propertysetter('scale', alt.Scale)

Then you could define a chart like this:

chart = alt.Chart('data.csv').mark_point().encode(
  x = X('name:Q').bin(maxbins=100).axis(title=None).scale(None)
)

chart.to_dict()
# {'$schema': 'https://vega.github.io/schema/vega-lite/v3.3.0.json',
#  'config': {'mark': {'tooltip': None}, 'view': {'height': 300, 'width': 400}},
#  'data': {'url': 'data.csv'},
#  'encoding': {'x': {'axis': {'title': None},
#    'bin': {'maxbins': 100},
#    'field': 'name',
#    'scale': None,
#    'type': 'quantitative'}},
#  'mark': 'point'}

The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.

jakevdp on 28 Jun 2019

The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.

How often do people use this? In JavaScript, we often solve this problem by using a function that is an accessor when you don't pass any arguments and a setter when you pass arguments. I don't know how pythonic this would be, though, and it also doesn't solve the backwards compatibility issue.

domoritz on 28 Jun 2019

How often do people use this?

chart.encode(
  x = alt.X('name:Q').bin(maxbins=100).axis(title=None).scale(None)
)

instead of this:

chart.encode(
  x = alt.X('name:Q',
    bin=alt.Bin(maxbins=100),
    axis=alt.Axis(title=None),
    scale=None
  )
)

jakevdp on 28 Jun 2019

👍7 ❤2

Just to make sure, autocomplete (in e.g. VSCode or JupyterLab) would work as expected here?

domoritz on 28 Jun 2019

Autocomplete would work for systems based on Jedi or similar (i.e. ones that can infer the return type of alt.X() in order to populate autocomplete for its methods.

jakevdp on 28 Jun 2019

The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.

We can at least make chart.encoding.x.bin() return current bin value when there is no parameter specified? (This is what VL-API currently does.)

kanitw on 28 Jun 2019

We can at least make chart.encoding.x.bin() return current bin value when there is no parameter specified?

Yes, that should work. Although that might mess with the type inference engines behind autocomplete in the case of chained statements, because it means there's more logic required to infer the correct return type.

jakevdp on 29 Jun 2019

Although that might mess with the type inference engines behind autocomplete in the case of chained statements, because it means there's more logic required to infer the correct return type.

I see. I think good autocompletion is probably more important. I guess people can always do .to_dict()["bin"] to get the values or we can introduce .get_bin() or something.

kanitw on 30 Jun 2019

@jakevdp

Would this mean that sort, which has two class types (SortByEncoding and EncodingSortField), would support either .sort(encoding="x", order="ascending") or .sort(op=..., field=..., order=...) ?

kanitw on 1 Jul 2019

Would this mean that sort, which has two class types

Hmm... that's an interesting question. From the user's perspective, I think it would make the most sense to have a single method, with the return type chosen based on the content of the arguments.

Do you know offhand how the JS API handles this?

jakevdp on 1 Jul 2019

Since JS doesn't support keyword arguments and but support dict args, the JS API simply takes either .sort({encoding: 'x', order: 'ascending'}) or .sort({op: ..., field: ..., order: ...}).

Note that we're considering if sort should support channel name (https://github.com/vega/vega-lite/pull/5134), which is a bit relevant to this question about sort.

kanitw on 1 Jul 2019

Ah, I see.

Huge +1 on supporting channel name as well; we can make it all work from the Altair side.

jakevdp on 1 Jul 2019

Prototype & more discussion in #1629

jakevdp on 23 Nov 2019

Was this page helpful?

0 / 5 - 0 ratings