Would it be possible to add a shorthand syntax for binning? The exact proposal would be to to allow to replace
alt.Chart(cars).mark_bar().encode(
x=alt.X("Horsepower", bin=True),
y="count()"
)
by
alt.Chart(cars).mark_bar().encode(
x="bin(Horsepower)",
y="count()"
)
After reading through #763 I understand that changes in this direction are not trivial, but I would suppose that just allowing this simple substitution would not break the conceptual model, as one would still have to use alt.X to customize the binning. However, for the default case this substitution should correspond to what is already being done for aggregations.
It's an interesting idea, and I've thought about adding something like this in the past...
One thing I'm not excited about with this approach is that it makes the transition from default binning to configured binning a rather jarring one; you first do
x = 'bin(column)'
and then when you want to control the binnings, it has to be changed to
x=alt.X('column', bin=alt.Bin(maxbins=10))
Not a particularly seamless transition.
One alternative I've thought about is to use an approach similar to the vega-lite Javascript API and allows things like
alt.Chart().encode(
alt.X('column').bin()
)
which could then be customized using
alt.X('column').bin(maxbins=10)
which would be much more natural. It would also remove the bin=alt.Bin boilerplate. We could do similar things with scale, axis, etc.
The problem with that idea is that currently alt.X.bin is a class attribute that is either Undefined, True, or an alt.Bin() object. Turning it into a callable method could break existing code. But it may be worth the breaking change to make encoding attributes more expressive.
I think the JS API looks very fluent. It is also nice that it takes the builder pattern that is already present at the Chart level, to the more lower levels of the API. I believe adding this kind of API to altair would greatly enhance the user experience.
The problem with that idea is that currently alt.X.bin is a class attribute that is either Undefined, True, or an alt.Bin() object. Turning it into a callable method could break existing code.
What about sticking even closer to the JS API and offering lower-case alternatives for the channel builders? So one could write
alt.Chart().encode(
alt.x('column').bin(...).scale(...)
)
The, let's call them channel functions, and their methods would then always return a channel builder object.
Then just before the chart is rendered one would have to traverse the chart object and call build() on every builder, so the user does not really have to care about it.
This way one could put the new fluent channel API next to the existing one, without having to worry to much about breaking existing code.
I've been playing with this a bit... we might be able to fit this kind of new approach on top of the old one. Something like this (we'd need to adjust method signatures & docstrings, but that won't be too hard):
def propertysetter(name, cls):
def setter(self, *args, **kwargs):
if len(args) == 1 and len(kwargs) == 0:
setattr(self, name, args[0])
else:
setattr(self, name, cls(*args, **kwargs))
return self
return setter
class X(alt.X):
bin = propertysetter('bin', alt.Bin)
axis = propertysetter('axis', alt.Axis)
scale = propertysetter('scale', alt.Scale)
Then you could define a chart like this:
chart = alt.Chart('data.csv').mark_point().encode(
x = X('name:Q').bin(maxbins=100).axis(title=None).scale(None)
)
chart.to_dict()
# {'$schema': 'https://vega.github.io/schema/vega-lite/v3.3.0.json',
# 'config': {'mark': {'tooltip': None}, 'view': {'height': 300, 'width': 400}},
# 'data': {'url': 'data.csv'},
# 'encoding': {'x': {'axis': {'title': None},
# 'bin': {'maxbins': 100},
# 'field': 'name',
# 'scale': None,
# 'type': 'quantitative'}},
# 'mark': 'point'}
The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.
The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.
How often do people use this? In JavaScript, we often solve this problem by using a function that is an accessor when you don't pass any arguments and a setter when you pass arguments. I don't know how pythonic this would be, though, and it also doesn't solve the backwards compatibility issue.
How often do people use this?
Not much, if at all. We don't do any direct property access in Altair's docs or examples, so I think it's a pretty minor incompatibility all told. And my take is it's well worth the benefit of being able to do this:
chart.encode(
x = alt.X('name:Q').bin(maxbins=100).axis(title=None).scale(None)
)
instead of this:
chart.encode(
x = alt.X('name:Q',
bin=alt.Bin(maxbins=100),
axis=alt.Axis(title=None),
scale=None
)
)
Just to make sure, autocomplete (in e.g. VSCode or JupyterLab) would work as expected here?
Autocomplete would work for systems based on Jedi or similar (i.e. ones that can infer the return type of alt.X() in order to populate autocomplete for its methods.
The main backward incompatibility here is that you could no longer access attribues by their name; i.e. chart.encoding.x.bin would reference the setter function, not the bin value.
We can at least make chart.encoding.x.bin() return current bin value when there is no parameter specified? (This is what VL-API currently does.)
We can at least make chart.encoding.x.bin() return current bin value when there is no parameter specified?
Yes, that should work. Although that might mess with the type inference engines behind autocomplete in the case of chained statements, because it means there's more logic required to infer the correct return type.
Although that might mess with the type inference engines behind autocomplete in the case of chained statements, because it means there's more logic required to infer the correct return type.
I see. I think good autocompletion is probably more important. I guess people can always do .to_dict()["bin"] to get the values or we can introduce .get_bin() or something.
@jakevdp
Would this mean that sort, which has two class types (SortByEncoding and EncodingSortField), would support either .sort(encoding="x", order="ascending") or .sort(op=..., field=..., order=...) ?
Would this mean that sort, which has two class types
Hmm... that's an interesting question. From the user's perspective, I think it would make the most sense to have a single method, with the return type chosen based on the content of the arguments.
Do you know offhand how the JS API handles this?
Since JS doesn't support keyword arguments and but support dict args, the JS API simply takes either .sort({encoding: 'x', order: 'ascending'}) or .sort({op: ..., field: ..., order: ...}).
Note that we're considering if sort should support channel name (https://github.com/vega/vega-lite/pull/5134), which is a bit relevant to this question about sort.
Ah, I see.
Huge +1 on supporting channel name as well; we can make it all work from the Altair side.
Prototype & more discussion in #1629
Most helpful comment
Not much, if at all. We don't do any direct property access in Altair's docs or examples, so I think it's a pretty minor incompatibility all told. And my take is it's well worth the benefit of being able to do this:
instead of this: