Since Layer is the main interface, it would be nice if tab completion on the object only listed relevant pieces of the API so that you can quickly find what plot types are available (e.g. point(), bar(), text(), etc.)
Currently, since it derives from BaseObject the namespace is polluted with all sorts of traitlet stuff that the user probably doesn't care about.
I'd propose something like this:
class LayerObject(BaseObject):
# traitlet-related stuff goes here
def __init__(self, *args, **kwargs):
super(LayerObject, self).__init__(**kwargs)
# etc.
class Layer(object):
# non-traitlet-related Layer methods here
def __init__(self, *args, **kwargs):
if len(args)==1:
self.data = args[0]
self._layerobject = LayerObject(**kwargs)
def point(self):
self.mark = 'point'
return self
# etc.
The only problem would be if we ever want to pass Layer to some other class this would complicate things. What do you think?
What about defining a dir
Method?
Sent from my iPhone
On May 18, 2016, at 10:08 PM, Jake Vanderplas [email protected] wrote:
Since Layer is the main interface, it would be nice if tab completion on the object only listed relevant pieces of the API so that you can quickly find what plot types are available (e.g. point(), bar(), text(), etc.)
Currently, since it derives from BaseObject the namespace is polluted with all sorts of traitlet stuff that the user probably doesn't care about.
I'd propose something like this:
class LayerObject(BaseObject):
# traitlet-related stuff goes here
def init(self, _args, _kwargs):
super(LayerObject, self).init(*kwargs)# etc.class Layer(object):
# non-traitlet-related Layer methods here
def init(self, _args, _kwargs):
if len(args)==1:
self.data = args[0]
self._layerobject = LayerObject(*kwargs)def point(self): self.mark = 'point' return self # etc.The only problem would be if we ever want to pass Layer to some other class this would complicate things. What do you think?
โ
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
Ah โ that's a good idea! I'll try that.
Traitlets even offers a method for getting a list of all the traitlets attributes - we could combine that with a list of the public methods
Sent from my iPhone
On May 19, 2016, at 9:22 AM, Jake Vanderplas [email protected] wrote:
Ah โ that's a good idea! I'll try that.
โ
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
Another option: what if we change point(), bar(), etc. to draw_point(), draw_bar(), etc. Then users could tab-complete draw_<TAB> to see all the mark methods available.
Or maybe mark_* to be consistent with the VL naming of things ?
Sent from my iPhone
On May 19, 2016, at 9:30 AM, Jake Vanderplas [email protected] wrote:
Another option: what if we change point(), bar(), etc. to draw_point(), draw_circle(), draw_bar(), etc. Then users could tab-complete draw to see all the mark methods available.
โ
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
I have alternative syntax to propose here. I'm not married to this idea yet, but think it's worth considering.
Currently Altair rely on method chaining to describe mark.
.encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).point()
What if we make mark first parameter of .encode()?
.encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
// or
.encode(point(), x=X('Horsepower'), y=Y('Miles_per_Gallon'))
Especially if we call the top-level object a View
View(data).encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
I feel like this seems more natural, in a sense that we encode points by mapping Horsepower to X and Miles per Gallon to Y. (Not sure how much work would this introduce though.)
One of the core abstractions in altair is that everything is a strongly
typed attribute and that we only add extra methods that provide a
convenience layer on top one attribute at a time.
Thus, the Layer.encoding attribute has a encode method. The
Layer.config has a configure method. Putting the mark type into the
encoding confuses that model and also forces us to do more complex parsing
or positional args.
Lastly, tab completion of the API is pretty important as in the notebook it
is one of the main ways that users discover APIs. Thus, we are looking at
renaming the current mark methods to things like mark_line and
mark_point to allow users to type l.mark_<tab> to see all of the mark
types. Putting the mark as a positional inside the encode method makes
that entire part of the API much less discoverable.
On Thu, May 19, 2016 at 5:32 PM, Kanit Wongsuphasawat <
[email protected]> wrote:
I have alternative syntax to propose here. I'm not married to this idea
yet, but think it's worth considering.Currently Altair rely on method chaining to describe mark.
.encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).point()
What if we make mark first parameter of .encode()?
.encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
// or
.encode(point(), x=X('Horsepower'), y=Y('Miles_per_Gallon'))Especially if we call the top-level object a View
View(data).encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
I feel like this seems more natural, in a sense that we encode points by
mapping Horsepower to X and Miles per Gallon to Y. (Not sure how much
work would this introduce though.)โ
You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/ellisonbg/altair/issues/93#issuecomment-220488589
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
[email protected] and [email protected]
I make the mark_point change in https://github.com/ellisonbg/altair/commit/8a593b5a36f283a51b794fb570c887b4ebe18714
On the tab-completion question โ is there a way to make the traitlets keyword arguments show up in tab completion? That would be extremely helpful!
Lastly, tab completion of the API is pretty important as in the notebook it
is one of the main ways that users discover APIs. Thus, we are looking at
renaming the current mark methods to things likemark_lineand
mark_pointto allow users to typel.mark_<tab>to see all of the mark types.
I like the tab idea. So please ignore my comment above. :)
That said, I wonder why you have to put mark after encode in the examples?
For example,
View/Layer(data).encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).mark_point()
Why not:
View/Layer(data).mark_point().encode(x=X('Horsepower'), y=Y('Miles_per_Gallon'))
I think the latter is more consistent with how we write specs in Vega-Lite.
To be clear, either order works currently.
To be clear, either order works currently.
Yep. I mean, unless you have strong preference, can we put mark_point() before encode() in the all the example so we can establish convention? (For the same reason, in all VL examples, we put mark before encoding although order does not really matter in JSON. :) )
I don't feel too strongly about the ordering but like the idea of being consistent with the vegalite examples. So +1 to marking first
Sent from my iPhone
On May 20, 2016, at 9:44 AM, Kanit Wongsuphasawat [email protected] wrote:
To be clear, either order works currently.
Yep. I mean, unless you have strong preference, can we put mark_point() before encode() in the all the example so we can establish convention? (For the same reason, in all VL examples, we put mark before encoding although order does not really matter in JSON. :) )
โ
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
Somewhat related: how would you feel about the mark_*() methods take optional arguments that pass to MarkConfig?
For example, here's the current API for creating a red scatterplot:
Layer(data, config=Config(mark=MarkConfig(color='red'))).mark_circle(
).encode(
x='Horsepower',
y='Miles_per_Gallon',
)
Here's how we might do the same thing with mark keywords:
Layer(data).mark_circle(
color='red'
).encode(
x='Horsepower',
y='Miles_per_Gallon',
)
how would you feel about the mark_*() methods take optional arguments that pass to MarkConfig?
I would avoid adding syntax that diverge too much from Vega-Lite syntax because that would lead to confusion when you know one version and want to transition to use another.
The mark_*() functions already diverge from Vega-Lite syntax... if you want to stick to exact mirroring of Vega-Lite you can feel free to do Layer(data, mark='point', ...) instead. You could build an entire plot using the low-level object wrappers for the vega-lite spec, without ever calling any of these methods!
My feeling is that Vega-lite is a specification (not an API), while Altair is an API (not a specification), so we should feel free to make these sorts of choices to make life easier for the user.
Now you might ask why we have Layer().mark_point() at all, when Layer(mark='point') is almost identical as far as keystrokes and information content. In my mind, the first is much more usable than the second, because it allows the user to take advantage of IPython's tab-completion.
In the first, if you forget all the mark options you type Layer.mark_<TAB> and they will appear in a list. In the second, if you forget all the mark options, you have to stop what you're doing and go search for the right section of the code documentation. Our goal here should be to avoid the need for that type of context switching as much as possible.
My feeling is that Vega-lite is a specification (not an API), while Altair is an API (not a specification)
I don't think so, for direct Vega-Lite users, our syntax is our API as well.
Also eventually Altair code that calls Altair API will be a visualization specification as well (in form of Python) -- just like D3 code is a form of visualization specification.
So keeping similarity between the two projects would benefit both projects in longer term.
IMHO, we should design Altair in a way that we can write a function that converts VL JSON to Altair easily in the future and use this function to add an "Altair" tab to Vega-Lite docs in the future.
(For now, let's do docs in a notebook form first.)
Now you might ask why we have Layer().mark_point() at all
I actually don't feel against mark_point() because of tabbing reason like you just say. At least structurally they are at the same level of the syntax tree. (no matter Layer(mark='point') or Layer().mark_point() -- it's still a top level property.)
For config, this is a much bigger jump. ( config.mark.color becomes mark.color in the syntax tree in this proposal.)
Plus, there are many things under config and if only config.mark get shortcut like this (but not other type of config.*), it also introduces inconsistency that can be very confusing and would make it impossible to add an "Altair" tab to Vega-Lite docs in the future.
For color's case, you can actually do
{
...
encoding: {
color: {value: 'red'}
}
}
so I guess in Altair it could still be
.encode(Color(value='red'))
I guess one tricky part is that we can't do
.encode(color='red')
because we can't distinguish between red as a field or red as a string value.
If a Python version of Vega-lite is what you're after, why use Altair at all? You could define things as Python dictionaries, and use json.dumps directly.
All I know is that if I tell people that to make points red simply requires
Layer(config=Config(mark=MarkConfig(color='red'))).mark_circle()
they'll laugh and uninstall the package.
IMHO, we should design Altair in a way that we can write a function that converts VL JSON to Altair easily in the future and use this function to add an "Altair" tab to Vega-Lite docs in the future.
I'm working on this now โ it doesn't require the one-to-one mapping you have in mind.
Ok. Your point about
Layer(config=Config(mark=MarkConfig(color='red'))).mark_circle()
or slightly better but still very terrible
Layer().mark_circle().configure(mark=MarkConfig(color='red'))
is a very good one. But I don't know if the proposed solution above is the right one.
(Other types of config would still be super annoying for the same reason.)
Is there a way to solve this that works for other type of config as well?
(Btw, we have a Vega slack channel. If you want we can create Altair channel there so @jheer @ellisonbg and @domoritz can be in the same room.)
I imagine that as we hit other pain points in configuration, we can come up with more API solutions similar to mark_point(color='red'). I landed on this as I went through the basic examples and tried to implement them in Altair. I don't know what the right solution will be as we go deeper into configs, but I believe that API-wise, exactly mirroring the VegaLite spec should not be our preferred solution.
Maybe one way to have the best of both worlds is to make everything a top-level method. For example we could do something along the lines of
Layer().configure_mark_color('red')
which would be an alias of
Layer().configure_mark(color='red')
which would be an alaias of
Layer().configure(mark=MarkConfig(color='red'))
which would be an alias of
Layer(config=Config(mark=MarkConfig(color='red')))
A benefit would be that Layer.configure_<TAB> would give you all the available options โ we'd just have to adjust the completer to make sure things come in the order that's convenient to the users.
The comment above is very interesting. Let's see what others think.
If we do this alias thing, do we still do the mark_point(color='red') thing? (If we do both, are we introducing too many ways to do the same thing?)
Below the HR are my old reply before reading your latest comment
Ok. I look at a few other vis spec language and think that maybe mark_point(color='red') is what we can settle on.
Here the problem is we want to distinguish between mapping to field and mapping to value.
In ggplot, they use aes to for field-mapping as in geom_line(aes(group=1), colour="#000099") and property of the mark/geom for mapping to value.
Given discusion above, we can't do like ggplot here because it introduces type inconsistency, which will screw up autocompletion. Thus, this solution maybe the right one. :)
I imagine that as we hit pain points in configuration, we can come up with more API solutions similar to mark_point(color='red').
Maybe that's the way to go.
FYI, I look at our configs.
The only ones that we have to think about are
cell.width and height config is too deep in the syntax -- we probably want top level .width() and .height() syntax for Altair. In fact, I might introduce similar change in the JSON syntax, especially after we are done with https://github.com/vega/vega-lite/issues/194) . encoding.{x,y,color,...}.{scale,axis,legend} but we add this config more for theming. (Correction: but this is still useful if you want to customize both axes at the same time) mark_point() if we proceed down this route. OK โ interesting to hear that there are other places where Vega-Lite allows configurations in multiple places. It might make sense for the spec to move to doing that more: e.g. rather than mark being a string enum, make it a definition that includes configuration (that's essentially what we're bootstrapping here).
I think a good goal would be to make the config option more like a set of defaults for theming, with the goal that normal usage should never have to touch it directly. If you want the Vega-Lite spec to be an API, that could be very useful.
Another unrelated thought as you talk about the spec evolving: does the spec have any place to specify the version used? If not, it's going to be hard to let it evolve without breaking a lot of people's visualizations.
Thinking a bit more...
I like the idea of having a set of configure_* methods that provide a shorthand notation for configuring each section of the config:
configure_axis()
configure_cell()
configure_mark()
configure_facet()
configure_legend()
configure_scale()
So then you can do things like Chart(data).mark_point().encode().configure_mark(color='red'). The benefit of this approach is that we never have to worry about name collision in the different config sections with a syntax like Chart(data).mark_point().encode().configure(attr1='foo', attr2='bar').
However, for the case of the mark_* methods, because there is no ambiguity about which section of the config you are interested, I would also support the mark_point(color='red') syntax. It will really help users and there is no risk of name collision.
I am not very fond of the Layer().configure_mark_color('red') syntax as it starts to hide the attribute based approach of altair a bit too much.
interesting to hear that there are other places where Vega-Lite allows configurations in multiple places.
Yes. To be fair, I was a little inaccurate last night. Scale config is really for theming. (But when you set scale/axis/legend config, it applies for scale/axis/legend of every applicable channel.)
It might make sense for the spec to move to doing that more: e.g. rather than mark being a string enum, make it a definition that includes configuration (that's essentially what we're bootstrapping here).
For Vega-Lite, I don't think we will do it. mark in Vega-Lite is just mark type. And encoding is for mapping either field or value to the channel (mostly visual properties) of the marks. Plus, config.mark isn't that bad to specify in JSON form. Not to mention the backward compatibility issue.
I think a good goal would be to make the config option more like a set of defaults for theming, with the goal that normal usage should never have to touch it directly. If you want the Vega-Lite spec to be an API, that could be very useful.
Right now it's sort of like that. Many mark config (that are properties of the marks) will eventually become encoding channel and support mapping to field -- but that requires implementing default scale range, and legend support. Since the most useful channels are already supported, we do not plan to implement this yet and this might happen over time.
But I guess we won't go to an extreme point where normal usage should "never" have to touch it directly (but yes, probably "rarely"). For example, some rarely used configs (e.g., config.mark.stacked) don't really have better place to be.
Another unrelated thought as you talk about the spec evolving: does the spec have any place to specify the version used? If not, it's going to be hard to let it evolve without breaking a lot of people's visualizations.
Having a version would be useful. Right now we don't have it yet but we discussed about it in https://github.com/vega/vega/issues/448.
In any case, we will mostly add new properties more than removing. In general, we consider multiple alternative syntaxes (esp. the core part) before deciding on a syntax so this help avoid deprecating properties / major restructuring to support backward compatibility. But as you know, we can't 100% prevent this, so we definitely should add versioning in some form.
However, for the case of the mark_* methods, because there is no ambiguity about which section of the config you are interested, I would also support the mark_point(color='red') syntax. It will really help users and there is no risk of name collision.
okay.
I am not very fond of the
Layer().configure_mark_color('red')syntax as it starts to hide the attribute based approach of altair a bit too much.
I agree that this should not be shown in examples (as the recommended way).
But I guess this would come for free from auto-generation? If preventing it is more work, I'm okay with having it. (I'm fine either way.)
Note that config.facet has one level deeper of properties. So there should still be config_facet_axis(), config_facet_cell(), config_facet_grid(), config_facet_scale().
Let's make a decision on these points:
Layer -> Chart?configure_* methods?I am +1 on both...
+1 if we're talking about the one-level-deep options... Not things like configure_mark_color().
OK, I have to shift back to other stuff, feel free to work on this stuff...
- Should we rename Layer -> Chart?
- Should we implement the extra configure_* methods?
+1 too.
+1 if we're talking about the one-level-deep options... Not things like configure_mark_color().
I agree. configure_mark_color() is not good.
I guess what we actually don't want are methods for the leaves of the config tree?
We probably still want configure_facet_scale(bandSize=...) which is two-level-deep but its leaves are at level-three.
This is done in master...
Most helpful comment
I don't feel too strongly about the ordering but like the idea of being consistent with the vegalite examples. So +1 to marking first
Sent from my iPhone