Edit: preview for Dataset and DataArray (pure html/css)
Dataset: https://jsfiddle.net/tay08cn9/4/
DataArray: https://jsfiddle.net/43z4v2wt/9/
I started to think a bit more deeply about what a richer, html-based representation of xarray objects could look like, as we would see it, e.g., in jupyter notebooks.
Here are some ideas for Dataset: https://jsfiddle.net/9ab4c3tr/35/
Some notes:
pandas.Dataframe repr as it is shown in jupyterlab.
These are still, of course, preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not!
OMG this is so cool!
OMG indeed this looks fantastic! We should test this on more examples, but I already love it. The underline is a great way to illustrate which variables are dimensions.
I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript)
Agreed. Notebook viewers tend to sanitize out JavaScript, especially if originating from an unknown source.
highlighting on hover a specific dimension simultaneously at several places of the repr
This would also be awesome, if possible with pure CSS. I don't know, but I'm going to bug my colleague who does JavaScript visualization.
OK, so the bad news is that it is not possible to select elements other than siblings or descendant tags with CSS. So selecting "cousin" tags like a dimension name at multiple locations in the repr is out.
But we could potentially add JavaScript for fancy hover effects. Even if it gets stripped out in many cases (for untrusted notebooks), it should degrade gracefully to the HTML-only repr. My main concern would be annoying prompts in the notebook interface that ask a user if they want to trust outputs or not.
Wow, great job @benbovy!
With the upcoming move towards Jupyter Lab and a better infrastructure for custom plugins, could this serve as the basis for a "NetCDF Extension" for Jupyter Lab? It would be great if double clicking on a NetCDF file in the JLab file explorer could open up this sort of information, or even a quick and dirty ncview-like plotter.
Thanks for the feedback!
Here are a few more ideas: https://jsfiddle.net/9ab4c3tr/48/
We might use drop-downs to display other useful information as well, such as the type of array (e.g., dask-array, in-memory numpy.array, etc...)
Fancy hover effects would be awesome indeed, although my concern is that too much hover effect would be a source of distraction.
A jupyterlab NetCDF viewer extension would be awesome too! It might also leverage phosphor's datagrid (https://github.com/phosphorjs/phosphor/issues/285) to explore the raw data values.
Although this could clearly be made more fancy and complicated, I think
what you have here would already be a great addition. I love the drop-down
attributes (those are missing from the standard repr).
Don't hesitate to start a PR! More fanciness could always be added in the
future based on user feedback.
On Thu, Oct 12, 2017 at 9:01 AM, Benoit Bovy notifications@github.com
wrote:
Thanks for the feedback!
Here are a few more ideas: https://jsfiddle.net/9ab4c3tr/48/
- main section titles are colored so that these are more detached from the content (not sure I really like it, though),
- subtle shade variations, notably for displaying the first values for each variable,
- drop-downs for displaying attributes per variable if any (collapsed by default).
- hover-effect for dimensions: highlight all variables having the hovered dimension (uses Javascript)
We might use drop-downs to display other useful information as well, such as the type of array (e.g., dask-array, in-memory numpy.array, etc...)
Fancy hover effects would be awesome indeed, although my concern is that too much hover effect would be a source of distraction.
A jupyterlab NetCDF viewer extension would be awesome too! It might also leverage phosphor's datagrid (phosphorjs/phosphor#285 https://github.com/phosphorjs/phosphor/issues/285) to explore the raw data values.
As long as we're nit-picking style....
main section titles are colored so that these are more detached from the content (not sure I really like it, though),
I slightly prefer the grayscale you had before -- the section titles are already well detached.
subtle shade variations, notably for displaying the first values for each variable,
Love it!
drop-downs for displaying attributes per variable if any (collapsed by default).
It is wonderful to expose this information! I have a slight concern that the + icons could be confusing with the * we use for index variables in the normal repr. I don't have any good ideas for fixing this yet.
hover-effect for dimensions: highlight all variables having the hovered dimension (uses Javascript)
For highlighting, maybe something a little more subtle would work well, either:
Before starting a PR on this, I'd like to get the design right (at least the static part) and also to clarify the repr of other xarray objects such as DataArray, Variable, Dataset.coords and Dataset.data_vars.
For Dataset.coords and Dataset.data_vars it would be pretty straightforward (we may simply display the corresponding sections in Dataset html repr).
For DataArray and Variable I still have no clear idea of what would be a good representation for the data values:
Option A: embed the plain-text representation of the wrapped numpy.array (or dask.array) in a HTML container (https://jsfiddle.net/43z4v2wt/1/). The result is not that bad, although monospace and sans-serif fonts do not mix very well IMO.
Option B: an html-formatted table for data values as well. That would be tricky for >2d arrays; it has already been discussed in other issues without real consensus.
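As a rough illustration of Option A, here is a minimal sketch that wraps a plain-text repr in an HTML container. The function name `array_repr_html` and the `xr-*` class names are hypothetical, not part of xarray, and a nested list stands in for the wrapped array so the snippet only needs the standard library.

```python
import html


def array_repr_html(obj, title="xarray.DataArray"):
    """Wrap the plain-text repr of ``obj`` in a small HTML container.

    Hypothetical helper: the name and the CSS classes are assumptions
    made for this sketch.
    """
    # Escape the text repr so characters like '<' or '&' cannot break the markup.
    text = html.escape(repr(obj))
    return (
        "<div class='xr-array'>"
        f"<div class='xr-header'>{html.escape(title)}</div>"
        # <pre> keeps the monospace alignment of the text repr.
        f"<pre class='xr-data'>{text}</pre>"
        "</div>"
    )


# A nested list stands in for the wrapped numpy/dask array here.
print(array_repr_html([[1.5, 2.0], [3.25, 4.0]]))
```

Any object with a text repr could be passed the same way; the monospace `<pre>` block is where the font-mixing concern mentioned above would show up.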
have a slight concern that the + icons could be confusing with the * we use for index variables in the normal repr.
Maybe putting the + icon more to the left would do the trick (https://jsfiddle.net/9ab4c3tr/49/).
For highlighting, maybe something a little more subtle would work well
Changing font-weight instead of background-color works better IMO (same link above).
Option A: embed the plain-text representation of the wrapped numpy.array (or dask.array) in a HTML container (https://jsfiddle.net/43z4v2wt/1/). The result is not that bad, although monospace and sans-serif fonts do not mix very well IMO.
I think this is probably our best option. My main suggestion is it should be possible to click somewhere (maybe on top xarray.DataArray header) to hide/show the data section.
Maybe putting the + icon more to the left would do the trick (https://jsfiddle.net/9ab4c3tr/49/).
Now it just looks a little out of place :). I liked it better closer to the variable name.
Changing font-weight instead of background-color works better IMO (same link above).
Hmm, I'm not sure. I find text that moves very distracting -- I like changing background colors.
My main suggestion is it should be possible to click somewhere (maybe on top xarray.DataArray header) to hide/show the data section.
See https://jsfiddle.net/43z4v2wt/3/
I find text that moves very distracting
Indeed, I agree now, it's annoying.
One more trick we could add for the Dataset repr:
When hovering over data, show repr for the full array values (as a numpy array) in a box (similar to "title" text, but ideally fixed-width format).
That would be nice, although I guess that it would require javascript.
Alternatively, we can imagine two drop-downs per variable, one for the attributes and one for the full array values (numpy or dask repr). Each would be shown/hidden by two distinct symbols or links, perhaps both located on the right after the preview-values so that we avoid confusion with symbol * in the normal repr.
In the DataArray repr, instead of completely collapsing the data section we may reduce it into a one-line preview of the first values. We can even imagine showing the reduced version by default in cases where the full data section would take too much vertical space.
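One pure HTML way to sketch this idea (not necessarily what the fiddles do) is a `<details>` element whose `<summary>` holds the one-line preview and which starts collapsed when the full text is long. The helper name and the `max_lines` / `preview_chars` thresholds are assumptions for illustration only.

```python
import html


def data_section_html(text_repr, max_lines=10, preview_chars=40):
    """Collapsible data section with a one-line preview (hypothetical helper)."""
    lines = text_repr.splitlines()
    first = lines[0] if lines else ""
    # Mark the preview as shortened when anything was cut off.
    truncated = len(first) > preview_chars or len(lines) > 1
    preview = first[:preview_chars] + (" ..." if truncated else "")
    # Show the full section by default only when it is short enough.
    open_attr = " open" if len(lines) <= max_lines else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{html.escape(preview)}</summary>"
        f"<pre>{html.escape(text_repr)}</pre>"
        "</details>"
    )


print(data_section_html("[1 2 3]"))
print(data_section_html("\n".join(str(i) for i in range(50))))
```

The toggle itself needs no JavaScript, since the browser handles `<details>` natively.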
A comment regarding the use of javascript: after reading this discussion (https://github.com/bokeh/bokeh/issues/6700) I'm not sure that maintaining a javascript-based repr that works with both jupyterlab and the classic notebook is worth the few fancy features it would provide.
I'd rather stick with a pure html/css solution here. It might still co-exist with a full-featured jupyterlab extension for viewing NetCDF files.
Alternatively, we can imagine two drop-downs per variable, one for the attributes and one for the full array values (numpy or dask repr). Each would be shown/hidden by two distinct symbols or links, perhaps both located on the right after the preview-values so that we avoid confusion with symbol * in the normal repr.
Yes, I like that. Possibly clicking on the array values would show the larger preview.
I did play around a little with a hover drop down, but the positioning is a little hacky:
https://jsfiddle.net/gux879hn/2/
Note that the info() method was added a while ago: https://github.com/pydata/xarray/pull/1176
At least in the notebook, this change would make it obsolete.
New version for Dataset: https://jsfiddle.net/9ab4c3tr/50/
A symbol shows its attributes..data repr
And for DataArray: https://jsfiddle.net/43z4v2wt/5/ (possibility to reduce the full data repr to a one-line preview)
This is wonderful!
Is it reasonable to add a drop down in the attribute section if there are too many items there?
My data sometimes have a long list of attributes and important information about coordinate and data_vars are not seen without scrolling back the page.
(This may be a rare case though...)
Is it reasonable to add a drop down in the attribute section if there are too many items there?
In the examples above, you can click on the Attributes section title to collapse the whole section (same with coordinates and data variables sections).
Given your case, maybe it would be nice to add a rule to show the attribute section collapsed when the number of attributes is too long. If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g., ► Attributes (10):
I have a slight proposed tweak on the Dataset repr (https://jsfiddle.net/jrot9pex/1/). It makes two changes:
- Changes the letter "A" to "a" for attributes
- Moves the marker "a" closer to the variable name
Note that the info() method was added a while ago: #1176
At least in the notebook, this change would make it obsolete.
Yes, but let's keep it -- it has a nice pure text format (from ncdump) which works especially well with copy & paste.
Given your case, maybe it would be nice to add a rule to show the attribute section collapsed when the number of attributes is too long. If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g.,
► Attributes (10):
Yes, I like this general idea -- though we might only show the number when it is collapsed, e.g., ► Attributes: (10)
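The collapse rule and the item count could be sketched as below. This is only an illustration under assumptions: the helper name, the `max_items` threshold and the `xr-count` class are hypothetical. The count is always emitted, and a CSS rule hides it while the section is open, so it is only visible when collapsed.

```python
import html

# Hide the count while the section is expanded (pure CSS, no JavaScript).
COUNT_CSS = "<style>details[open] > summary .xr-count { display: none; }</style>"


def attrs_section_html(attrs, max_items=5):
    """Attributes section, collapsed by default when it has many items."""
    collapsed = len(attrs) > max_items
    rows = "".join(
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in attrs.items()
    )
    open_attr = "" if collapsed else " open"
    return (
        f"{COUNT_CSS}<details{open_attr}>"
        f"<summary>Attributes: <span class='xr-count'>({len(attrs)})</span></summary>"
        f"<ul>{rows}</ul></details>"
    )


print(attrs_section_html({"units": "K", "long_name": "temperature"}))
print(attrs_section_html({f"attr{i}": i for i in range(10)}))
```

Whether front-ends keep the `<style>` tag in rich outputs is a separate question, so an inline-style variant may be needed in practice.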
If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g., ► Attributes (10):
I like this idea. Looking forward to having it.
though we might only show the number when it is collapsed
Good idea!
Changes the letter "A" to "a" for attributes
Agreed.
Moves the marker "a" closer to the variable name
The good (or bad?) thing with the marker "a" to the very left is that it is aligned with the drop-down symbols of the main sections, but to me either way is fine.
Let's summarize all suggestions so far:
Dataset: https://jsfiddle.net/tay08cn9/2/
DataArray: https://jsfiddle.net/43z4v2wt/7/
I think that we're getting close to something good!
I'll wait a bit before starting a PR (maybe sometime next week), in case other feedback or suggestions come up.
OMG this is so cool!
ditto wow I can't wait for this to be in!
@benbovy it seems like that discussion has stalled out... are you ready to put together that PR? :)
Personally, I'm very happy with your current version. You might even convince me to hold off on the v0.10 release to include it!
Sorry for my late reply @shoyer !
Yes, I'm happy with this version too, I'll open a PR asap! Unfortunately I haven't had much time to work on this these last two weeks, but I want to make this a priority next week.
I guess that it is a bit late for v0.10 release which is already on track? I still don't exactly know how much time I'll need to implement this. I haven't thought yet about all implementation details (e.g., how to calculate the width of the variable name column? potential issues with jupyter notebook / jupyterlab...). It might also require some refactoring and/or new public API (e.g., to_html like in pandas).
I guess that it is a bit late for v0.10 release which is already on track?
Yes, probably at this point. But hopefully we can do a shorter release cycle for v0.11. Also, though this is a big visual change, I'm not sure it's actually a breaking change, per se. Only notebook output will change, not programmatic use of repr().
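To illustrate why programmatic use of repr() would be unaffected: Jupyter front-ends look for a `_repr_html_` method on displayed objects, while `repr()` keeps calling `__repr__`. The class below is a hypothetical stand-in, not xarray code.

```python
import html


class FakeDataset:
    """Hypothetical stand-in for an xarray object."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Used by repr(), print() and plain-text consoles; unchanged by the rich repr.
        return f"<FakeDataset {self.name}>"

    def _repr_html_(self):
        # Picked up by notebook front-ends that support rich HTML output.
        return f"<div class='xr-wrap'><b>FakeDataset</b> {html.escape(self.name)}</div>"


ds = FakeDataset("demo")
print(repr(ds))
print(ds._repr_html_())
```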
Also, though this is a big visual change, I'm not sure it's actually a breaking change, per se.
Yes, but once it's there I'm sure that we'll get bug reports right away if it doesn't work well, because this is a very visual change. Therefore I am almost in favor of pushing this forward as fast as possible and see how it goes, maybe followed by a quick 0.10.1
But since I'm not the one doing the job I'm not allowed to say anything more here ;)
I started working on a new PR (not yet submitted), but before continuing the work I really need to know how best we can include the CSS code in rich outputs for the notebook (e.g., inline CSS or using the <style> tag) without requiring any extension for the various front-end applications.
It is not really clear to me if it is even supported so I opened an issue on the jupyterlab side (see reference above).
So we have different options regarding the tools to use for implementing these rich representations:
Using vdom. This option is very pythonic and is suggested by jupyterlab and nteract developers. I have used vdom to implement the rich repr of Dataset (not fully working yet, and you need the latest jupyterlab or nteract to see it): https://gist.github.com/benbovy/a30f286f7fdf9528c4d0c7980be9b6a7. vdom is still in development, though, and it is not yet supported by all front-ends. Currently it is supported in jupyterlab and nteract, and support will be added soon for the classic notebook (and nbviewer?).
Using a template system like jinja2. It doesn't require any specific support on the various notebook front-ends, but it still adds a dependency.
Using Python string formatting. No dependencies, but tougher to maintain.
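For a sense of what the string-formatting option might look like, here is a minimal sketch. All names (`section_html`, `dataset_html`, the `xr-*` classes) and the input layout (a dict mapping variable names to `(dims, dtype)`) are assumptions for illustration; it also assumes a `<style>` tag survives in the output, which is exactly the open question raised above.

```python
import html

# Plain string templates; no templating dependency.
STYLE = "<style>.xr-var-name { font-weight: bold; }</style>"
ROW = "<li><span class='xr-var-name'>{name}</span> ({dims}) {dtype}</li>"
SECTION = "<details open><summary>{title}</summary><ul>{rows}</ul></details>"


def section_html(title, variables):
    """Render one section from a {name: (dims, dtype)} mapping."""
    rows = "".join(
        ROW.format(
            name=html.escape(name),
            dims=html.escape(", ".join(dims)),
            dtype=html.escape(dtype),
        )
        for name, (dims, dtype) in variables.items()
    )
    return SECTION.format(title=html.escape(title), rows=rows)


def dataset_html(coords, data_vars):
    """Assemble the sections into one HTML fragment."""
    return (
        STYLE
        + "<div class='xr-wrap'>"
        + section_html("Coordinates", coords)
        + section_html("Data variables", data_vars)
        + "</div>"
    )


print(dataset_html(
    {"time": (("time",), "datetime64[ns]")},
    {"t2m": (("time", "lat"), "float32")},
))
```

The maintenance cost mentioned above shows up quickly: every new element means another template string and another escaping call to keep in sync.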
I'd like to know your thoughts. Should we go with vdom and wait a bit until the tool is more mature and more front-ends are supported (hopefully soon, which leaves us some time if we can wait for release v0.11)? Or do you want it right now, in which case it's perhaps better to use templates / formatting?
I am OK adding new (optional) Python dependencies like vdom or Jinja2 if that makes implementing and maintaining this easier.
If we can solve the problem of generating HTML from vdom as a fallback (https://github.com/nteract/vdom/issues/43) and the vdom developers are supportive for various issues that come up, that could easily be the best option.
I would be reluctant to only support vdom output, because there are a lot of legacy notebook viewing interfaces (including various IDEs, cloud hosted notebook environments and rendering on GitHub) that could take a while (years?) to support it.
The other thing to watch out for is whether the vdom Python API is still immature and likely to lead to additional work when it changes in the future. This is somewhat of a judgment call. My sense is that they are taking a careful approach to the design of the project, but it is still in the early days so it's hard to say for sure.
Yes good points.
One part of me (the one that loves every cool, new package) says "let's use vdom for this right now", and the other (more wise) part says "let's use Python string formatting -- not even sure we need something like jinja2 --, we'll switch to vdom once it is more mature and it will be quite easy to do so."
@benbovy - this came up in conversation today with @shoyer and a number of Jupyter devs. What is the current interest on the subject?
Yeah I really need to continue the work in #1820, this PR has stalled for too long!
Last time I worked on this I was struggling a bit on good column auto-sizing and alignment with a pure CSS implementation (i.e., using CSS grid, display: contents), but that's not really a blocker I think.
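A rough sketch of the CSS grid approach, with the markup generated from Python: each row wrapper gets `display: contents`, so its cells take part directly in the parent grid and the `auto` columns size to the widest cell. The class names, the three-column layout and the helper name are assumptions, not the code from the PR.

```python
import html

GRID_CSS = """<style>
.xr-var-list { display: grid; grid-template-columns: auto auto 1fr; column-gap: 1em; }
.xr-var-row { display: contents; }
</style>"""


def var_grid_html(rows):
    """Render (name, dims, dtype) rows as cells of a single CSS grid."""
    body = "".join(
        "<div class='xr-var-row'>"
        + "".join(f"<span>{html.escape(cell)}</span>" for cell in row)
        + "</div>"
        for row in rows
    )
    return f"{GRID_CSS}<div class='xr-var-list'>{body}</div>"


print(var_grid_html([
    ("time", "(time)", "datetime64[ns]"),
    ("air_temperature", "(time, lat, lon)", "float32"),
]))
```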
Another option would be to write a jupyterlab mime renderer extension. This narrows the supported front-ends, but I guess the issue of front-end theme integration will be easier to solve. Also, it will be possible to add more fancy features later.
Were you at SciPy? I hope you had a great time! I wanted to attend the conference this year but in the end I couldn't make it, unfortunately.
Last time I worked on this I was struggling a bit on good column auto-sizing and alignment with a pure CSS implementation...but that's not really a blocker I think.
Agreed. I think that is something we can work on over time. A jupyterlab extension would be cool too but, as you say, it would have a smaller footprint in the short-term.
Also, yeah, Scipy was great. I think I'll go back.
Thought I'd bump this (hopefully no one minds). I think that this is great!
🎉 🍰 🍾 🏆 🏅