Xarray: html repr of xarray object (for the notebook)

Created on 11 Oct 2017 · 39 Comments · Source: pydata/xarray

Edit: preview for Dataset and DataArray (pure html/css)

Dataset: https://jsfiddle.net/tay08cn9/4/
DataArray: https://jsfiddle.net/43z4v2wt/9/

I started to think a bit more deeply about what a richer, HTML-based representation of xarray objects could look like, e.g., when displayed in Jupyter notebooks.

Here are some ideas for Dataset: https://jsfiddle.net/9ab4c3tr/35/

Some notes:

The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
For the sake of consistency, I've stolen some style from the pandas.DataFrame repr as it is shown in jupyterlab.
I tried to emphasize the most important parts of the repr, i.e., the lists of dimensions, coordinates and variables.
I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript). It already allows some interaction like hover effects and collapsible sections. However, I doubt that more fancy stuff (like, e.g., highlighting on hover a specific dimension simultaneously at several places of the repr) would be possible here without Javascript. I have limited skills in this area, though.
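
As a rough illustration of the pure HTML/CSS approach, here is a minimal sketch of a collapsible section built with the standard `<details>`/`<summary>` elements, which need no JavaScript. The function name `collapsible_section` and the markup are assumptions made for this example; the jsfiddle prototypes linked in this thread may well use a different technique.

```python
from html import escape


def collapsible_section(title, rows, collapsed=False):
    """Return an HTML snippet for a section that folds without JavaScript.

    `rows` is a list of (name, description) pairs. All names used here
    are hypothetical and only serve the illustration.
    """
    # A <details> element is expanded only when it carries the "open" attribute.
    open_attr = "" if collapsed else " open"
    items = "".join(
        f"<li><b>{escape(name)}</b> {escape(desc)}</li>" for name, desc in rows
    )
    return (
        f"<details{open_attr}>"
        f"<summary>{escape(title)}</summary>"
        f"<ul>{items}</ul>"
        f"</details>"
    )


html_out = collapsible_section(
    "Coordinates", [("time", "(time) datetime64[ns]"), ("lat", "(lat) float64")]
)
```

Clicking the summary line toggles the list in any browser that supports `<details>`; hover effects would still need CSS on top of this.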

These are still, of course, preliminary thoughts. Any feedback or suggestions are welcome, even opinions about whether an html repr is really needed or not!

design question

benbovy

👍7 🎉4

All 39 comments

OMG this is so cool!

rabernat on 11 Oct 2017

OMG indeed this looks fantastic! We should test this on more examples, but I already love it. The underline is a great way to illustrate which variables are dimensions.

I think it's best if we keep a very lightweight implementation, i.e., pure HTML/CSS (no Javascript)

Agreed. Notebook viewers tend to sanitize out JavaScript, especially if originating from an unknown source.

highlighting on hover a specific dimension simultaneously at several places of the repr

This would also be awesome, if possible with pure CSS. I don't know, but I'm going to bug my colleague who does JavaScript visualization.

shoyer on 12 Oct 2017

OK, so the bad news is that it is not possible to select elements other than siblings or descendants with CSS. So selecting "cousin" tags like a dimension name at multiple locations in the repr is out.

But we could potentially add JavaScript for fancy hover effects. Even if it gets stripped out in many cases (for untrusted notebooks), it should degrade gracefully to the HTML-only repr. My main concern would be annoying prompts in the notebook interface that ask users whether they want to trust outputs or not.

shoyer on 12 Oct 2017

Wow, great job @benbovy!

With the upcoming move towards Jupyter Lab and a better infrastructure for custom plugins, could this serve as the basis for a "NetCDF Extension" for Jupyter Lab? It would be great if double clicking on a NetCDF file in the JLab file explorer could open up this sort of information, or even a quick and dirty ncview-like plotter.

darothen on 12 Oct 2017

👍3

Thanks for the feedback!

Here are a few more ideas: https://jsfiddle.net/9ab4c3tr/48/

main section titles are colored so that these are more detached from the content (not sure I really like it, though),
subtle shade variations, notably for displaying the first values for each variable,
drop-downs for displaying attributes per variable if any (collapsed by default).
hover-effect for dimensions: highlight all variables having the hovered dimension (uses Javascript)

We might use drop-downs to display other useful information as well, such as the type of array (e.g., dask-array, in-memory numpy.array, etc...)

Fancy hover effects would be awesome indeed, although my concern is that too much hover effect would be a source of distraction.

A jupyterlab NetCDF viewer extension would be awesome too! It might also leverage phosphor's datagrid (https://github.com/phosphorjs/phosphor/issues/285) to explore the raw data values.

benbovy on 12 Oct 2017

Although this could clearly be made more fancy and complicated, I think what you have here would already be a great addition. I love the drop-down attributes (those are missing from the standard repr).

Don't hesitate to start a PR! More fanciness could always be added in the future based on user feedback.


rabernat on 12 Oct 2017

👍1

As long as we're nit-picking style....

main section titles are colored so that these are more detached from the content (not sure I really like it, though),

I slightly prefer the grayscale you had before -- the section titles are already well detached.

subtle shade variations, notably for displaying the first values for each variable,

Love it!

drop-downs for displaying attributes per variable if any (collapsed by default).

It is wonderful to expose this information! I have a slight concern that the + icons could be confusing with the * we use for index variables in the normal repr. I don't have any good ideas for fixing this yet.

hover-effect for dimensions: highlight all variables having the hovered dimension (uses Javascript)

For highlighting, maybe something a little more subtle would work well, either:

only highlight the dimension names below, not the full variables, or
only highlight full variables with a matching name, as well as all matching dimensions.

shoyer on 12 Oct 2017

Before starting a PR on this, I'd like to get the design right (at least the static part) and also to clarify the repr of other xarray objects such as DataArray, Variable, Dataset.coords and Dataset.data_vars.

For Dataset.coords and Dataset.data_vars it would be pretty straightforward (we may simply display the corresponding sections in Dataset html repr).

For DataArray and Variable I still have no clear idea of what would be a good representation for the data values:

Option A: embed the plain-text representation of the wrapped numpy.array (or dask.array) in a HTML container (https://jsfiddle.net/43z4v2wt/1/). The result is not that bad, although monospace and sans-serif fonts do not mix very well IMO.
Option B: an HTML-formatted table for data values as well. That would be tricky for arrays with more than two dimensions; it has already been discussed in other issues without real consensus.
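
Option A can be sketched roughly as follows. This is an illustration only: `wrap_text_repr` and the `xr-text-repr` class are hypothetical names, and a nested list stands in for the wrapped numpy.array (or dask.array) so that the example runs without extra dependencies.

```python
from html import escape


def wrap_text_repr(obj):
    """Embed the plain-text repr of `obj` in a monospace HTML container."""
    # Escape the text so characters such as "<" or "&" in the repr
    # cannot be interpreted as markup by the notebook front-end.
    return f"<pre class='xr-text-repr'>{escape(repr(obj))}</pre>"


values = [[0.1, 0.2], [0.3, 0.4]]  # stand-in for an array (assumption)
html_out = wrap_text_repr(values)
```

The `<pre>` element keeps the line breaks and column alignment of the text repr, which is what makes this option cheap to implement.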

benbovy on 12 Oct 2017

have a slight concern that the + icons could be confusing with the * we use for index variables in the normal repr.

Maybe putting the + icon more to the left would do the trick (https://jsfiddle.net/9ab4c3tr/49/).

For highlighting, maybe something a little more subtle would work well

Changing font-weight instead of background-color works better IMO (same link above).

benbovy on 12 Oct 2017

Option A: embed the plain-text representation of the wrapped numpy.array (or dask.array) in a HTML container (https://jsfiddle.net/43z4v2wt/1/). The result is not that bad, although monospace and sans-serif fonts do not mix very well IMO.

I think this is probably our best option. My main suggestion is that it should be possible to click somewhere (maybe on the top xarray.DataArray header) to hide/show the data section.

shoyer on 12 Oct 2017

Maybe putting the + icon more to the left would do the trick (https://jsfiddle.net/9ab4c3tr/49/).

Now it just looks a little out of place :). I liked it better closer to the variable name.

Changing font-weight instead of background-color works better IMO (same link above).

Hmm, I'm not sure. I find text that moves very distracting -- I like changing background colors.

shoyer on 12 Oct 2017

My main suggestion is it should be possible to click somewhere (maybe on top xarray.DataArray header) to hide/show the data section.

See https://jsfiddle.net/43z4v2wt/3/

I find text that moves very distracting

Indeed, I agree now, it's annoying.

benbovy on 12 Oct 2017

One more trick we could add for the Dataset repr:

When hovering over data, show repr for the full array values (as a numpy array) in a box (similar to "title" text, but ideally fixed-width format).

shoyer on 12 Oct 2017

When hovering over data, show repr for the full array values (as a numpy array) in a box (similar to "title" text, but ideally fixed-width format).

That would be nice, although I guess that it would require javascript.

Alternatively, we can imagine two drop-downs per variable, one for the attributes and one for the full array values (numpy or dask repr). Each would be shown/hidden by two distinct symbols or links, perhaps both located on the right after the preview-values so that we avoid confusion with symbol * in the normal repr.

In the DataArray repr, instead of completely collapsing the data section we may reduce it into a one-line preview of the first values. We can even imagine showing the reduced version by default in cases where the full data section would take too much vertical space.
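
To make the one-line preview idea concrete, here is a small sketch that keeps as many leading values as fit in a given width. The function name `one_line_preview` and the `max_width` parameter are assumptions for this example, not an existing xarray API.

```python
def one_line_preview(values, max_width=40):
    """Return as many leading values as fit in `max_width` characters."""
    parts = []
    used = 0
    for value in values:
        text = str(value)
        # Account for the separating space before every item but the first.
        extra = len(text) + (1 if parts else 0)
        # Always keep room for the trailing " ..." marker (4 characters).
        if used + extra > max_width - 4:
            parts.append("...")
            break
        parts.append(text)
        used += extra
    return " ".join(parts)


preview = one_line_preview(range(100), max_width=20)
```

A short sequence is returned in full, while a long one is cut off and ends with an ellipsis.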

benbovy on 12 Oct 2017

A comment regarding the use of javascript: after reading this discussion (https://github.com/bokeh/bokeh/issues/6700) I'm not sure that maintaining a javascript-based repr that works with both jupyterlab and the classic notebook is worth the few fancy features it would provide.

I'd rather stick with a pure html/css solution here. It might still co-exist with a full-featured jupyterlab extension for viewing NetCDF files.

benbovy on 12 Oct 2017

👍2

Alternatively, we can imagine two drop-downs per variable, one for the attributes and one for the full array values (numpy or dask repr). Each would be shown/hidden by two distinct symbols or links, perhaps both located on the right after the preview-values so that we avoid confusion with symbol * in the normal repr.

Yes, I like that. Possibly clicking on the array values would show the larger preview.

I did play around a little with a hover drop down, but the positioning is a little hacky:
https://jsfiddle.net/gux879hn/2/

shoyer on 12 Oct 2017

Note that the info() method was added a while ago: https://github.com/pydata/xarray/pull/1176

At least in the notebook, this change would make it obsolete.

fmaussion on 13 Oct 2017

New version for Dataset: https://jsfiddle.net/9ab4c3tr/50/

clicking on the name of a variable with the A symbol shows its attributes.
clicking on the first-values of a variable shows the .data repr

benbovy on 13 Oct 2017

And for DataArray: https://jsfiddle.net/43z4v2wt/5/ (possibility to reduce the full data repr to a one-line preview)

benbovy on 13 Oct 2017

This is wonderful!

Is it reasonable to add a drop down in the attribute section if there are too many items there?

My data sometimes have a long list of attributes, so important information about coordinates and data_vars cannot be seen without scrolling back up the page.
(This may be a rare case though...)

fujiisoup on 13 Oct 2017

Is it reasonable to add a drop down in the attribute section if there are too many items there?

In the examples above, you can click on the Attributes section title to collapse the whole section (same with coordinates and data variables sections).

Given your case, maybe it would be nice to add a rule to show the attribute section collapsed when the number of attributes is too long. If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g., ► Attributes (10):

benbovy on 13 Oct 2017

👍1

I have a slight proposed tweak on the Dataset repr (https://jsfiddle.net/jrot9pex/1/). It makes two changes:

Changes the letter "A" to "a" for attributes. Maybe this is just me, but the capital "A" feels very loud, and reminds me of the scarlet letter!
Moves the marker "a" closer to the variable name, which makes it slightly clearer that it's associated.

Note that the info() method was added a while ago: #1176

At least in the notebook, this change would make it obsolete.

Yes, but let's keep it -- it has a nice pure-text format (from ncdump) which works especially well with copy & paste.

Given your case, maybe it would be nice to add a rule to show the attribute section collapsed when the number of attributes is too long. If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g., ► Attributes (10):

Yes, I like this general idea -- though we might only show the number when it is collapsed, e.g., ► Attributes: (10)
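
The collapse rule and the title format discussed here could look roughly like this. It is only a sketch: `section_title` and the `max_items` threshold are hypothetical, and it covers the initial state only (showing the count after a manual collapse would need CSS).

```python
def section_title(name, n_items, max_items=10):
    """Return (title, collapsed) for a section of the repr.

    The section starts collapsed when it holds more than `max_items`
    entries, and the item count is shown only in that collapsed state.
    """
    collapsed = n_items > max_items
    if collapsed:
        return f"► {name}: ({n_items})", True
    return f"▼ {name}:", False


long_title = section_title("Attributes", 12)
short_title = section_title("Attributes", 3)
```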

shoyer on 13 Oct 2017

If we allow that, then maybe it would be nice to also show the number of items in the section titles, e.g., ► Attributes (10):

I like this idea. Looking forward to having it.

fujiisoup on 13 Oct 2017

though we might only show the number when it is collapsed

Good idea!

Changes the letter "A" to "a" for attributes

Agreed.

Moves the marker "a" closer to the variable name

The good (or bad?) thing with the marker "a" to the very left is that it is aligned with the drop-down symbols of the main sections, but to me either way is fine.

Let's summarize all suggestions so far:

Dataset: https://jsfiddle.net/tay08cn9/2/
DataArray: https://jsfiddle.net/43z4v2wt/7/

I think that we're getting close to something good!

I'll wait a bit before starting a PR (maybe sometime next week), in case other feedback or suggestions come up.

benbovy on 13 Oct 2017

OMG this is so cool!

ditto wow I can't wait for this to be in!

spencerahill on 13 Oct 2017

@benbovy it seems like that discussion has stalled out... are you ready to put together that PR? :)

Personally, I'm very happy with your current version. You might even convince me to hold off on the v0.10 release to include it!

shoyer on 29 Oct 2017

👍2

Sorry for my late reply @shoyer !

Yes, I'm happy with this version too, I'll open a PR ASAP! Unfortunately I haven't had much time to work on this these last two weeks, but I want to make it a priority next week.

I guess that it is a bit late for v0.10 release which is already on track? I still don't exactly know how much time I'll need to implement this. I haven't thought yet about all implementation details (e.g., how to calculate the width of the variable name column? potential issues with jupyter notebook / jupyterlab...). It might also require some refactoring and/or new public API (e.g., to_html like in pandas).

benbovy on 1 Nov 2017

I guess that it is a bit late for v0.10 release which is already on track?

Yes, probably at this point. But hopefully we can do a shorter release cycle for v0.11. Also, though this is a big visual change, I'm not sure it's actually a breaking change, per se. Only notebook output will change, not programmatic use of repr().
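
The point about repr() can be illustrated with a toy class. Jupyter front-ends look for a `_repr_html_` method for rich display, while `repr()` and `print()` keep using `__repr__`. The `Demo` class below is purely hypothetical and is not xarray code.

```python
from html import escape


class Demo:
    """Toy object with both a plain-text and an HTML representation."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Used by repr(), print() and plain-text consoles.
        return f"<Demo {self.name!r}>"

    def _repr_html_(self):
        # Picked up by Jupyter front-ends for rich display only.
        return f"<div><b>Demo</b>: {escape(self.name)}</div>"


obj = Demo("air_temperature")
text = repr(obj)
```

Adding `_repr_html_` leaves `text` unchanged, so code that parses or compares the plain-text repr is not affected.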

shoyer on 1 Nov 2017

Also, though this is a big visual change, I'm not sure it's actually a breaking change, per se.

Yes, but once it's there I'm sure that we'll get bug reports right away if it doesn't work well, because this is a very visual change. Therefore I am almost in favor of pushing this forward as fast as possible and seeing how it goes, maybe followed by a quick 0.10.1.

But since I'm not the one doing the job I'm not allowed to say anything more here ;)

fmaussion on 1 Nov 2017

👍1

I started working on a new PR (not yet submitted), but before continuing the work I really need to know how best we can include the CSS code in rich outputs for the notebook (e.g., inline CSS or using the <style> tag) without requiring any extension for the various front-end applications.

It is not really clear to me if it is even supported so I opened an issue on the jupyterlab side (see reference above).
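
For readers unfamiliar with the two options, here is a minimal sketch of what each one produces. The helper names and the `xr-demo` class are assumptions for this example, and the sketch does not answer the open question of which front-ends keep a `<style>` element in rich outputs.

```python
CSS = ".xr-demo b { color: #555; }"


def with_style_tag(body):
    """Ship the CSS once in a <style> element next to the markup."""
    # Rules are prefixed with a wrapper class so they are less likely
    # to leak into the rest of the notebook page.
    return f"<style>{CSS}</style><div class='xr-demo'>{body}</div>"


def with_inline_style(label):
    """Attach the declarations directly to the element instead."""
    return f"<div><b style='color: #555;'>{label}</b></div>"


tagged = with_style_tag("<b>Dimensions</b>")
inlined = with_inline_style("Dimensions")
```

Pseudo-classes such as `:hover` cannot be written in a `style` attribute, so hover effects are only possible with the first form.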

benbovy on 7 Nov 2017

👍1

So we have different options regarding the tools to use for implementing these rich representations:

Using vdom. This option is very pythonic and is suggested by jupyterlab and nteract developers. I have used vdom to implement the rich repr of Dataset (not fully working yet, and you need the latest jupyterlab or nteract to see it): https://gist.github.com/benbovy/a30f286f7fdf9528c4d0c7980be9b6a7. vdom is still in development, though, and it is not yet supported by all front-ends. Currently it is supported in jupyterlab and nteract, and support will be added soon for the classic notebook (and nbviewer?).
Using a template system like jinja2. It doesn't require any specific support on the various notebook front-ends, but it still adds a dependency.
Using Python string formatting. No dependencies, but harder to maintain.

I'd like to know what your thoughts are. Should we go with vdom and wait a bit until the tool is more mature and more front-ends are supported (hopefully soon, which leaves us some time if we can wait for release v0.11)? Or do you want it right now, in which case it's perhaps better to use templates or string formatting?

benbovy on 13 Nov 2017

I am OK adding new (optional) Python dependencies like vdom or Jinja2 if that makes implementing and maintaining this easier.

If we can solve the problem of generating HTML from vdom as a fallback (https://github.com/nteract/vdom/issues/43) and the vdom developers are supportive for various issues that come up, that could easily be the best option.

I would be reluctant to only support vdom output, because there are a lot of legacy notebook viewing interfaces (including various IDEs, cloud hosted notebook environments and rendering on GitHub) that could take a while (years?) to support it.
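
One way to offer vdom output together with an HTML fallback is IPython's `_repr_mimebundle_` hook, which returns several representations keyed by MIME type. The sketch below is hypothetical: the `RichRepr` class is made up, and the vdom MIME type and payload layout are written from memory and should be checked against the vdom project before relying on them.

```python
class RichRepr:
    """Toy object that offers several representations at once."""

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Front-ends display the richest MIME type they understand and
        # ignore the ones they do not support.
        return {
            "text/plain": "<RichRepr>",
            "text/html": "<div><b>RichRepr</b></div>",
            # Assumed vdom MIME type and payload; the schema belongs to vdom.
            "application/vdom.v1+json": {
                "tagName": "div",
                "attributes": {},
                "children": ["RichRepr"],
            },
        }


bundle = RichRepr()._repr_mimebundle_()
```

With a bundle like this, a viewer without vdom support would still get the HTML or plain-text entry.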

shoyer on 13 Nov 2017

The other thing to watch out for is whether the vdom Python API is still immature and likely to lead to additional work when it changes in the future. This is somewhat of a judgment call. My sense is that they are taking a careful approach to the design of the project, but it is still in the early days so it's hard to say for sure.

shoyer on 13 Nov 2017

👍1

Yes good points.

One part of me (the one that loves every cool, new package) says "let's use vdom for this right now", and the other (wiser) part says "let's use Python string formatting -- not even sure we need something like jinja2 --, we'll switch to vdom once it is more mature and it will be quite easy to do so."

benbovy on 13 Nov 2017

😄2

@benbovy - this came up in conversation today with @shoyer and a number of Jupyter devs. What is the current interest on the subject?

jhamman on 13 Jul 2018

Yeah I really need to continue the work in #1820, this PR has stalled for too long!

Last time I worked on this I was struggling a bit with good column auto-sizing and alignment in a pure CSS implementation (i.e., using CSS grid, display: contents), but that's not really a blocker I think.
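
For context, the CSS grid technique mentioned here can be sketched as follows. Each row wrapper gets `display: contents`, so its cells become direct grid items and the columns are sized to the widest cell across all rows. The class names and the `variable_grid` helper are assumptions for this example, not the code from the PR.

```python
from html import escape

GRID_CSS = (
    ".xr-demo-grid { display: grid; "
    "grid-template-columns: auto auto 1fr; column-gap: 1em; }"
    ".xr-demo-row { display: contents; }"
)


def variable_grid(rows):
    """Lay out (name, dims, preview) triples in shared, auto-sized columns."""
    cells = "".join(
        "<div class='xr-demo-row'>"
        + "".join(f"<span>{escape(str(cell))}</span>" for cell in row)
        + "</div>"
        for row in rows
    )
    return f"<style>{GRID_CSS}</style><div class='xr-demo-grid'>{cells}</div>"


grid_html = variable_grid(
    [("lat", "(lat)", "10.0 20.0 30.0"), ("temperature", "(time, lat)", "280.1 281.4")]
)
```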

Another option would be to write a jupyterlab mime renderer extension. This narrows the supported front-ends, but I guess the issue of front-end theme integration would be easier to solve. Also, it would be possible to add fancier features later.

Were you at SciPy? I hope you had a great time! I wanted to attend the conference this year, but in the end I couldn't make it, unfortunately.

benbovy on 14 Jul 2018

Last time I worked on this I was struggling a bit on good column auto-sizing and alignment with a pure CSS implementation...but that's not really a blocker I think.

Agreed. I think that is something we can work on over time. A jupyterlab extension would be cool too but, as you say, it would have a smaller footprint in the short term.

Also, yeah, SciPy was great. I think I'll go back.

jhamman on 16 Jul 2018

Thought I'd bump this (hopefully no one minds). I think that this is great!

mrocklin on 30 Jun 2019

👍5

🎉 🍰 🍾 🏆 🏅

rabernat on 24 Oct 2019

👍1
