Pandas: Table Schema bombs with MultiIndex

Created on 14 Apr 2017 · 13 comments · Source: pandas-dev/pandas

Using the new (OK, yet to be released) Table Schema output with the generated MultiIndex mentioned in #15379, I noticed that it produces a traceback (and falls back on HTML).

Code Sample

import pandas as pd
import numpy as np
pd.options.display.html.table_schema = True

midx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c']])

df = pd.DataFrame(np.random.randn(5, len(midx)), columns=midx)

df

Problem description

Full Traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in _convert_can_do_setop(self, other)
   2513                 try:
-> 2514                     other = MultiIndex.from_tuples(other)
   2515                 except:

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
   1128         elif isinstance(tuples, list):
-> 1129             arrays = list(lib.to_object_array_tuples(tuples).T)
   1130         else:

TypeError: Argument 'rows' has incorrect type (expected list, got FrozenList)

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    880             method = get_real_method(obj, self.print_method)
    881             if method is not None:
--> 882                 method()
    883                 return True
    884 

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in _ipython_display_(self)
    138         latex = self._repr_latex_() if hasattr(self, '_repr_latex_') else None
    139         html = self._repr_html_() if hasattr(self, '_repr_html_') else None
--> 140         table_schema = self._repr_table_schema_()
    141         # We need the inital newline since we aren't going through the
    142         # usual __repr__. See

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in _repr_table_schema_(self)
    156         if config.get_option("display.html.table_schema"):
    157             data = self.head(config.get_option('display.max_rows'))
--> 158             payload = json.loads(data.to_json(orient='table'),
    159                                  object_pairs_hook=collections.OrderedDict)
    160             return payload

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
   1232                             force_ascii=force_ascii, date_unit=date_unit,
   1233                             default_handler=default_handler,
-> 1234                             lines=lines)
   1235 
   1236     def to_hdf(self, path_or_buf, key, **kwargs):

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/io/json/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
     44         obj, orient=orient, date_format=date_format,
     45         double_precision=double_precision, ensure_ascii=force_ascii,
---> 46         date_unit=date_unit, default_handler=default_handler).write()
     47 
     48     if lines:

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/io/json/json.py in __init__(self, obj, orient, date_format, double_precision, ensure_ascii, date_unit, default_handler)
    141         # TODO: Do this timedelta properly in objToJSON.c See GH #15137
    142         if ((obj.ndim == 1) and (obj.name in set(obj.index.names)) or
--> 143                 len(obj.columns & obj.index.names)):
    144             msg = "Overlapping names between the index and columns"
    145             raise ValueError(msg)

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/base.py in __and__(self, other)
   2046 
   2047     def __and__(self, other):
-> 2048         return self.intersection(other)
   2049 
   2050     def __or__(self, other):

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in intersection(self, other)
   2447         """
   2448         self._assert_can_do_setop(other)
-> 2449         other, result_names = self._convert_can_do_setop(other)
   2450 
   2451         if self.equals(other):

/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in _convert_can_do_setop(self, other)
   2514                     other = MultiIndex.from_tuples(other)
   2515                 except:
-> 2516                     raise TypeError(msg)
   2517         else:
   2518             result_names = self.names if self.names == other.names else None

TypeError: other must be a MultiIndex or a list of tuples

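The key line in there is "expected list, got FrozenList", raised from arrays = list(lib.to_object_array_tuples(tuples).T). It is reached through the JSON writer's overlap check, len(obj.columns & obj.index.names): with MultiIndex columns, & calls MultiIndex.intersection, which tries to convert the index's FrozenList of names with MultiIndex.from_tuples, and that conversion fails.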
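The failing check only needs to know whether any column label equals an index level name. A minimal sketch of that comparison on plain Python values, using a hypothetical helper `overlapping_names` (not a pandas function), sidesteps the Index `&` operation, so tuple labels from MultiIndex columns are simply compared by value:

```python
def overlapping_names(column_labels, index_names):
    """Return the column labels that also appear as index level names.

    Hypothetical helper: it compares plain Python values, so tuple labels
    from MultiIndex columns are checked for membership instead of being
    coerced into a MultiIndex.
    """
    # Unnamed index levels have the name None and cannot collide.
    names = {name for name in index_names if name is not None}
    return [label for label in column_labels if label in names]


# MultiIndex-style columns (tuples) and an unnamed index: no overlap.
print(overlapping_names([("A", "a"), ("A", "b"), ("B", "a")], [None]))  # []

# A flat column that shares its name with the index is reported.
print(overlapping_names(["a", "b"], ["a"]))  # ['a']
```

With pandas, the inputs would be `list(df.columns)` and `list(df.index.names)`.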

Labels: Bug, MultiIndex, Output-Formatting

rgbkrk on 14 Apr 2017

All 13 comments

cc @TomAugspurger

jreback on 14 Apr 2017

Right... I think I knew about this and forgot to handle it. I think the issue is that the spec wants the field names to be strings. We would probably represent this as a tuple.

So we can do this, it just means we won't be compliant. Thoughts? I guess we could have a strict mode that would raise in cases like this? But for use-cases like sending data to nteract, we don't really want to raise an exception, so we'd be laxer.
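A minimal sketch of the strict/lax idea, assuming a hypothetical `schema_field_name` helper (not pandas API): string labels pass through, strict mode raises on anything else, and lax mode keeps the tuple, which the standard `json` module writes out as an array and therefore not as a spec-compliant string name.

```python
import json


def schema_field_name(label, strict=False):
    """Return a Table Schema field name for a column label (hypothetical helper)."""
    if isinstance(label, str):
        return label
    if strict:
        # Strict mode: refuse labels the spec cannot represent.
        raise ValueError(f"field name must be a string, got {label!r}")
    # Lax mode: keep the label as-is; a tuple is serialized as a JSON array.
    return label


field = {"name": schema_field_name(("A", "a")), "type": "number"}
print(json.dumps(field))  # {"name": ["A", "a"], "type": "number"}
```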

TomAugspurger on 14 Apr 2017

Ah right, I remember a little bit of chatter on this. I think it would be OK to only put out an HTML table in cases like these; I'd prefer compliance over niceties for MultiIndex. We can move specs forward over time.

/cc @pwalsh

rgbkrk on 14 Apr 2017

Oh, we also generate invalid JSON with a MultiIndex in the columns 😬 https://github.com/pandas-dev/pandas/issues/15273. Don't think I'll have time to get to that before the release.

If you think falling back in these cases is appropriate, I'll put that logic in the publish part.

Another option is to serialize all the level names down to a string like <level1>-<level2>... and add some additional fields with the information needed to deserialize it (number of levels, separator...).
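The string-flattening option could look like the sketch below. The `-` separator, the `columnLevels` metadata key, and both helper names are assumptions for illustration, not part of the Table Schema spec or pandas. Round-tripping only works if no level value contains the separator, so the sketch checks for that.

```python
def flatten_label(label, sep="-"):
    """Join a tuple column label into a single string field name (hypothetical)."""
    parts = [str(part) for part in label]
    if any(sep in part for part in parts):
        raise ValueError(f"level value contains separator {sep!r}: {label!r}")
    return sep.join(parts)


def unflatten_name(name, levels, sep="-"):
    """Split a flattened field name back into its level values (hypothetical)."""
    parts = tuple(name.split(sep))
    if len(parts) != levels:
        raise ValueError(f"expected {levels} levels, got {len(parts)} in {name!r}")
    return parts


labels = [("A", "a"), ("A", "b"), ("B", "c")]
schema = {
    "fields": [{"name": flatten_label(lab)} for lab in labels],
    # Extra, non-standard metadata needed to reverse the flattening.
    "columnLevels": {"levels": 2, "separator": "-"},
}
print([f["name"] for f in schema["fields"]])  # ['A-a', 'A-b', 'B-c']
print(unflatten_name("B-c", levels=2))  # ('B', 'c')
```

Level values come back as strings, so non-string levels (integers, dates) would also need their types recorded to round-trip exactly.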

TomAugspurger on 14 Apr 2017

There were indeed still a bunch of cases where the json schema generation errors, see the list in the PR: https://github.com/pandas-dev/pandas/pull/14904#issuecomment-278288199 (and multi-index columns is one of them). We should probably open an issue for those (or use this one as the general follow-up issue).

jorisvandenbossche on 14 Apr 2017

we could add a NotImplementedError for now?
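A `NotImplementedError` guard could be as small as the sketch below. It works on plain column labels (with pandas, `list(df.columns)`) and treats tuple labels as coming from MultiIndex columns; inside pandas one would more likely test `isinstance(df.columns, pd.MultiIndex)`. The function name and message are hypothetical.

```python
def check_table_schema_supported(column_labels):
    """Raise NotImplementedError for columns the schema writer can't handle yet (hypothetical)."""
    if any(isinstance(label, tuple) for label in column_labels):
        raise NotImplementedError(
            "orient='table' is not implemented for MultiIndex columns"
        )


check_table_schema_supported(["x", "y"])  # flat columns: passes silently

try:
    check_table_schema_supported([("A", "a"), ("A", "b")])
except NotImplementedError as exc:
    print(exc)  # orient='table' is not implemented for MultiIndex columns
```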

jreback on 14 Apr 2017

@rgbkrk what's the ideal behavior here, as far as front-ends are concerned? Do you want us to just not have an "application/vnd.dataresource+json" key (text/html should still be there)? Is there another channel we should publish information on? I'm looking through the Jupyter client docs now to see if there are any recommendations.

TomAugspurger on 24 Apr 2017

Yeah, just don't have the application/vnd.dataresource+json if it's not supported and only provide text/html.
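On the publishing side, that fallback amounts to building the MIME bundle without the `application/vnd.dataresource+json` key whenever schema generation fails. A minimal sketch, assuming the schema comes from a caller-supplied function (with pandas, something like `lambda: json.loads(df.to_json(orient='table'))`); `build_mimebundle` and `failing_schema` are hypothetical names:

```python
DATARESOURCE_MIME = "application/vnd.dataresource+json"


def build_mimebundle(plain, html, make_table_schema):
    """Return a display bundle; omit the Table Schema entry if it can't be built."""
    bundle = {"text/plain": plain, "text/html": html}
    try:
        bundle[DATARESOURCE_MIME] = make_table_schema()
    except (TypeError, ValueError, NotImplementedError):
        # Unsupported input (e.g. MultiIndex columns): HTML-only output.
        pass
    return bundle


def failing_schema():
    # Stand-in for the error seen in the traceback above this thread.
    raise TypeError("other must be a MultiIndex or a list of tuples")


print(sorted(build_mimebundle("df", "<table></table>", failing_schema)))
# ['text/html', 'text/plain']
```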

rgbkrk on 24 Apr 2017

As for another channel, are you asking about where to send errors to?

rgbkrk on 24 Apr 2017

> As for another channel, are you asking about where to send errors to?

Yeah, my memory is a bit hazy, but I thought there were other channels to publish errors on. It looks like I was misremembering, though. Is there some way to notify you that the vnd.dataresource+json version failed?

TomAugspurger on 24 Apr 2017

Beyond a traceback? There's probably a way to push out other errors. Emitting a warning is fine too.

What do you think @minrk and @carreau?

rgbkrk on 25 Apr 2017

A warning on stderr will be printed in red in the classic notebook (I believe it should be yellow, because red is scary). An error will stop execution, while output printed on stderr is treated as plain text and the notebook keeps processing as planned. Does that answer your question?
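Following that distinction, a fallback could report the failure with `warnings.warn`, which writes to stderr and lets the cell keep running, rather than raising. A minimal sketch; the `TableSchemaWarning` class and the `try_table_schema` and `unsupported` helpers are hypothetical names, not pandas or IPython API:

```python
import warnings


class TableSchemaWarning(UserWarning):
    """Hypothetical category so users can filter these warnings specifically."""


def try_table_schema(make_table_schema):
    """Return the schema payload, or None after emitting a warning."""
    try:
        return make_table_schema()
    except (TypeError, ValueError, NotImplementedError) as exc:
        warnings.warn(
            f"Table Schema output failed, falling back to HTML: {exc}",
            TableSchemaWarning,
            stacklevel=2,
        )
        return None


def unsupported():
    raise NotImplementedError("MultiIndex columns")


payload = try_table_schema(unsupported)  # warning goes to stderr; execution continues
print(payload)  # None
```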