Using the new (ok, yet to be released) table schema output with MultiIndex columns, as mentioned in #15379, I noticed that it produces a traceback (and falls back on HTML):
import pandas as pd
import numpy as np
pd.options.display.html.table_schema = True
midx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b', 'c']])
df = pd.DataFrame(np.random.randn(5, len(midx)), columns=midx)
df
Full Traceback
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in _convert_can_do_setop(self, other)
2513 try:
-> 2514 other = MultiIndex.from_tuples(other)
2515 except:
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
1128 elif isinstance(tuples, list):
-> 1129 arrays = list(lib.to_object_array_tuples(tuples).T)
1130 else:
TypeError: Argument 'rows' has incorrect type (expected list, got FrozenList)
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
880 method = get_real_method(obj, self.print_method)
881 if method is not None:
--> 882 method()
883 return True
884
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in _ipython_display_(self)
138 latex = self._repr_latex_() if hasattr(self, '_repr_latex_') else None
139 html = self._repr_html_() if hasattr(self, '_repr_html_') else None
--> 140 table_schema = self._repr_table_schema_()
141 # We need the inital newline since we aren't going through the
142 # usual __repr__. See
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in _repr_table_schema_(self)
156 if config.get_option("display.html.table_schema"):
157 data = self.head(config.get_option('display.max_rows'))
--> 158 payload = json.loads(data.to_json(orient='table'),
159 object_pairs_hook=collections.OrderedDict)
160 return payload
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/core/generic.py in to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
1232 force_ascii=force_ascii, date_unit=date_unit,
1233 default_handler=default_handler,
-> 1234 lines=lines)
1235
1236 def to_hdf(self, path_or_buf, key, **kwargs):
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/io/json/json.py in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines)
44 obj, orient=orient, date_format=date_format,
45 double_precision=double_precision, ensure_ascii=force_ascii,
---> 46 date_unit=date_unit, default_handler=default_handler).write()
47
48 if lines:
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/io/json/json.py in __init__(self, obj, orient, date_format, double_precision, ensure_ascii, date_unit, default_handler)
141 # TODO: Do this timedelta properly in objToJSON.c See GH #15137
142 if ((obj.ndim == 1) and (obj.name in set(obj.index.names)) or
--> 143 len(obj.columns & obj.index.names)):
144 msg = "Overlapping names between the index and columns"
145 raise ValueError(msg)
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/base.py in __and__(self, other)
2046
2047 def __and__(self, other):
-> 2048 return self.intersection(other)
2049
2050 def __or__(self, other):
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in intersection(self, other)
2447 """
2448 self._assert_can_do_setop(other)
-> 2449 other, result_names = self._convert_can_do_setop(other)
2450
2451 if self.equals(other):
/Users/kylek/code/src/github.com/pandas-dev/pandas/pandas/indexes/multi.py in _convert_can_do_setop(self, other)
2514 other = MultiIndex.from_tuples(other)
2515 except:
-> 2516 raise TypeError(msg)
2517 else:
2518 result_names = self.names if self.names == other.names else None
TypeError: other must be a MultiIndex or a list of tuples
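The key line in there is "expected list, got FrozenList", raised from arrays = list(lib.to_object_array_tuples(tuples).T). The overlap check obj.columns & obj.index.names passes a FrozenList of index names to MultiIndex.intersection, which cannot convert it.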
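For illustration only, here is a minimal sketch of an overlap check that avoids Index set operations altogether by comparing plain Python sets. The helper name names_overlap is hypothetical and is not what pandas does internally; it only covers the DataFrame case, not the Series branch of the original check.

```python
import numpy as np
import pandas as pd


def names_overlap(df):
    """Return True if any index level name is also a column label.

    Hypothetical helper: compares plain sets, so a MultiIndex in the
    columns (whose labels are tuples) never goes through intersection().
    """
    index_names = {name for name in df.index.names if name is not None}
    column_labels = set(df.columns)  # tuples when columns are a MultiIndex
    return bool(index_names & column_labels)


# The frame from the report: MultiIndex columns, unnamed default index.
midx = pd.MultiIndex.from_product([["A", "B"], ["a", "b", "c"]])
df = pd.DataFrame(np.random.randn(5, len(midx)), columns=midx)
multi_result = names_overlap(df)

# A flat frame whose index name collides with a column label.
flat = pd.DataFrame({"x": [1, 2]}, index=pd.Index([10, 20], name="x"))
flat_result = names_overlap(flat)
```

With MultiIndex columns the labels are tuples, so they can only collide with an index name that is itself a tuple.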
cc @TomAugspurger
Right... I think I knew about this and forgot to handle it. I think the issue is that the spec wants the field names to be a string. We would probably represent this as a tuple.
So we can do this, it just means we won't be compliant. Thoughts? I guess we could have a strict mode that would raise in cases like this? But for use-cases like sending data to nteract, we don't really want to raise an exception, so we'd be laxer.
Ah right, I remember a little bit of chatter on this. I think it would be ok to only put out an HTML table in cases like these, I'd prefer compliance over niceties for multiindex. We can move specs forward over time.
/cc @pwalsh
Oh, we also generate invalid JSON with a MultiIndex in the columns 😬 https://github.com/pandas-dev/pandas/issues/15273. Don't think I'll have time to get to that before the release.
If you think falling back in these cases is appropriate, I'll put that logic in the publish part.
Another option is to serialize all the level names down to a string like <level1>-<level2>... and add some additional fields with the information needed to deserialize it (number of levels, separator...)
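A rough sketch of that flattening idea, assuming the columns are a MultiIndex and that no level value contains the separator. The function names flatten_columns and restore_columns and the metadata keys (nlevels, sep, names) are made up for illustration; nothing like this is part of the Table Schema spec or of pandas.

```python
import numpy as np
import pandas as pd


def flatten_columns(df, sep="-"):
    """Join each MultiIndex column label into one string field name.

    Assumes df.columns is a MultiIndex. Returns the flattened frame and
    the (hypothetical) metadata needed to undo the flattening.
    """
    meta = {
        "nlevels": df.columns.nlevels,
        "sep": sep,
        "names": list(df.columns.names),
    }
    flat = df.copy()
    flat.columns = [sep.join(str(part) for part in label) for label in df.columns]
    return flat, meta


def restore_columns(flat, meta):
    """Rebuild MultiIndex columns from flattened names (levels come back as str)."""
    tuples = [
        tuple(label.split(meta["sep"], meta["nlevels"] - 1))
        for label in flat.columns
    ]
    restored = flat.copy()
    restored.columns = pd.MultiIndex.from_tuples(tuples, names=meta["names"])
    return restored


midx = pd.MultiIndex.from_product([["A", "B"], ["a", "b", "c"]])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=midx)

flat, meta = flatten_columns(df)
restored = restore_columns(flat, meta)
```

The round trip is lossy for non-string levels: integers or timestamps come back as strings unless the metadata also records a type per level.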
There were indeed still a bunch of cases where the json schema generation errors, see the list in the PR: https://github.com/pandas-dev/pandas/pull/14904#issuecomment-278288199 (and multi-index columns is one of them). We should probably open an issue for those (or use this one as the general follow-up issue).
we could add a NotImplementedError for now?
@rgbkrk what's the ideal behavior here, as far as front-ends are concerned? Do you want us to just not have an "application/vnd.dataresource+json" key (text/html should still be there)? Is there another channel we should publish information on? I'm looking through the jupyter client docs now to see if there are any recommendations.
Yeah, just don't have the application/vnd.dataresource+json if it's not supported and only provide text/html.
As for another channel, are you asking about where to send errors to?
> As for another channel, are you asking about where to send errors to?
Yeah, my memory is a bit hazy, but I thought there were other channels to publish errors. It looks like I was misremembering though. Is there some way to notify you that vnd.dataresource+json version failed?
Beyond a traceback? There's probably a way to push out other errors. Emitting a warning is fine too.
What do you think @minrk and @carreau?
A warning on stderr will be printed in red in the classic notebook. (I believe it should be yellow because red is scary.) An error will stop the execution, while printing on stderr is treated as just "text" and the notebook should keep on processing as planned. Does that answer your question?
> Does that answer your question?
I think so, thanks. I'll emit a warning when we fail to serialize the object, and publish just the text and html reprs.
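Roughly, the plan could look like the sketch below. It is not the actual pandas implementation: build_mimebundle and its serialize parameter are hypothetical names, and the broad except is only there to show the fallback. The idea is to always build the text and HTML reprs, try the table schema payload, and on failure emit a warning and leave the application/vnd.dataresource+json key out.

```python
import json
import warnings

import pandas as pd

TABLE_SCHEMA_MIME = "application/vnd.dataresource+json"


def _to_table_schema(df):
    # Default serializer: pandas' own table-orient JSON, parsed back to a dict.
    return json.loads(df.to_json(orient="table"))


def build_mimebundle(df, serialize=_to_table_schema):
    """Build a display bundle, dropping the table schema entry on failure.

    Hypothetical helper. text/plain and text/html are always present;
    the table schema key is only added when serialization succeeds.
    """
    bundle = {"text/plain": repr(df), "text/html": df.to_html()}
    try:
        bundle[TABLE_SCHEMA_MIME] = serialize(df)
    except Exception as exc:  # deliberately broad for the sketch
        # A warning goes to stderr and does not stop cell execution.
        warnings.warn(
            "Could not serialize to table schema, publishing only "
            "text and HTML: {!r}".format(exc),
            UserWarning,
        )
    return bundle


simple = pd.DataFrame({"a": [1, 2]})
good_bundle = build_mimebundle(simple)
```

Passing the serializer as an argument keeps the fallback path easy to exercise without depending on which frames a given pandas version fails on.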