Thanks for a great project! With these versions:
I get the following error when trying to pass a dataframe with a period column to altair: TypeError: data type not understood
The error disappears when the 'Sale month' column is not passed to altair.
sale['Sale month'] = pd.to_datetime(sale['Sale month'], format='%b-%y').dt.to_period('M')
sale['Sale month']
0 2011-07
1 2011-08
2 2011-09
3 2011-10
4 2011-11
...
15845 2013-11
15846 2013-12
15847 2014-01
15848 2014-02
15849 2014-03
Name: Sale month, Length: 1173, dtype: period[M]
alt.Chart(sale[['Value', 'Sale month']].head()).mark_bar().encode(
)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/vegalite/v4/api.py in to_dict(self, *args, **kwargs)
353 copy = self.copy(deep=False)
354 original_data = getattr(copy, 'data', Undefined)
--> 355 copy.data = _prepare_data(original_data, context)
356
357 if original_data is not Undefined:
~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/vegalite/v4/api.py in _prepare_data(data, context)
82 # convert dataframes or objects with __geo_interface__ to dict
83 if isinstance(data, pd.DataFrame) or hasattr(data, '__geo_interface__'):
---> 84 data = pipe(data, data_transformers.get())
85
86 # convert string input to a URLData
~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in pipe(data, *funcs)
632 """
633 for func in funcs:
--> 634 data = func(data)
635 return data
636
~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
301 def __call__(self, *args, **kwargs):
302 try:
--> 303 return self._partial(*args, **kwargs)
304 except TypeError as exc:
305 if self._should_curry(args, kwargs, exc):
~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/vegalite/data.py in default_data_transformer(data, max_rows)
11 @curry
12 def default_data_transformer(data, max_rows=5000):
---> 13 return pipe(data, limit_rows(max_rows=max_rows), to_values)
14
15
~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in pipe(data, *funcs)
632 """
633 for func in funcs:
--> 634 data = func(data)
635 return data
636
~/Documents/…/virtualenv/lib/python3.8/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
301 def __call__(self, *args, **kwargs):
302 try:
--> 303 return self._partial(*args, **kwargs)
304 except TypeError as exc:
305 if self._should_curry(args, kwargs, exc):
~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/utils/data.py in to_values(data)
138 return {'values': data}
139 elif isinstance(data, pd.DataFrame):
--> 140 data = sanitize_dataframe(data)
141 return {'values': data.to_dict(orient='records')}
142 elif isinstance(data, dict):
~/Documents/…/virtualenv/lib/python3.8/site-packages/altair/utils/core.py in sanitize_dataframe(df)
221 # otherwise it will give an error on np.issubdtype(dtype, np.integer)
222 continue
--> 223 elif np.issubdtype(dtype, np.integer):
224 # convert integers to objects; np.int is not JSON serializable
225 df[col_name] = df[col_name].astype(object)
~/Documents/…/virtualenv/lib/python3.8/site-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
391 """
392 if not issubclass_(arg1, generic):
--> 393 arg1 = dtype(arg1).type
394 if not issubclass_(arg2, generic):
395 arg2_orig = arg2
TypeError: data type not understood
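The last frame of the traceback can be reproduced on its own, which may help confirm the cause. This is a minimal sketch, assuming only that numpy and pandas are installed. The sample values are made up, and the exact wording of the error message may differ between NumPy versions.

```python
import numpy as np
import pandas as pd

# A period-typed column similar to 'Sale month' above (illustrative values).
months = pd.Series(pd.period_range("2011-07", periods=3, freq="M"))

# period[M] is a pandas extension dtype rather than a NumPy dtype, so
# np.issubdtype cannot interpret it and raises TypeError.
try:
    np.issubdtype(months.dtype, np.integer)
    raised = False
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```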
The sanitization code needs to be updated to recognize period types. That said, Vega-Lite has no data type representing time periods, so I think the best possible outcome would probably be a more informative error.
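As a rough illustration of what "a more informative error" might look like, here is a sketch of a pre-check that names the offending columns. The helper `reject_period_columns` and its message are hypothetical and not part of Altair; it only assumes pandas is installed.

```python
import pandas as pd


def reject_period_columns(df):
    """Hypothetical pre-check: fail early with a descriptive message."""
    period_cols = [
        str(name)
        for name, dtype in df.dtypes.items()
        if isinstance(dtype, pd.PeriodDtype)
    ]
    if period_cols:
        raise ValueError(
            f"Columns {period_cols} have a period dtype, which Vega-Lite "
            "cannot represent; convert them to timestamps or strings first."
        )
    return df


# Illustrative data, not the reporter's actual dataframe.
sale = pd.DataFrame(
    {
        "Value": [10, 20],
        "Sale month": pd.period_range("2011-07", periods=2, freq="M"),
    }
)

try:
    reject_period_columns(sale)
except ValueError as exc:
    message = str(exc)
    print(message)
```

A check like this could run before the integer test in the sanitization step, so the user sees the column name instead of a bare numpy TypeError.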
Hi Jake, any idea of when this issue will be resolved? Still getting the TypeError: data type not understood error when PeriodIndex is present in the dataframe. Thank you!
It will be resolved when someone fixes it. Are you interested?
The relevant code that would need to be modified is here: https://github.com/altair-viz/altair/blob/03f37ca8b2701bf8d9e61646cacf8eee2edcad87/altair/utils/core.py#L243-L350
That said, it's unclear how best to represent time periods in Vega-Lite, which has no concept of time periods. I suppose they could be represented as timestamps relative to some standard start time, such as 2012-01-01 00:00:00.
Quick question on this: how would you expect your period columns to be represented within the Vega-Lite chart? Should it be a timestamp? A string? Silently ignored?
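To make the options in that question concrete, here is a sketch of what each representation would look like on the pandas side. It uses made-up data, and the variable names are illustrative; none of this is what Altair currently does.

```python
import pandas as pd

# Illustrative monthly periods, similar to the 'Sale month' column.
months = pd.Series(
    pd.period_range("2011-07", periods=3, freq="M"), name="Sale month"
)

# Option 1: a timestamp at the start of each period.
as_start = months.dt.to_timestamp(how="start")

# Option 2: a timestamp at the end of each period.
as_end = months.dt.to_timestamp(how="end")

# Option 3: a plain string label for each period.
as_text = months.astype(str)

print(as_start.tolist())
print(as_end.tolist())
print(as_text.tolist())
```

Timestamps keep the temporal ordering usable on a time axis but pick one instant to stand for the whole period; strings keep the label but would be treated as nominal values.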
Hi Jake, "I suppose it could be represented as timestamps with respect to some standard start time, such as 2012-01-01 00:00:00" - that would be my thought.
Could the sanitization code be used to convert periods into timestamps before they get passed to Vega-Lite?
And if, say, the PeriodIndex has a frequency of "M" (monthly), perhaps the Altair code could implicitly convert to 'month(period_column)'? I'm not sure what the best way to do this second part is. Alternatively, the easy way out may be for the user to specify 'month(period_column)' explicitly, knowing that period_column has been cast to a timestamp through sanitization.
Let me know your thoughts. Thank you.
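Until something like this exists in the sanitization code, the conversion described above can be done by hand before the dataframe reaches Altair. The following is a sketch under that assumption; `periods_to_timestamps` is a hypothetical helper, and the data is made up.

```python
import pandas as pd


def periods_to_timestamps(df):
    """Hypothetical helper: copy df, turning period columns into timestamps."""
    out = df.copy()
    for name in out.columns:
        if isinstance(out[name].dtype, pd.PeriodDtype):
            # Each period becomes the timestamp at its start.
            out[name] = out[name].dt.to_timestamp()
    return out


# Illustrative data, not the reporter's actual dataframe.
sale = pd.DataFrame(
    {
        "Value": [10, 20, 30],
        "Sale month": pd.period_range("2011-07", periods=3, freq="M"),
    }
)

chart_data = periods_to_timestamps(sale)
print(chart_data.dtypes)
```

The converted frame can then be passed to alt.Chart, with the time unit stated explicitly in the encoding (for example a shorthand such as 'yearmonth(Sale month):T'), so the grouping stays the user's choice.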
We don't have any mechanism at the moment for data sanitization to affect the encoding specification, and I think we should probably avoid making those kinds of assumptions for the user. Silent changes of defaults that are convenient in some situations often lead to a lot of confusion in others.