Hey! I'm using SQL Lab with a postgres backend. I can't select from some tables, specifically when the column has a type of TIMESTAMP WITH TIME ZONE. An example of the contents of such a cell is 2014-11-24 11:30:02.140381+00
I have other tables with the column of type TIMESTAMP WITHOUT TIME ZONE, and they work fine. Ex: 2010-01-28T16:59:52.900963
Superset version: 0.15.1
Expected results: the query should run and return results.
Actual results: a one-second error message flashes saying "Datatype not understood", then disappears, and I'm left with the loading spinner.
Stack trace:
2017-01-09 17:09:52,437:INFO:root:Running query:
SELECT product_id,
base_product_id,
updated
FROM public.product_id_mapping LIMIT 100
2017-01-09 17:09:52,513:ERROR:root:data type not understood
Traceback (most recent call last):
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/superset/views.py", line 2500, in sql_json
data = sql_lab.get_sql_results(query_id, return_results=True)
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/celery/local.py", line 188, in __call__
return self._get_current_object()(*a, **kw)
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/celery/app/task.py", line 420, in __call__
return self.run(*args, **kwargs)
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/superset/sql_lab.py", line 171, in get_sql_results
payload['columns'] = cdf.columns_dict if cdf else []
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/superset/dataframe.py", line 54, in columns_dict
agg = agg_func(self.__df.dtypes[col], col)
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/superset/dataframe.py", line 110, in agg_func
if np.issubdtype(dtype, np.number):
File "/Users/user/src/superset/venv/lib/python2.7/site-packages/numpy/core/numerictypes.py", line 755, in issubdtype
return issubclass(dtype(arg1).type, arg2)
TypeError: data type not understood
Steps to reproduce: run a SELECT query against a PostgreSQL table that has a column of type TIMESTAMP WITH TIME ZONE.
This looks like a dup of #1900
Good call, sorry xrmx!
@ama-ableton np, could you please add some debug as requested in https://github.com/airbnb/superset/issues/1900#issuecomment-270713457 ?
Sure, does that help?
I have the same issue for a PostgreSQL db on superset==0.15.4.
I also have this issue.
Fetched the latest master, but it's the same.
Thanks
From ama-ableton:
Here's the arg1 and arg2 passed into the issubdtype function mentioned in the stack trace:
arg1: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
arg2: <type 'numpy.number'>
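For what it's worth, the failure can apparently be reproduced outside Superset with just numpy and pandas; a minimal sketch (the tz-aware Series stands in for what psycopg2 hands back for a TIMESTAMP WITH TIME ZONE column):

```python
# Minimal reproduction sketch (not Superset code): np.issubdtype cannot
# interpret a pandas tz-aware datetime dtype, which is what a Postgres
# TIMESTAMP WITH TIME ZONE column comes back as.
import numpy as np
import pandas as pd

s = pd.Series(pd.to_datetime(["2014-11-24 11:30:02+00:00"]))
print(s.dtype)  # datetime64[ns, UTC]

try:
    np.issubdtype(s.dtype, np.number)
    raised = False
except TypeError:
    # numpy cannot turn datetime64[ns, UTC] into a plain np.dtype,
    # hence "data type not understood"
    raised = True
```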
We don't use Postgres here at Airbnb, and we would love to have a contributor who irons out all of these Postgres-specific bugs!
There's a module called db_engine_specs.py that defines all of the engine-specific logic. We could add a new method there, alter_dataframe, that would apply some casting on Pandas dataframes after receiving the data. The base class implementation would do nothing, and the Postgres one would convert things like psycopg2.tz.FixedOffsetTimezone to a numpy-compatible date type.
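A rough sketch of what that hook could look like (the class and method names here are illustrative, not actual Superset APIs; the Postgres override simply strips the time zone so numpy can interpret the dtype):

```python
# Hypothetical sketch of the proposed alter_dataframe hook: the base
# engine spec leaves the DataFrame untouched, while a Postgres-specific
# override converts tz-aware datetime columns to naive UTC timestamps.
import pandas as pd


class BaseEngineSpec:
    @staticmethod
    def alter_dataframe(df):
        # Default: no changes.
        return df


class PostgresEngineSpec(BaseEngineSpec):
    @staticmethod
    def alter_dataframe(df):
        # Columns backed by psycopg2.tz.FixedOffsetTimezone show up as
        # pandas DatetimeTZDtype; normalize them to datetime64[ns].
        for col in df.columns:
            if isinstance(df[col].dtype, pd.DatetimeTZDtype):
                df[col] = df[col].dt.tz_convert("UTC").dt.tz_localize(None)
        return df
```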
Updating Pandas (from 0.18.1 to 0.19.2 in my case) resolved this issue. I would recommend giving that a shot.
My understanding is that Pandas itself is supposed to store results using a time zone unaware type (by converting to UTC). Perhaps a defect in this behavior was introduced and subsequently resolved?
@tfellison thanks for the update. Can anyone else please confirm it?
After poking at this a bit more, it appears the issue is not entirely resolved by updating Pandas. The behavior I've observed is as follows:
Prior to updating, an exception would be thrown on any selection from a table containing a timestamp with time zone column, regardless of whether or not that column was selected. After updating, it's possible to select from such a table, as long as the results themselves do not contain any timestamp with time zone values. It does appear to be possible to select a timestamp with time zone column as long as you cast it to a timestamp without time zone. For example...
SELECT (timevalue_column_name at time zone 'UTC')::timestamp without time zone AS "Timestamp"
FROM table_name;
Applying some function that returns a different type also seems to work fine - converting to date for example:
SELECT date(timevalue_column_name at time zone 'UTC') AS "Date"
FROM table_name;
In SQLLab the issue is still present:
[2017-11-12 21:25:03,039: ERROR/ForkPoolWorker-18] zip argument #2 must support iteration
Traceback (most recent call last):
File "/opt/apps/superset/venv_superset_0.20.5/lib64/python3.4/site-packages/superset/sql_lab.py", line 94, in get_sql_results
ctask, query_id, return_results, store_results, user_name)
File "/opt/apps/superset/venv_superset_0.20.5/lib64/python3.4/site-packages/superset/sql_lab.py", line 241, in execute_sql
'data': cdf.data if cdf.data else [],
File "/opt/apps/superset/venv_superset_0.20.5/lib64/python3.4/site-packages/superset/dataframe.py", line 52, in data
return self.__df.to_dict(orient='records')
File "/opt/apps/superset/venv_superset_0.20.5/lib64/python3.4/site-packages/pandas/core/frame.py", line 898, in to_dict
for row in self.values]
File "/opt/apps/superset/venv_superset_0.20.5/lib64/python3.4/site-packages/pandas/core/frame.py", line 898, in <listcomp>
for row in self.values]
TypeError: zip argument #2 must support iteration
The error is triggered even by:
select current_timestamp
or equivalently
select now()
It is definitely caused by the combination of date and timezone, since the following selects work without any problems:
Time with timezone:
select current_time
Timestamp without timezone:
select localtimestamp
or
select now()::timestamp
In our configuration (Postgres, Celery backend for SQLLab, Redis cache) this bug manages to paralyze SQLLab for quite some time, preventing subsequent queries from succeeding.
This is likely a bug in Pandas. When using a Postgres field with a time zone, a DatetimeIndex is created with freq=None, which is not iterable. With naive datetimes this does not happen.
^^^ this is not entirely correct
There's probably a way to convert the column in the dataframe on our side to work around the bug. I spent 10-15 minutes trying a few days ago and gave up.
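One candidate conversion (a sketch of the idea, not the actual patch): strip the time zone from tz-aware columns before serializing, so to_dict only ever sees naive datetime64[ns] values.

```python
# Possible Superset-side workaround sketch (assumed, not the shipped
# fix): normalize tz-aware columns to naive UTC before to_dict, so the
# tz-aware serialization path in pandas is never exercised.
import pandas as pd

df = pd.DataFrame({
    "ts": pd.to_datetime(["2017-11-12 21:25:03+00:00"]),
    "n": [1],
})

# 'datetimetz' selects only tz-aware datetime columns.
for col in df.select_dtypes(include=["datetimetz"]).columns:
    df[col] = df[col].dt.tz_convert("UTC").dt.tz_localize(None)

records = df.to_dict(orient="records")
```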
This is the reported issue: https://github.com/pandas-dev/pandas/issues/18372
This is the confirmed fix:
import numpy as np
from pandas.core.common import _maybe_box_datetimelike

@property
def data(self):
    # workaround pandas issue
    # see: https://github.com/pandas-dev/pandas/issues/18372
    return [dict((k, _maybe_box_datetimelike(v))
                 for k, v in zip(self.__df.columns, np.atleast_1d(row)))
            for row in self.__df.values]
Let me see if I can create a patch for this tomorrow.