Incubator-superset: Can't make a timeserie if date is TIMESTAMP WITHOUT TIME ZONE (postgres)

Created on 16 Sep 2018  路  11Comments  路  Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

  • [X] I have checked the superset logs for python stacktraces and included it here as text if any
  • [x] I have reproduced the issue with at least the latest released version of superset
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.28 RC6 from 2018-09-12

Expected results

A timeseris graph is displayed

Actual results

  • Unexepeted Error is shown in the UI
  • Inspecting the asnwer from the corresponding https request, it shows a stacktrace finishing with :
    File "/home/work/incubator-superset/superset/utils.py", line 366, in datetime_to_epoch return (dttm - epoch_with_tz).total_seconds() * 1000 File "pandas/_libs/tslibs/timestamps.pyx", line 320, in pandas._libs.tslibs.timestamps._Timestamp.__sub__ TypeError: Timestamp subtraction must have the same timezones or no timezones

Steps to reproduce

  • Create source from a table having a date column of type TIMESTAMP WITHOUT TIME ZONE (postgres 9.3)
  • Try creating a timeseries graph using it

Workaround

  • According to my colleagues (I could not test), the same database allowed making TS graphs with a version of superset he installed a month ago from an "image for easyer installation he found on github"
  • Adding a record having value my_date WITH TIME ZONE 'MST', typed TIMESTAMP and using it as the time for the graph worked

Most helpful comment

Here is how I fixed this when running a clone of the repo's master branch from 2018-11-01 and the Docker install (after few "adjustments" to make it work). I hope this will help others until it is permanently fixed...

For the "World's Bank Data" dashboard:

  • Select Sources --> Tables from the top-level menu.
  • Click on the "Edit" icon for the wb_health_population table.
  • Click on the "List Columns" tab.
  • Scroll down to the "year" column.
  • Click on the "Edit" icon for the "year" column.
  • In the "Expression" box, enter "year ::timestamptz" (without the double quotes).
  • Click on the "Save" button.
  • Access/Refresh the dashboard.

I had another issue with the "World's Bank Data" dashboard - some charts were erroring out with 'column "xxxx" does not exist'. For example, 'column "sp_pop_totl" does not exist'. To fix:

  • Select Sources --> Tables from the top-level menu.
  • Click on the "Edit" icon for the wb_health_population table.
  • Click on the List Metrics tab.
  • Find the metric for the offending column. For example, sum__SP_POP_TOTL and click on its "Edit" icon.
  • Surround the column name in the "SQL Expression" field with double quotes. For example, change it from SUM(SP_POP_TOTL) to SUM("SP_POP_TOTL")

HTH

All 11 comments

It seems switching from dbapi2 to dbapi1 in the current version has changed how the dttm.tzinfo object is formed for psycopg2. Previously it was of type UTC, now it is psycopg2.tz.FixedOffset, which doesn't seem to work compatible with pandas.

Beyond the psycopg2 problem, I'm not sure the current logic in db_engine_specs of adding AT TIME ZONE 'UTC' to all postgres time grains is working as intended. Based on the documentation
https://www.postgresql.org/docs/9.2/static/functions-datetime.html (9.9.3), this notation changes timestamps with timezone into timestamps without timezone and vice versa. Furthermore, AT TIME ZONE 'UTC' is not added to the None time grain, which explains why one or the other always works. Perhaps they should all be cast to ::timestamp or ::timestamptz to ensure uniform type.

I'm also getting this error with the demo data. I run a local system using the provided docker solution.

Superset version

0.28.* as of 2018-10-16

Workaround

It works with the 0.27 version.

@villebro - when I cast it to ::timestamptz it fixed the issue for me (0.28 docker, Redshift via postgresql+psycopg2).

Excuse me. could you detail where to modify it? in which file which line? or setting? thanks!

I am facing the same problem. Is there any workaround except downgrading Superset to 0.27.0?

Sorry, been bogged down with other stuff for a while, but will try to look at this in the coming days. I'm surprised that this is working in 0.27, thought this bug was introduced after 0.26 after dropping the pandas.read_sql calls.

@davidgreenshtein - Sometimes you can work around it by creating a new calculated column and coercing to ::timestamptz

It doesn't work if you just update the database expression though.

Here is how I fixed this when running a clone of the repo's master branch from 2018-11-01 and the Docker install (after few "adjustments" to make it work). I hope this will help others until it is permanently fixed...

For the "World's Bank Data" dashboard:

  • Select Sources --> Tables from the top-level menu.
  • Click on the "Edit" icon for the wb_health_population table.
  • Click on the "List Columns" tab.
  • Scroll down to the "year" column.
  • Click on the "Edit" icon for the "year" column.
  • In the "Expression" box, enter "year ::timestamptz" (without the double quotes).
  • Click on the "Save" button.
  • Access/Refresh the dashboard.

I had another issue with the "World's Bank Data" dashboard - some charts were erroring out with 'column "xxxx" does not exist'. For example, 'column "sp_pop_totl" does not exist'. To fix:

  • Select Sources --> Tables from the top-level menu.
  • Click on the "Edit" icon for the wb_health_population table.
  • Click on the List Metrics tab.
  • Find the metric for the offending column. For example, sum__SP_POP_TOTL and click on its "Edit" icon.
  • Surround the column name in the "SQL Expression" field with double quotes. For example, change it from SUM(SP_POP_TOTL) to SUM("SP_POP_TOTL")

HTH

It seems switching from dbapi2 to dbapi1 in the current version has changed how the dttm.tzinfo object is formed for psycopg2. Previously it was of type UTC, now it is psycopg2.tz.FixedOffset, which doesn't seem to work compatible with pandas.

@villebro Is this a reasonable fix? At least retains the old behaviour.

sed -i 's/dttm - epoch_with_tz/dttm.replace(tzinfo=pytz.utc) - epoch_with_tz/g' venv/lib/python3.6/site-packages/superset/utils.py

@rahulsekar I'm kind of leaning towards fixing this in the time_grain definitions in db_engine_specs, as those are not logical right now. I'll be working with a postgres database in the upcoming days, let me get back once I've had time to test a few alternatives.

Was this page helpful?
0 / 5 - 0 ratings