Incubator-superset: date_trunc function when connect to hive

Created on 10 Apr 2019  路  7Comments  路  Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

  • [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [x] I have reproduced the issue with at least the latest released version of superset.
  • [x] I have checked the issue tracker for the same issue and I haven't found one similar.

Superset version

0.28.1

Expected results

Actual results

errorCode=10011, errorMessage="Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 6:9 Invalid function 'date_trunc'"), operationHandle=None)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/superset/connectors/sqla/models.py", line 783, in query
    df = self.database.get_df(sql, self.schema)
  File "/usr/lib/python3.6/site-packages/superset/models/core.py", line 789, in get_df
    self.db_engine_spec.execute(cursor, sqls[-1])
  File "/usr/lib/python3.6/site-packages/superset/db_engine_specs.py", line 1268, in execute
    cursor.execute(query, **kwargs)
  File "/usr/lib/python3.6/site-packages/pyhive/hive.py", line 365, in execute
    _check_status(response)
  File "/usr/lib/python3.6/site-packages/pyhive/hive.py", line 495, in _check_status
    raise OperationalError(response)

Steps to reproduce

When I create time series bar chart, Superset uses 'date_trunc' function in hiveql group by clause and it raise exception.

inactive

Most helpful comment

Same here. I've checked the source code in incubator-superset/superset/db_engine_specs.py, from what I'm seeing, the reason this error happened is because they're using PrestoEngineSpec as HiveEngineSpec, as the code below:

class HiveEngineSpec(PrestoEngineSpec):

    """Reuses PrestoEngineSpec functionality."""

and hive doesn't support 'date_trunc' method.
So if u can build from the source code, u can just change the code into some thing like this:

class HiveEngineSpec(BaseEngineSpec):

    """Reuses PrestoEngineSpec functionality."""

    engine = 'hive'
    max_column_name_length = 767

    time_grain_functions = {
        None: '{col}',
        'PT1S': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:mm:ss')",
        'PT1M': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:mm:00')",
        'PT1H': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:00:00')",
        'P1D': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd 00:00:00')",
        'P1W': "date_format(date_sub(nianxian, CAST(7-from_unixtime(unix_timestamp(nianxian),'u') as int)), 'yyyy-MM-dd 00:00:00')",
        'P1M': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-01 00:00:00')",
        'P0.25Y': "date_format(add_months(trunc(nianxian, 'MM'), -(month(nianxian)-1)%3), 'yyyy-MM-dd 00:00:00')",
        'P1Y': "from_unixtime(unix_timestamp({col}), 'yyyy-01-01 00:00:00')",
        'P1W/1970-01-03T00:00:00Z': "date_format(date_add(nianxian, INT(6-from_unixtime(unix_timestamp(nianxian), 'u'))), 'yyyy-MM-dd 00:00:00')",
        '1969-12-28T00:00:00Z/P1W': "date_format(date_add(nianxian, -INT(from_unixtime(unix_timestamp(nianxian), 'u'))), 'yyyy-MM-dd 00:00:00')"
    }

this should do the work.

All 7 comments

Same here. I've checked the source code in incubator-superset/superset/db_engine_specs.py, from what I'm seeing, the reason this error happened is because they're using PrestoEngineSpec as HiveEngineSpec, as the code below:

class HiveEngineSpec(PrestoEngineSpec):

    """Reuses PrestoEngineSpec functionality."""

and hive doesn't support 'date_trunc' method.
So if u can build from the source code, u can just change the code into some thing like this:

class HiveEngineSpec(BaseEngineSpec):

    """Reuses PrestoEngineSpec functionality."""

    engine = 'hive'
    max_column_name_length = 767

    time_grain_functions = {
        None: '{col}',
        'PT1S': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:mm:ss')",
        'PT1M': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:mm:00')",
        'PT1H': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd HH:00:00')",
        'P1D': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-dd 00:00:00')",
        'P1W': "date_format(date_sub(nianxian, CAST(7-from_unixtime(unix_timestamp(nianxian),'u') as int)), 'yyyy-MM-dd 00:00:00')",
        'P1M': "from_unixtime(unix_timestamp({col}), 'yyyy-MM-01 00:00:00')",
        'P0.25Y': "date_format(add_months(trunc(nianxian, 'MM'), -(month(nianxian)-1)%3), 'yyyy-MM-dd 00:00:00')",
        'P1Y': "from_unixtime(unix_timestamp({col}), 'yyyy-01-01 00:00:00')",
        'P1W/1970-01-03T00:00:00Z': "date_format(date_add(nianxian, INT(6-from_unixtime(unix_timestamp(nianxian), 'u'))), 'yyyy-MM-dd 00:00:00')",
        '1969-12-28T00:00:00Z/P1W': "date_format(date_add(nianxian, -INT(from_unixtime(unix_timestamp(nianxian), 'u'))), 'yyyy-MM-dd 00:00:00')"
    }

this should do the work.

@jeepxiaozi
thank for your reply :)

@jeepxiaozi
thank for your reply :)

sorry about the code part, I've edited a little bit so that u can see it clearly

@jeepxiaozi thank you for the fix! Could you create a pull request to the apache project please? :)

+1 for this one ;)

Remember you can override time grain functions in superset_config.py (see examples) without having to edit source and rebuild!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

ghost picture ghost  路  3Comments

amien90 picture amien90  路  3Comments

XiaodiKong picture XiaodiKong  路  3Comments

vylc picture vylc  路  3Comments

tmccartan picture tmccartan  路  3Comments