Incubator-superset: Support for Registered or Global (not map or inline) Lookup Extraction DimensionSpec

Created on 26 Sep 2018  ·  4 Comments  ·  Source: apache/incubator-superset

Make sure these boxes are checked before submitting your issue - thank you!

  • [ x ] I have checked the superset logs for python stacktraces and included it here as text if any
  • [ x ] I have reproduced the issue with at least the latest released version of superset
  • [ x ] I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.27.0

Description

Retrieving a registered lookup does not seem to work.
After defining and loading a bulk lookup as shown below:

{
  "__default": {
    "mylookup": {
      "version": "v0",
      "lookupExtractorFactory": {
        "type": "cachedNamespace",
        "extractionNamespace": {
          "type": "jdbc",
          "connectorConfig": {
            "createTables": false,
            "connectURI": "jdbc:postgresql:\/\/localhost:5432\/mydb",
            "user": "myuser",
            "password": "mypsw"
          },
          "table": "mytable",
          "keyColumn": "id",
          "valueColumn": "name"
        },
        "firstCacheTimeout": 120000,
        "injective": true
      }
    }
  }
}

I tried to create an extra Druid metric for this data source to extract the lookup value. Before using "type": "extraction" in the DimensionSpec, I tried the following configuration:

{
  "type" : "lookup",
  "dimension" : "unit_id",
  "outputName": "unit_name",
  "retainMissingValue" : false,
  "replaceMissingValueWith" : "missing",
  "name" : "mylookup"
}

This partially worked. It let me run the query shown in the screenshot, which correctly returned the unit name as defined above:
[image]
However, the filter menu does not get populated, and if I deselect the "Include time" option, the query fails with the following traceback:

Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3064, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unit_name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1337, in query
    df = self.homogenize_types(df, query_obj.get('groupby', []))
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1324, in homogenize_types
    df[col] = df[col].fillna('<NULL>').astype('unicode')
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
    values = self._data.get(item)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3066, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unit_name'

I therefore went on to define an extraction function, as suggested in #4740, shown below:

{
  "type": "extraction",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "outputType": "STRING",
  "extractionFn": {
    "type": "lookup",
    "dimension" : "unit_id",
    "outputName" : "unit_name",
    "name": "mylookup"
  }
}

but I get the following error:

Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 488, in _post
    res = urllib.request.urlopen(req)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1331, in query
    client=client, query_obj=query_obj, phase=2)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 930, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1203, in run_query
    client.topn(**pre_qry)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 123, in topn
    return self._post(query)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 508, in _post
    separators=(',', ': '))))
OSError: HTTP Error 500: Internal Server Error 
 Druid Error: Internal Server Error 
 Query is: {
    "aggregations": [
        {
            "name": "count",
            "type": "count"
        }
    ],
    "dataSource": "TestingDruidLookups",
    "dimension": {
        "dimension": "unit_id",
        "extractionFn": {
            "dimension": "unit_id",
            "name": "mylookup",
            "outputName": "unit_name",
            "type": "lookup"
        },
        "outputName": "unit_name",
        "outputType": "STRING",
        "type": "extraction"
    },
    "granularity": "all",
    "intervals": "1901-01-01T00:00:00+00:00/2018-09-26T14:13:14+00:00",
    "metric": "count",
    "postAggregations": [],
    "queryType": "topN",
    "threshold": 10000
}
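
One possible reading of the 500 error: in Druid's documented `lookup` extraction function, the mapping is supplied inline under a `lookup` key (an object of type `map`) rather than referenced by `name`, so the `extractionFn` in the query above may simply be malformed from Druid's point of view. The Python sketch below builds the inline form for comparison. The helper name `inline_lookup_extraction` and the sample mapping are illustrative assumptions, not part of Superset or pydruid.

```python
import json


def inline_lookup_extraction(dimension, output_name, mapping,
                             retain_missing_value=False):
    """Build an extraction DimensionSpec that uses an inline map lookup.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "type": "extraction",
        "dimension": dimension,
        "outputName": output_name,
        "extractionFn": {
            "type": "lookup",
            # The inline form carries the key/value pairs in the query itself.
            "lookup": {"type": "map", "map": dict(mapping)},
            "retainMissingValue": retain_missing_value,
        },
    }


# The sample mapping is made up; real values would come from "mytable".
spec = inline_lookup_extraction("unit_id", "unit_name", {"1": "unit one"})
print(json.dumps(spec, indent=2, sort_keys=True))
```

The inline form does not use the lookup registered through the coordinator, so it is only a point of comparison, not a substitute for the registered lookup asked about here.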

I also tried to implement a registered lookup, as suggested in the "Registered lookup extraction function" section of the linked page, defined as follows:

{
  "type": "extraction",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "outputType": "STRING",
  "extractionFn": {
    "type": "registeredLookup",
    "retainMissingValue": "mylookup"
  }
}

but I get the following traceback:

Traceback (most recent call last):
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1331, in query
    client=client, query_obj=query_obj, phase=2)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 930, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1218, in run_query
    filters)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 943, in _add_filter_from_pre_query_data
    (col, extraction_fn) = DruidDatasource._create_extraction_fn(dim)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1396, in _create_extraction_fn
    raise Exception(_('Unsupported extraction function: ' + ext_type))
Exception: Unsupported extraction function: registeredLookup
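
For reference, Druid documents the registered lookup extraction function with a `lookup` field naming the registered lookup, and with `retainMissingValue` as a boolean. The sketch below assembles that shape in Python. The helper `registered_lookup_extraction` is a hypothetical name, and the traceback above indicates that Superset 0.27.0 rejects the `registeredLookup` type no matter how its fields are filled in.

```python
import json


def registered_lookup_extraction(dimension, output_name, lookup_name,
                                 retain_missing_value=False,
                                 replace_missing_value_with=None):
    """Build an extraction DimensionSpec that references a registered lookup.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    # Druid treats retaining missing values and replacing them as
    # mutually exclusive options.
    if retain_missing_value and replace_missing_value_with is not None:
        raise ValueError(
            "retainMissingValue and replaceMissingValueWith are exclusive")
    extraction_fn = {
        "type": "registeredLookup",
        "lookup": lookup_name,  # name under which the lookup is registered
        "retainMissingValue": retain_missing_value,
    }
    if replace_missing_value_with is not None:
        extraction_fn["replaceMissingValueWith"] = replace_missing_value_with
    return {
        "type": "extraction",
        "dimension": dimension,
        "outputName": output_name,
        "outputType": "STRING",
        "extractionFn": extraction_fn,
    }


spec = registered_lookup_extraction(
    "unit_id", "unit_name", "mylookup", replace_missing_value_with="missing")
print(json.dumps(spec, indent=2, sort_keys=True))
```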

In conclusion, how can I get this bulk-updated lookup, issued through the Druid coordinator, to work (groupby and filter operations) in Superset?
(I am not sure whether this helps, but I thought I would mention it.) Perhaps the pydruid package does not support registered lookups? I had a look at this issue and this comment, and from this source code it seems there is no class for registered lookups, only the class NamespaceLookupExtraction.

inactive

All 4 comments

@srggrs ,

did you have any luck getting to the bottom of this?

guys see PR #7030 or #7031

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

This is an important issue for me as well and it seems that it is not fixed at least until 29.04. Is it fixed in later versions?
