Superset version: 0.27.0
Retrieving a registered lookup does not seem to work. After defining and loading a bulk lookup as shown below:
```json
{
  "__default": {
    "mylookup": {
      "version": "v0",
      "lookupExtractorFactory": {
        "type": "cachedNamespace",
        "extractionNamespace": {
          "type": "jdbc",
          "connectorConfig": {
            "createTables": false,
            "connectURI": "jdbc:postgresql://localhost:5432/mydb",
            "user": "myuser",
            "password": "mypsw"
          },
          "table": "mytable",
          "keyColumn": "id",
          "valueColumn": "name"
        },
        "firstCacheTimeout": 120000,
        "injective": true
      }
    }
  }
}
```
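For reference, a bulk config like the one above can be pushed to the coordinator's lookup API (`POST /druid/coordinator/v1/lookups/config`). A minimal Python sketch, where the coordinator host/port and the helper names are assumptions for illustration:

```python
import json
import urllib.request

def build_bulk_lookup_config(name, connect_uri, user, password,
                             table, key_col, value_col):
    """Build the __default-tier bulk lookup config shown above."""
    return {
        "__default": {
            name: {
                "version": "v0",
                "lookupExtractorFactory": {
                    "type": "cachedNamespace",
                    "extractionNamespace": {
                        "type": "jdbc",
                        "connectorConfig": {
                            "createTables": False,
                            "connectURI": connect_uri,
                            "user": user,
                            "password": password,
                        },
                        "table": table,
                        "keyColumn": key_col,
                        "valueColumn": value_col,
                    },
                    "firstCacheTimeout": 120000,
                    "injective": True,
                },
            }
        }
    }

def post_lookup_config(coordinator_url, config):
    """POST the config to the coordinator's lookup config endpoint
    (coordinator_url, e.g. http://localhost:8081, is an assumption)."""
    req = urllib.request.Request(
        coordinator_url + "/druid/coordinator/v1/lookups/config",
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)
```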
I then tried to add an extra Druid dimension to this datasource to extract the lookup value. Before trying "type": "extraction" in the DimensionSpec, I used the following configuration:
```json
{
  "type": "lookup",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "retainMissingValue": false,
  "replaceMissingValueWith": "missing",
  "name": "mylookup"
}
```
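As a sanity check on dimension specs like this one: Druid's docs note that `retainMissingValue: true` cannot be combined with a non-null `replaceMissingValueWith` (the combination above, `false` plus a replacement, is legal). A small hypothetical validator, not part of Superset or Druid:

```python
def validate_lookup_dimension_spec(spec):
    """Check a 'lookup' DimensionSpec for problems described in Druid's
    docs. Returns a list of problem strings (empty if the spec looks OK)."""
    problems = []
    if spec.get("type") != "lookup":
        problems.append("type should be 'lookup'")
    # Per Druid docs, these two settings are mutually exclusive.
    if spec.get("retainMissingValue") and spec.get("replaceMissingValueWith") is not None:
        problems.append(
            "retainMissingValue=true cannot be combined with replaceMissingValueWith")
    if not spec.get("name"):
        problems.append("missing registered lookup 'name'")
    return problems
```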
This partially worked: it let me run the following query, correctly returning the unit name as defined above:

However, the filter menu does not get populated, and if I unselect the "Include time" option, the query fails with the following traceback:
```
Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3064, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unit_name'
```
During handling of the above exception, another exception occurred:
```
Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1337, in query
    df = self.homogenize_types(df, query_obj.get('groupby', []))
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1324, in homogenize_types
    df[col] = df[col].fillna('<NULL>').astype('unicode')
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
    values = self._data.get(item)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 3066, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'unit_name'
```
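The KeyError itself is easy to reproduce: `homogenize_types` indexes the DataFrame by every groupby column, so if Druid's response is missing the `unit_name` column, the plain `df[col]` access raises. A minimal pandas sketch; the column guard is an illustration of a possible defensive fix, not Superset's actual code:

```python
import pandas as pd

# Simulated Druid response that lacks the lookup's output column.
df = pd.DataFrame({"unit_id": ["u1", "u2"], "count": [3, 5]})

def homogenize(df, groupby_cols):
    """Mimic the failing fillna/astype pass, but skip columns that are
    absent from the result instead of raising KeyError."""
    for col in groupby_cols:
        if col not in df.columns:  # hypothetical guard
            continue
        df[col] = df[col].fillna("<NULL>").astype(str)
    return df

# Without the guard, "unit_name" would raise KeyError exactly as above.
out = homogenize(df, ["unit_id", "unit_name"])
```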
Therefore I went on and defined an extraction function, as suggested in #4740, shown below:
```json
{
  "type": "extraction",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "outputType": "STRING",
  "extractionFn": {
    "type": "lookup",
    "dimension": "unit_id",
    "outputName": "unit_name",
    "name": "mylookup"
  }
}
```
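For what it's worth, the HTTP 500 below is consistent with this extractionFn being malformed: per Druid's documentation, a `"type": "lookup"` extraction function takes an inline `"lookup"` object rather than a registered lookup name, and `"dimension"`/`"outputName"` are not extractionFn fields. An inline form would look roughly like this (the map entries here are purely hypothetical placeholders):

```json
{
  "type": "extraction",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "outputType": "STRING",
  "extractionFn": {
    "type": "lookup",
    "lookup": {
      "type": "map",
      "map": { "u1": "Unit One" }
    },
    "retainMissingValue": false,
    "replaceMissingValueWith": "missing"
  }
}
```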
but I get the following error:
```
Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 488, in _post
    res = urllib.request.urlopen(req)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/home/myuser/miniconda3/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1331, in query
    client=client, query_obj=query_obj, phase=2)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 930, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1203, in run_query
    client.topn(**pre_qry)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 123, in topn
    return self._post(query)
  File "/home/myuser/miniconda3/lib/python3.5/site-packages/pydruid/client.py", line 508, in _post
    separators=(',', ': '))))
OSError: HTTP Error 500: Internal Server Error
```
Druid Error: Internal Server Error
Query is:
```json
{
  "aggregations": [
    {
      "name": "count",
      "type": "count"
    }
  ],
  "dataSource": "TestingDruidLookups",
  "dimension": {
    "dimension": "unit_id",
    "extractionFn": {
      "dimension": "unit_id",
      "name": "mylookup",
      "outputName": "unit_name",
      "type": "lookup"
    },
    "outputName": "unit_name",
    "outputType": "STRING",
    "type": "extraction"
  },
  "granularity": "all",
  "intervals": "1901-01-01T00:00:00+00:00/2018-09-26T14:13:14+00:00",
  "metric": "count",
  "postAggregations": [],
  "queryType": "topN",
  "threshold": 10000
}
```
I also tried to implement a registered lookup, as suggested here in the section "Registered lookup extraction function", defined as below:
```json
{
  "type": "extraction",
  "dimension": "unit_id",
  "outputName": "unit_name",
  "outputType": "STRING",
  "extractionFn": {
    "type": "registeredLookup",
    "retainMissingValue": "mylookup"
  }
}
```
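Side note: per Druid's documentation, the registered lookup's name goes in a `"lookup"` field, and `"retainMissingValue"` is a boolean, so `"retainMissingValue": "mylookup"` would be rejected by Druid even if Superset forwarded it (though here Superset fails first, as shown in the traceback below). The documented shape is roughly:

```json
{
  "type": "registeredLookup",
  "lookup": "mylookup",
  "retainMissingValue": false,
  "replaceMissingValueWith": "missing"
}
```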
but I get the following traceback:
```
Traceback (most recent call last):
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 394, in get_df_payload
    df = self.get_df(query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/viz.py", line 190, in get_df
    self.results = self.datasource.query(query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1331, in query
    client=client, query_obj=query_obj, phase=2)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 930, in get_query_str
    return self.run_query(client=client, phase=phase, **query_obj)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1218, in run_query
    filters)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 943, in _add_filter_from_pre_query_data
    (col, extraction_fn) = DruidDatasource._create_extraction_fn(dim)
  File "/home/serg/miniconda3/lib/python3.5/site-packages/superset/connectors/druid/models.py", line 1396, in _create_extraction_fn
    raise Exception(_('Unsupported extraction function: ' + ext_type))
Exception: Unsupported extraction function: registeredLookup
```
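This last exception comes from a type dispatch in Superset's Druid connector: `_create_extraction_fn` only recognizes certain extractionFn types, so `registeredLookup` falls through to the `raise`. A simplified conceptual sketch of that dispatch, not the actual Superset code; the extra branch is hypothetical and would also need matching support in pydruid:

```python
def create_extraction_fn(extraction_fn_spec):
    """Simplified sketch of the dispatch the traceback points at: only
    recognized extractionFn types are handled, everything else raises."""
    ext_type = extraction_fn_spec.get("type")
    if ext_type == "regex":
        return ("regex", extraction_fn_spec)
    if ext_type == "registeredLookup":  # hypothetical extra branch
        return ("registeredLookup", extraction_fn_spec)
    raise Exception("Unsupported extraction function: " + str(ext_type))
```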
In conclusion, how can I get this bulk lookup, loaded through the Druid coordinator, to work (for groupby and filter operations) in Superset?
(Not sure if this helps, but I thought I'd mention it just in case.) Perhaps the pydruid package does not support registered lookups? I had a look at this issue and this comment, and from the source code there seems to be no class for registered lookups, only the class NamespaceLookupExtraction.
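If pydruid indeed lacks such a class, the missing piece would look roughly like the following. This is a hypothetical sketch modeled on the pattern of pydruid's extraction classes, not actual pydruid code:

```python
class RegisteredLookupExtraction:
    """Hypothetical pydruid-style wrapper that serializes a
    registeredLookup extraction function to the JSON Druid expects."""

    def __init__(self, lookup, retain_missing_value=False,
                 replace_missing_value_with=None, injective=False):
        self.lookup = lookup
        self.retain_missing_value = retain_missing_value
        self.replace_missing_value_with = replace_missing_value_with
        self.injective = injective

    def build(self):
        """Return the extractionFn dict to embed in a DimensionSpec."""
        spec = {
            "type": "registeredLookup",
            "lookup": self.lookup,
            "retainMissingValue": self.retain_missing_value,
            "injective": self.injective,
        }
        if self.replace_missing_value_with is not None:
            spec["replaceMissingValueWith"] = self.replace_missing_value_with
        return spec
```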
@srggrs, did you have any luck getting to the bottom of this?
See PR #7030 or #7031.
This is an important issue for me as well and it seems that it is not fixed at least until 29.04. Is it fixed in later versions?