Presto: Presto is unable to map nested objects in Elasticsearch

Created on 12 May 2019 · 12 comments · Source: prestosql/presto

The current Elasticsearch connector throws an UnsupportedOperationException for fields of the nested type.
Example nested field in an index mapping:

"departments": {
  "type": "nested",
  "properties": {
    "dept": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}

Failure:

java.lang.UnsupportedOperationException
   at com.facebook.presto.spi.block.SingleRowBlockWriter.build(SingleRowBlockWriter.java:207)
   at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializePrimitive(ElasticsearchUtils.java:108)
   at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeObject(ElasticsearchUtils.java:49)
   at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeStruct(ElasticsearchUtils.java:74)
   at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeObject(ElasticsearchUtils.java:44)
   at com.facebook.presto.elasticsearch.ElasticsearchRecordCursor.getObject(ElasticsearchRecordCursor.java:165)
   at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:121)
   at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:242)
   at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
   at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
   at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
   at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
   at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
   at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
   at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
   at com.facebook.presto.$gen.Presto_0_218____20190512_112916_1.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)

All 12 comments

@zhenxiao, are you working on this already?

Yes, I am working on it.

@zhenxiao, any progress here? This is something I'm also looking to solve, so if you aren't currently working on it, I can take a look.

@brianolsen87, I don't think @zhenxiao is currently working on this, so feel free to grab it.

So I had one idea I wanted to test out. Once we resolve #2441 and arrays are supported, couldn't we just treat nested objects the same as regular objects, as long as we avoid pushing down queries on nested fields? It may cost some performance, but it would be considerably less code to manage.

@brianolsen87, indeed. Nested objects should appear as regular objects (mapped to ROW types in Presto).
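To make the proposal concrete, here is a minimal sketch, assuming a mapping fragment like the one in the issue has already been parsed into nested Java maps. It treats "nested" exactly like "object" and renders the result as a Presto type name such as row(dept varchar). The class and method names are hypothetical and this is not the connector's actual code; the leaf-type table is a simplified assumption, multi-fields such as dept.keyword are ignored, and array handling (tracked in #2441) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: derive a Presto-style type name from an Elasticsearch
// mapping fragment, treating "nested" exactly like "object" (both become ROW).
public class NestedAsRowMapper {

    // Simplified leaf-type table; an assumption for illustration only.
    public static String leafType(String esType) {
        switch (esType) {
            case "text":
            case "keyword":
                return "varchar";
            case "long":
                return "bigint";
            case "integer":
                return "integer";
            case "boolean":
                return "boolean";
            case "double":
                return "double";
            default:
                throw new IllegalArgumentException("unsupported type: " + esType);
        }
    }

    @SuppressWarnings("unchecked")
    public static String toPrestoType(Map<String, Object> field) {
        Object type = field.get("type");
        // A field with "properties" and no explicit type is an implicit object.
        if (type == null || "object".equals(type) || "nested".equals(type)) {
            Map<String, Object> properties = (Map<String, Object>) field.get("properties");
            StringJoiner row = new StringJoiner(", ", "row(", ")");
            for (Map.Entry<String, Object> e : properties.entrySet()) {
                row.add(e.getKey() + " " + toPrestoType((Map<String, Object>) e.getValue()));
            }
            return row.toString();
        }
        // This sketch ignores multi-fields (the "fields" block) entirely.
        return leafType((String) type);
    }

    public static void main(String[] args) {
        Map<String, Object> keyword = Map.of("type", "keyword");
        Map<String, Object> dept = new LinkedHashMap<>();
        dept.put("type", "text");
        dept.put("fields", Map.of("keyword", keyword));
        Map<String, Object> departments = new LinkedHashMap<>();
        departments.put("type", "nested");
        departments.put("properties", Map.of("dept", dept));
        System.out.println(toPrestoType(departments)); // row(dept varchar)
    }
}
```

Because both branches share one code path, a later change to nested handling (for example, pushdown) would only need to distinguish the two types where it actually matters.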

As a separate improvement, we can take advantage of support for dereference pushdown and filter pushdown in the engine to prune out fields that are not needed and leverage filters over nested fields. There's some related work going on for Hive: https://github.com/prestosql/presto/pull/1720
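As a rough illustration of the pruning side of dereference pushdown, the helper below collapses the field paths a query actually touches into the smallest set of paths a connector could request through Elasticsearch's _source filtering: if a whole column is needed, its subfield paths are redundant. This is a hypothetical helper, not part of Presto's SPI or the linked Hive work, and it assumes dot-separated field paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: reduce dereferenced field paths to a minimal list of
// _source include paths by dropping any path whose ancestor is also requested.
public class SourcePruner {

    public static List<String> minimalIncludes(List<String> paths) {
        // Deduplicate and sort so the output order is stable.
        TreeSet<String> unique = new TreeSet<>(paths);
        List<String> result = new ArrayList<>();
        for (String path : unique) {
            boolean coveredByAncestor = false;
            for (String other : unique) {
                // "a.b" is covered by "a", but "a-b" is not.
                if (path.startsWith(other + ".")) {
                    coveredByAncestor = true;
                    break;
                }
            }
            if (!coveredByAncestor) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(minimalIncludes(
                List.of("departments.dept", "name", "departments.dept", "address.city")));
        // [address.city, departments.dept, name]
        System.out.println(minimalIncludes(List.of("departments", "departments.dept")));
        // [departments]
    }
}
```

For example, a query that only reads departments.dept would need just that path rather than the whole departments object.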

Okay, so as a first step we can implement the nested type the same as the object type, simply mapping it to a ROW, and make sure that no pushdown occurs (or that no pushdown flags are set; I haven't gotten that far yet). Then we can create a separate issue to add pushdown later.
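A sketch of the "no pushdown on nested fields" rule, assuming the connector records the full path of every field mapped with "type": "nested" while reading the index mapping. The reason for the rule: Elasticsearch indexes each nested object as a separate hidden document, so a plain query on departments.dept.keyword does not match it unless it is wrapped in a nested query with "path": "departments". The class name and the nestedPaths input are hypothetical.

```java
import java.util.Set;

// Hypothetical guard for the first step: refuse to push down predicates on any
// field that is, or sits inside, a nested object; Presto evaluates them instead.
public class NestedPushdownGuard {

    // nestedPaths: full paths of fields mapped as "nested", e.g. "departments"
    // (assumed to be collected while reading the index mapping).
    public static boolean canPushDown(String columnPath, Set<String> nestedPaths) {
        String prefix = columnPath;
        while (true) {
            if (nestedPaths.contains(prefix)) {
                return false; // the field or one of its ancestors is nested
            }
            int dot = prefix.lastIndexOf('.');
            if (dot < 0) {
                return true; // reached the top level without hitting a nested field
            }
            prefix = prefix.substring(0, dot);
        }
    }

    public static void main(String[] args) {
        Set<String> nested = Set.of("departments");
        System.out.println(canPushDown("name", nested));                     // true
        System.out.println(canPushDown("departments.dept.keyword", nested)); // false
    }
}
```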

Created the second half of this issue as #2519.

Could this commit resolve nested types?
prestodb/presto@4222738

I missed this, @zhenxiao. I looked at that commit, and some of the files it modifies no longer exist. It seems like this was merged at some point in PR1001.

Was something changed in the connector such that this fix was inadvertently removed?

> It seems like this was merged at some point in PR1001.

Yes, but we completely reworked how the connector operates. You used to have to declare the mappings between Presto tables/columns and Elasticsearch indexes/fields manually; now they are derived automatically and dynamically. See #1639 and #1588.
