The current Elasticsearch connector throws UnsupportedOperationException for fields of the nested object type.
Example nested field in index:
"departments": {
  "type": "nested",
  "properties": {
    "dept": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
Failure:
java.lang.UnsupportedOperationException
at com.facebook.presto.spi.block.SingleRowBlockWriter.build(SingleRowBlockWriter.java:207)
at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializePrimitive(ElasticsearchUtils.java:108)
at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeObject(ElasticsearchUtils.java:49)
at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeStruct(ElasticsearchUtils.java:74)
at com.facebook.presto.elasticsearch.ElasticsearchUtils.serializeObject(ElasticsearchUtils.java:44)
at com.facebook.presto.elasticsearch.ElasticsearchRecordCursor.getObject(ElasticsearchRecordCursor.java:165)
at com.facebook.presto.spi.RecordPageSource.getNextPage(RecordPageSource.java:121)
at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:242)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
at com.facebook.presto.$gen.Presto_0_218____20190512_112916_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
@zhenxiao, are you working on this already?
Yes, I am working on it.
This is a related issue: https://github.com/prestodb/presto/issues/12642
@zhenxiao, any progress here? This is something I'm also looking to solve, so if you aren't currently working on this, I can take a look.
@brianolsen87, I don't think @zhenxiao is currently working on this, so feel free to grab it.
Could this commit resolve nested types?
https://github.com/prestodb/presto/commit/4222738e57bdf55b44c26f8b2f0c5c0ed1b7ca17
So I had one idea I wanted to test out. Once we resolve #2441 and arrays are supported, couldn't we just treat nested objects the same as regular objects if we avoided doing pushdown queries on nested fields? It may be somewhat of a performance tradeoff but would be considerably less code to manage.
@brianolsen87, indeed. Nested objects should appear as regular objects (mapped to ROW types in Presto).
As a separate improvement, we can take advantage of support for dereference pushdown and filter pushdown in the engine to prune out fields that are not needed and leverage filters over nested fields. There's some related work going on for Hive: https://github.com/prestosql/presto/pull/1720
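To make the pruning idea concrete, here is a minimal sketch of keeping only the requested dotted field paths from a document. It is an illustration only, not how the engine or the connector implements dereference pushdown: `FieldPruningSketch`, `prune`, and `isPrefixOfAny` are hypothetical names, and the document is modeled as plain Java maps and lists rather than any Presto or Elasticsearch type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: prune a document (maps, lists, scalars) down to the requested field paths.
class FieldPruningSketch {
    @SuppressWarnings("unchecked")
    static Object prune(Object value, Set<String> paths, String prefix) {
        if (value instanceof Map) {
            Map<String, Object> result = new LinkedHashMap<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) value).entrySet()) {
                String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
                Object child = e.getValue();
                if (paths.contains(path)) {
                    // The whole field was requested, so keep its subtree as-is.
                    result.put(e.getKey(), child);
                } else if (isPrefixOfAny(path, paths) && (child instanceof Map || child instanceof List)) {
                    // Only some sub-fields were requested, so descend into the container.
                    result.put(e.getKey(), prune(child, paths, path));
                }
            }
            return result;
        }
        if (value instanceof List) {
            // Nested fields usually hold arrays of objects; prune each element with the same prefix.
            List<Object> result = new ArrayList<>();
            for (Object item : (List<Object>) value) {
                result.add(prune(item, paths, prefix));
            }
            return result;
        }
        return value;
    }

    // True if some requested path lies strictly below the given path.
    static boolean isPrefixOfAny(String path, Set<String> paths) {
        for (String p : paths) {
            if (p.startsWith(path + ".")) {
                return true;
            }
        }
        return false;
    }
}
```

For example, pruning a document that has a `name` field and a `departments` array down to the single path `departments.dept` would keep only the `dept` value of each array element.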
Okay, so for the first step we can implement the nested type the same way as the object type, simply mapping it to a ROW, and make sure that no pushdown occurs or that no pushdown flags are set (I haven't gotten that far). We can then create a separate issue to add those in at a later time.
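As a rough sketch of that first step, the snippet below treats `nested` exactly like `object` and renders both as a ROW-style type string. Everything here is an assumption for illustration: `NestedMappingSketch` and `toPrestoType` are hypothetical names, the mapping is modeled as plain Java maps, only a few primitive types are handled, and multi-fields such as `fields.keyword` are ignored. It is not the connector's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: render an Elasticsearch field mapping as a Presto-style type string.
class NestedMappingSketch {
    @SuppressWarnings("unchecked")
    static String toPrestoType(Map<String, Object> fieldMapping) {
        Object properties = fieldMapping.get("properties");
        // Elasticsearch omits "type" for plain objects, so default to "object".
        String esType = (String) fieldMapping.getOrDefault("type", "object");

        // "nested" is handled exactly like "object": both become a row of their sub-fields.
        if (properties instanceof Map && (esType.equals("object") || esType.equals("nested"))) {
            List<String> fields = new ArrayList<>();
            for (Map.Entry<String, Object> e : ((Map<String, Object>) properties).entrySet()) {
                fields.add(e.getKey() + " " + toPrestoType((Map<String, Object>) e.getValue()));
            }
            return "row(" + String.join(", ", fields) + ")";
        }

        switch (esType) {
            case "text":
            case "keyword":
                return "varchar";
            case "long":
                return "bigint";
            case "integer":
                return "integer";
            case "double":
                return "double";
            case "boolean":
                return "boolean";
            default:
                throw new IllegalArgumentException("Type not covered by this sketch: " + esType);
        }
    }

    public static void main(String[] args) {
        // Rebuild the "departments" mapping from the issue description (without the keyword multi-field).
        Map<String, Object> dept = new LinkedHashMap<>();
        dept.put("type", "text");
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("dept", dept);
        Map<String, Object> departments = new LinkedHashMap<>();
        departments.put("type", "nested");
        departments.put("properties", props);
        System.out.println(toPrestoType(departments));
    }
}
```

Because the sketch never looks at whether the type was `nested` or `object` after the first check, any pushdown logic would have to be disabled or added separately, which matches the split proposed above.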
Created the second half of this issue as #2519.
> Could this commit resolve nested types?
> prestodb/presto@4222738
I missed this, @zhenxiao. I looked at that commit and it seems some of the modified files no longer exist. It seems like this was merged at some point in PR1001.
Was something changed in the connector such that this fix was inadvertently removed?
> It seems like this was merged at some point in PR1001.
Yes, but we completely re-worked how the connector operates. You used to have to declare the mappings between Presto tables/columns and Elasticsearch indexes/fields manually. Now they are derived automatically and dynamically. See #1639 and #1588