When using Hive connector to fetch data that has null value in partitioning key, query fails as follows
com.facebook.presto.spi.PrestoException: Invalid partition value '\N' for BIGINT partition key: ss_sold_date_sk
at com.facebook.presto.hive.HiveUtil.bigintPartitionKey(HiveUtil.java:710)
at com.facebook.presto.hive.HiveUtil.parsePartitionValue(HiveUtil.java:596)
at com.facebook.presto.hive.HivePageSourceProvider.shouldSkipPartition(HivePageSourceProvider.java:547)
at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:172)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:63)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:80)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:231)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:417)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:300)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:721)
at com.facebook.presto.operator.Driver.processFor(Driver.java:293)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:545)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
there's glitch in handling partition pruning logic when hive partitioning key contains a null value. I can try to fix it
cc: @kewang1024
there's glitch in handling partition pruning logic when hive partitioning key contains a null value. I can try to fix it
Thanks for finding the issue, yeah, it would be great if you can help work on it, you can add me as the reviewer when you have the fix, thanks!
Can you put the failed example query here and tables definition if they're not tpch tables
Bug is fixed. Recreate step added below:
DDL:
CREATE TABLE t3(c2 bigint, c1 bigint)
CREATE TABLE t4(c2 bigint, c1 bigint) with(partitioned_by=array['c1'])
INSERT INTO t3 VALUES(null, 2)", 1)
INSERT INTO t4 VALUES(null, null), (2,2)", 2)
Failing query:
select * from t3, t4 where t3.c1=t4.c2