Error below
Query failed : Corrupted statistics for column "[tabid] BINARY" in Parquet file
hdfs//location: [min: änderung_an_einer_bestellung_vornehmen, max: zahlung, num_nulls:]
@shanrd7 can you add more details to the issue? A stack trace and a description of the error scenario would help.
com.facebook.presto.spi.PrestoException: Corrupted statistics for column "[userfilterdisplaynames, map, key] BINARY" in Parquet file "
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:216)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:117)
at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:161)
at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:95)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:221)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
at com.facebook.presto.$gen.Presto_0_217____20190328_182003_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.facebook.presto.parquet.ParquetCorruptionException: Corrupted statistics for column "[userfilterdisplaynames, map, key] BINARY" in Parquet file "hdfs://
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.failWithCorruptionException(TupleDomainParquetPredicate.java:291)
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.getDomain(TupleDomainParquetPredicate.java:202)
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.matches(TupleDomainParquetPredicate.java:93)
at com.facebook.presto.parquet.predicate.PredicateUtils.predicateMatches(PredicateUtils.java:92)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:183)
... 17 more
After the recent upgrade to Presto 0.217, we are seeing this error when a SELECT statement with conditions queries Parquet data that contains accented characters.
Based on our release note, statistics corruption detection was added in 0.216.
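To illustrate why min/max statistics on a BINARY column can look inconsistent when values contain accented characters, here is a minimal sketch. It is not Presto's implementation and the thread does not confirm this as the cause; it assumes the reported min value begins with `ä` (UTF-8 bytes 0xC3 0xA4) and that the writer and reader disagree on whether bytes are compared as signed or unsigned. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class StatsOrderCheck {
    // Lexicographic comparison treating each byte as signed (-128..127).
    public static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Byte.compare(a[i], b[i]);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Values taken from the error message; \u00e4 is 'ä' (assumed first character of min).
        byte[] min = "\u00e4nderung_an_einer_bestellung_vornehmen".getBytes(StandardCharsets.UTF_8);
        byte[] max = "zahlung".getBytes(StandardCharsets.UTF_8);

        // Under signed ordering, 0xC3 is -61 and sorts before 'z' (122).
        System.out.println("signed:   min <= max ? " + (compareSigned(min, max) <= 0));
        // Under unsigned ordering, 0xC3 is 195 and sorts after 'z' (122).
        System.out.println("unsigned: min <= max ? " + (compareUnsigned(min, max) <= 0));
    }
}
```

Under these assumptions the pair satisfies min <= max with signed bytes but not with unsigned bytes, so a reader using unsigned ordering would see statistics where min is greater than max.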
@zhenxiao @Parth-Brahmbhatt Looks like you guys recently changed the logic around statistics verification in Parquet. Can you help here?
@shanrd7 is it possible for you to share the file with us? You can set hive.parquet.fail-on-corrupted-statistics to false so bad statistics won't result in query failures. What process writes the file, and what version of presto is being used by the engine that produced the file?
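As a sketch of the workaround mentioned above, the property would go in the Hive catalog's properties file. The file path shown is an assumption based on a typical Presto layout; use whichever catalog file configures your Hive connector, and restart the affected nodes afterwards.

```properties
# etc/catalog/hive.properties (path and file name are assumptions)
# Do not fail queries when Parquet column statistics look corrupted.
hive.parquet.fail-on-corrupted-statistics=false
```

This only suppresses the failure; it does not repair the statistics stored in the file.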