Presto: After upgrading to 0.217, queries fail with Corrupted statistics for column "[tabid] BINARY" in Parquet file

Created on 1 Apr 2019 · 5 Comments · Source: prestodb/presto

Error below
Query failed : Corrupted statistics for column "[tabid] BINARY" in Parquet file
hdfs//location: [min: änderung_an_einer_bestellung_vornehmen, max: zahlung, num_nulls:]

bug

All 5 comments

@shanrd7 can you add more details to the issue? A stack trace and a description of the error scenario would help.

com.facebook.presto.spi.PrestoException: Corrupted statistics for column "[userfilterdisplaynames, map, key] BINARY" in Parquet file "": [min: Ästhetik, max: Zur Verwendung im Innen- und Außenbereich, num_nulls: 277272]
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:216)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createPageSource(ParquetPageSourceFactory.java:117)
at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:161)
at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:95)
at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:221)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1077)
at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
at com.facebook.presto.$gen.Presto_0_217____20190328_182003_1.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.facebook.presto.parquet.ParquetCorruptionException: Corrupted statistics for column "[userfilterdisplaynames, map, key] BINARY" in Parquet file "hdfs://": [min: Ästhetik, max: Zur Verwendung im Innen- und Außenbereich, num_nulls: 277272]
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.failWithCorruptionException(TupleDomainParquetPredicate.java:291)
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.getDomain(TupleDomainParquetPredicate.java:202)
at com.facebook.presto.parquet.predicate.TupleDomainParquetPredicate.matches(TupleDomainParquetPredicate.java:93)
at com.facebook.presto.parquet.predicate.PredicateUtils.predicateMatches(PredicateUtils.java:92)
at com.facebook.presto.hive.parquet.ParquetPageSourceFactory.createParquetPageSource(ParquetPageSourceFactory.java:183)
... 17 more

After the recent upgrade to Presto 0.217, we see this error when running SELECT statements with conditions against Parquet data that contains accented characters.

According to our release notes, statistics corruption detection was added in 0.216.
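One possible reading of the reported values, which nobody in this thread confirms: if the writer computed BINARY min/max by comparing bytes as signed values, a string starting with a non-ASCII UTF-8 byte (such as "ä", 0xC3) sorts before plain ASCII, whereas a reader comparing bytes as unsigned values sees min > max and may treat the statistics as corrupt. The sketch below only illustrates that ordering difference. The class and method names are hypothetical and this is not Presto's actual check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the Presto code base.
public class StatsOrderingSketch {
    // Lexicographic comparison treating each byte as unsigned (0..255).
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Lexicographic comparison treating each byte as signed (-128..127).
    public static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Assumed rule for this sketch: statistics look corrupt when
    // min sorts after max under unsigned byte ordering.
    public static boolean looksCorrupted(String min, String max) {
        byte[] lo = min.getBytes(StandardCharsets.UTF_8);
        byte[] hi = max.getBytes(StandardCharsets.UTF_8);
        return compareUnsigned(lo, hi) > 0;
    }

    public static void main(String[] args) {
        // Values from the error message; \u00e4 is "ä".
        String min = "\u00e4nderung_an_einer_bestellung_vornehmen";
        String max = "zahlung";
        byte[] lo = min.getBytes(StandardCharsets.UTF_8);
        byte[] hi = max.getBytes(StandardCharsets.UTF_8);
        System.out.println("signed order:   min <= max is " + (compareSigned(lo, hi) <= 0));
        System.out.println("unsigned order: min <= max is " + (compareUnsigned(lo, hi) <= 0));
        System.out.println("looks corrupted: " + looksCorrupted(min, max));
    }
}
```

With the min/max pair from the error message, the signed comparison accepts the pair and the unsigned one rejects it. That would be consistent with the failure appearing only for columns with accented values, but the file's writer would need to be identified to confirm it.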

@zhenxiao @Parth-Brahmbhatt Looks like you recently changed the logic around statistics verification in Parquet. Can you help here?

@shanrd7 is it possible for you to share the file with us? You can set hive.parquet.fail-on-corrupted-statistics to false so that bad statistics won't cause query failures. What process writes the file, and what version of Presto is used by the engine that produced it?
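A minimal sketch of that workaround, assuming the Hive connector is configured through a catalog properties file. The path etc/catalog/hive.properties is an assumption about the deployment; the property name is the one quoted in the comment above.

```properties
# etc/catalog/hive.properties (assumed location of the Hive catalog config)
# Do not fail queries when Parquet column statistics look corrupted.
hive.parquet.fail-on-corrupted-statistics=false
```

Catalog properties are typically read at startup, so the coordinator and workers would likely need a restart for the change to take effect.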
