Presto: Query fails when there is column type mismatch between partition schema and table schema

Created on 1 Feb 2019  路  10Comments  路  Source: prestosql/presto

Presto is failing to read the parquet partitions if the decimal datatype don't match with what is in the hive metastore. Here is the error:

Query 20190130_224317_00018_w9d29 failed: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'sbnum' in table 'default.presto_test' is declared as type 'decimal(8,0)', but partition 'month=201812' declared column 'sbnum' as type 'decimal(6,0)'.
com.facebook.presto.spi.PrestoException: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'sbnum' in table 'default.presto_test' is declared as type 'decimal(8,0)', but partition 'month=201812' declared column 'sbnum' as type 'decimal(6,0)'.
                at com.facebook.presto.hive.HiveSplitManager.lambda$getPartitionMetadata$2(HiveSplitManager.java:315)
                at com.google.common.collect.Iterators$6.transform(Iterators.java:788)
                at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
                at com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
                at com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1340)
                at com.facebook.presto.hive.ConcurrentLazyQueue.poll(ConcurrentLazyQueue.java:37)
                at com.facebook.presto.hive.BackgroundHiveSplitLoader.loadSplits(BackgroundHiveSplitLoader.java:252)
                at com.facebook.presto.hive.BackgroundHiveSplitLoader.access$300(BackgroundHiveSplitLoader.java:91)
                at com.facebook.presto.hive.BackgroundHiveSplitLoader$HiveSplitLoaderTask.process(BackgroundHiveSplitLoader.java:185)
                at com.facebook.presto.hive.util.ResumableTasks.safeProcessTask(ResumableTasks.java:47)
                at com.facebook.presto.hive.util.ResumableTasks.access$000(ResumableTasks.java:20)
                at com.facebook.presto.hive.util.ResumableTasks$1.run(ResumableTasks.java:35)
                at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:78)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
                at java.lang.Thread.run(Thread.java:748)

Here are the steps to reproduce this issue -

  1. Create partitioned table -
CREATE external TABLE `presto_test`(
  `sbnum` decimal(6,0),
  `abnum` decimal(8,0)
)
partitioned by (
month int
)
STORED AS PARQUET
LOCATION
  '/hive/presto_test'
;

2.Insert data into this table, create few partitions -

insert overwrite table presto_test partition (month=201801)
select sbnum, abnum from <tabname> limit 10
;

insert overwrite table presto_test partition (month=201802)
select sbnum, abnum from <tabname> limit 10
;

3.Access the table from Presto to ensure it works -

select count(1) from presto_test ;

4.alter the table, change the data type -

alter table presto_test change sbnum sbnum decimal (8,0) ;

5.Access the table again from presto -

select count(1) from presto_test ; (should throw an error)

6.Insert 1 more partition -

insert overwrite table presto_test partition (month=201803)
select sbnum, abnum from <tabname> limit 10
;

7.Access the table from presto again -

select count(1) from presto_test where month = 201802 ; (should work)
bug

All 10 comments

@dain or anyone willing to work on it, there is some old PR by @losipiuk solving that (https://github.com/prestodb/presto/pull/9422).
Unfortunately it wasn't merged, but one can draw some inspiration from it.

Hi @findepi . I am interested in it. Can I take it by referring @losipiuk 's previous work?

@hustnn sure, go ahead!

@findepi Do you know why the previous PR not merged? Is there any thing I need to note?

@dain @findepi
I also notes the previous RP try to solve the decimal conversion, decimal vs double conversion and decimal vs float conversion.

Should I consider them all or only decimal conversion for this issue only?

Should I consider them all or only decimal conversion for this issue only?

My _guess_ is that each of these conversions is an issue today. I may be wrong though.

Do you know why the previous PR not merged? Is there any thing I need to note?

I regret but i don't know. @martint you were reviewing that, were there any fundamental issues? Like spec compliance?

were there any fundamental issues? L

No, I think it just fell through the cracks.

Submit the initial PR trying to solving this issue https://github.com/prestosql/presto/pull/352.

The comment and suggestion are welcome.

Fixed in 310

Was this page helpful?
0 / 5 - 0 ratings