Presto: Issue with parquet type annotations being ignored

Created on 12 Jun 2017 · 4 Comments · Source: prestodb/presto

I have a Parquet file with the following schema (in the Hive catalog):

hadoop jar ~/parquet-tools-1.9.0.jar schema file:///$(pwd)/00000.parquet 
message schema {
  optional int64 FIELDA;
  optional int64 FIELDB;
  optional int32 FIELDC (INT_16);
  optional int32 FIELDD (INT_8);
  optional int64 FIELDF (TIMESTAMP_MILLIS);
  optional int64 FIELDG (TIMESTAMP_MILLIS);
  optional int64 FIELDH;
}

and I defined the following table on top of it:

CREATE TABLE mytable
(
    FIELDA BIGINT,
    FIELDB BIGINT,
    FIELDC SMALLINT,
    FIELDD TINYINT,
    FIELDF TIMESTAMP,
    FIELDG TIMESTAMP,
    FIELDH BIGINT
) with (format = 'PARQUET', external_location = 's3a://some_path')
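For reference, the correspondence the table definition relies on (Parquet physical type plus logical-type annotation mapped to a SQL type) can be written down as a small lookup. This is a minimal Java sketch: ParquetTypeMapping and sqlType are hypothetical names, not part of Presto or parquet-mr, and only the annotations that appear in this thread are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Presto's API): maps a Parquet primitive type plus its
// optional logical-type annotation to the SQL type the column is meant to have.
public class ParquetTypeMapping {

    enum Physical { INT32, INT64 }

    static String sqlType(Physical physical, String annotation) {
        if (annotation == null) {
            // No annotation: the physical type alone decides.
            return physical == Physical.INT32 ? "integer" : "bigint";
        }
        switch (annotation) {
            case "INT_8":            return "tinyint";   // stored as INT32
            case "INT_16":           return "smallint";  // stored as INT32
            case "DATE":             return "date";      // INT32 days since 1970-01-01
            case "TIMESTAMP_MILLIS": return "timestamp"; // INT64 millis since epoch
            default:
                throw new IllegalArgumentException("unhandled annotation: " + annotation);
        }
    }

    public static void main(String[] args) {
        // Columns taken from the parquet-tools schema in the report.
        Map<String, String> cols = new LinkedHashMap<>();
        cols.put("FIELDA", sqlType(Physical.INT64, null));
        cols.put("FIELDC", sqlType(Physical.INT32, "INT_16"));
        cols.put("FIELDD", sqlType(Physical.INT32, "INT_8"));
        cols.put("FIELDF", sqlType(Physical.INT64, "TIMESTAMP_MILLIS"));
        cols.forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```

Running main prints FIELDC -> smallint and FIELDD -> tinyint, matching the SMALLINT and TINYINT columns in the DDL. If a reader ignored the annotation, both would come out as integer instead.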

If I query it like this:

select fieldc, fieldd, count(*) from mytable group by 1,2

the query runs successfully. But if I filter by fieldc or fieldd, I get the following error:

java.sql.SQLException: Query failed : Error opening Hive split s3a://some_path/00001.parquet : Mismatched Domain types: tinyint vs integer

If I change those columns to INTEGER in the external table definition, it works fine. It seems to me that the type annotation is being ignored, but why does it only affect filter operations?
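One plausible answer to the "why only filters" question, assuming the failure happens while the Parquet reader compares the query's predicate against row-group statistics (predicate pushdown): a query with no WHERE clause on the column never builds a predicate domain, so the two types are never compared. The simplified Java model below illustrates that assumption only. ColumnDomain, statsType and rowGroupMayMatch are hypothetical stand-ins, not Presto classes; the one thing taken from the report is the error message text.

```java
import java.util.Optional;

// Simplified, hypothetical model (NOT Presto's code) of a type-tagged value range
// whose intersection requires both sides to carry the same SQL type.
public class DomainMismatchSketch {

    record ColumnDomain(String type, long min, long max) {
        ColumnDomain intersect(ColumnDomain other) {
            if (!type.equals(other.type)) {
                throw new IllegalArgumentException(
                        "Mismatched Domain types: " + type + " vs " + other.type);
            }
            return new ColumnDomain(type, Math.max(min, other.min), Math.min(max, other.max));
        }
    }

    // Type assigned to row-group statistics for an INT32 (INT_8) column.
    // With honorAnnotation=false the annotation is ignored and INT32 -> integer wins.
    static String statsType(boolean honorAnnotation) {
        return honorAnnotation ? "tinyint" : "integer";
    }

    // predicate is empty when the query has no filter on the column.
    static boolean rowGroupMayMatch(Optional<ColumnDomain> predicate,
                                    long statsMin, long statsMax,
                                    boolean honorAnnotation) {
        if (predicate.isEmpty()) {
            return true; // nothing to prune with, so no domains are compared
        }
        ColumnDomain stats = new ColumnDomain(statsType(honorAnnotation), statsMin, statsMax);
        ColumnDomain overlap = predicate.get().intersect(stats);
        return overlap.min() <= overlap.max();
    }

    public static void main(String[] args) {
        // Filter on a column declared TINYINT, e.g. fieldd = 3
        Optional<ColumnDomain> where = Optional.of(new ColumnDomain("tinyint", 3, 3));

        System.out.println(rowGroupMayMatch(Optional.empty(), 0, 10, false)); // true
        System.out.println(rowGroupMayMatch(where, 0, 10, true));             // true
        try {
            rowGroupMayMatch(where, 0, 10, false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Mismatched Domain types: tinyint vs integer
        }
    }
}
```

In this model, declaring the columns as INTEGER would also make the types agree, which is consistent with the workaround described above.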

Presto 0.175

All 4 comments

We are seeing a similar issue in Presto 0.181: "Mismatched Domain types: date vs integer" in filter statements, even though both fields are of type date. Casting the date column to date solves the problem, though.

@drdee I think this fix (https://github.com/prestodb/presto/pull/10181) should resolve the issue you're seeing. The original issue is probably along the same lines, but I didn't get to dig into that further. The PR also adds predicate pushdown for dates.
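To illustrate why the annotation matters for date pushdown: a Parquet DATE column is an INT32 annotated as DATE, holding days since 1970-01-01, so its row-group min/max statistics are plain integers until the annotation is applied. The Java sketch below converts such statistics with java.time.LocalDate and checks whether a row group could contain a given date. DateStatsSketch and its methods are hypothetical names; this is not the code from the linked PR.

```java
import java.time.LocalDate;

// Hypothetical sketch: interpret INT32 DATE statistics (days since 1970-01-01)
// as dates, so they can be compared against a date predicate.
public class DateStatsSketch {

    static LocalDate fromParquetDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    // true if a row group whose DATE stats are [minDays, maxDays] could contain target
    static boolean mayContain(int minDays, int maxDays, LocalDate target) {
        LocalDate min = fromParquetDate(minDays);
        LocalDate max = fromParquetDate(maxDays);
        return !target.isBefore(min) && !target.isAfter(max);
    }

    public static void main(String[] args) {
        int minDays = 17318; // 2017-06-01
        int maxDays = 17347; // 2017-06-30
        System.out.println(fromParquetDate(minDays));                                // 2017-06-01
        System.out.println(mayContain(minDays, maxDays, LocalDate.of(2017, 6, 12))); // true
        System.out.println(mayContain(minDays, maxDays, LocalDate.of(2017, 7, 1)));  // false
    }
}
```

A row group whose statistics exclude the filtered date can be skipped; comparing the raw integer statistics directly against a date-typed predicate is exactly the "date vs integer" mismatch reported above.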

@drdee I came across the same issue and tried casting both fields to date, but it still did not work. However, when I cast both sides to timestamp, the query works. Can anyone please help me with this?

I faced the same issue with a date column; casting it to date did make the query run, though.
