Presto: Spark-created Iceberg tables may be omitted in SHOW TABLES

Created on 26 Sep 2019  Â·  7Comments  Â·  Source: prestosql/presto

All 7 comments

Is it expected that the property table is case insensitive (and in fact is mixed upper and lower in the wild)?

Assuming that queuing the metastore with a case insensitive filter is not possible, one solution is to query both for upper and lower case (no mixed case — too bad).

AFAIK, Iceberg has no spec item for this (key/value of the table type property). Personally I think we should enforce Iceberg writers to use the key/value pair specified in BaseMetastoreTableOperations and we might want to include this in Iceberg's spec.

The current Iceberg spec (see doRefresh() and setParameters() methods of iceberg HiveTableOperations class) assumes that table type is case-insensitive. It is not clear why it is necessary to make the assumption, I guess, different metastore implementations may use different comparisons and Iceberg code followed case-insensitive behavior of one of metastore in use.

I think there are 3 ways to handle the issue:

  1. Discuss with the Iceberg community reasons to support case-insensitive behavior and propose changing table type to be case-sensitive (all upper case to preserve current setParameters() behavior).
  2. Continue to support case-insensitive behavior and avoid relying on metastore comparison behavior. This will require presto to query metastore for all table types and filter iceberg tables locally.
  3. Follow David suggestion and limit support for lower case or upper case table type only (no mixed case). This will require two requests to metastore and a proper merge of the returned results that may be slower than approach 2 due to 2 round trips.

As a short term fix, I'd suggest to follow setParameters() method and use ICEBERG_TABLE_TYPE_VALUE.toUpperCase(Locale.ENGLISH) in listTables() and while creating iceberg tables in presto.

@vrozov Thanks for looking into this. I think the suggested short-term fix makes sense. In the long term, I favor approach 1. I think we should add a "catalog" section to Iceberg's spec (https://iceberg.apache.org/spec/).

Looping back on this.

Currently Spark creates Iceberg tables with<table_type, ICEBERG> (see here) but Presto creates with <table_type, iceberg> (see here).

When querying a specific table, both Spark and Presto compares the table type case-insensitively so they're fine. But when executing SHOW TABLES queries, Presto uses iceberg in a case-sensitive filter to filter out non-Iceberg tables, so tables with <table_type, ICEBERG> are being omitted.

As @vrozov and @electrum mentioned, there are two changes to make on the Presto side:

  1. When creating Iceberg tables, I think we should use upper-cased ICEBERG to be consistent with Spark.
  2. When executing SHOW TABLES, Presto should at least support both upper case and lower case as @electrum suggested. In terms of implementation, we should decide whether to choose approach 2 or approach 3 that @vrozov proposed.

@electrum @vrozov @findepi What do you think?

ccing @rdsr and @rdblue for visibility

Sounds good to me.

If we do _When creating Iceberg tables, I think we should use upper-cased ICEBERG to be consistent with Spark_
do we still need to change SHOW TABLES flow?

If we are concerned about _existing_ tables registered with lowercase iceberg, then doing _When creating Iceberg tables, I think we should use upper-cased ICEBERG to be consistent with Spark_ does not buy us a lot.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

dpolonsky picture dpolonsky  Â·  4Comments

theoretical-olive picture theoretical-olive  Â·  5Comments

yourmain picture yourmain  Â·  4Comments

BruceKellan picture BruceKellan  Â·  4Comments

JamesRTaylor picture JamesRTaylor  Â·  5Comments