Currently, the output of explain <query> may change even if neither query nor environment changed. There are at least two causes:
UUIDs printed in TableCommit[catalog:HiveTransactionHandle{uuid=19a6c34d-9ab8-4e16-8d93-b010d244be66}...
Partition columns printed in random order (due to hiveColumnIndex = -1)
HiveColumnHandle{name=ts, hiveType=string, hiveColumnIndex=-1, columnType=PARTITION_KEY, comment=}
HiveColumnHandle{name=ds, hiveType=string, hiveColumnIndex=-1, columnType=PARTITION_KEY, comment=YYYY-MM-DD}
This makes it difficult to compare plans with and without CBO.
Hence, let's make the order of partition columns deterministic and remove UUIDs from the plan.
CC: @rschlussel
Hi, I'll be working on this issue under @rschlussel
Thank you, @natewoodfb
@mbasmanova For testing the UUID change, is it only printed when a transaction is performed? I'm having difficulty reproducing the issue.
FWIW, having the transaction id in the plan was by design. The transaction id is part of the ephemeral identity of a table that's being updated.
For example, a sequence of:
START TRANSACTION;
CREATE TABLE foo (a bigint);
EXPLAIN INSERT INTO foo VALUES (1);
EXPLAIN INSERT INTO foo VALUES (1);
will produce the same plan for both EXPLAIN INSERTs, which refer to a table state that's only visible within that transaction.
@martint Martin, thanks for sharing this piece of context. Without an explicit transaction, I imagine Presto creates new transaction for every EXPLAIN query. Hence, the query plans differ.
Do you object to removing the transaction UUID from the plan or simply explaining how it made it there in the first place?
We can remove the transaction id from the plan, but that means giving up some ability to debug things when using transactions. The output of EXPLAIN was always meant for human consumption, so ideally we should not be relying on it too much for comparing plans programmatically.
@martint that seems like a very specific and infrequently needed use case. Most of the time the uuid doesn't provide any useful information. What if we left the uuid in for EXPLAIN VERBOSE so that transactions can still be debugged, but left them out of regular explains? It's very valuable to be able to check if plans changed between query runs. It allows us to run tests on a much smaller set of queries.
(also, only suggesting leaving it in for EXPLAIN VERBOSE if you think it would be valuable. otherwise, i'm a fan of simplifying and removing it altogether)
I will be working on "column order" part of the issue.
Another item: The collected statistics for the table writer and table commit are unordered.
Most helpful comment
Hi, I'll be working on this issue under @rschlussel