presto version:0.201
the difference between table and view ,there is a “Aggregate(PARTIAL)” in table query
the sql to create view :
CREATE VIEW hive.db_benchmarksd0121.event_vd_test AS
SELECT * FROM hive.db_benchmarksd0121.event
UNION ALL
SELECT * FROM hive.db_benchmarksd0121.empty_event
performence:
with view:
0:23 [112M rows, 299MB] [4.98M rows/s, 13.3MB/s]
with table:
0:08 [112M rows, 299MB] [14.1M rows/s, 37.6MB/s]
explain analyze detail:
select_from_table_aggregate.log
I’m planning to work on this. Can anyone assign it to me please ?
hi,i found some clue, PushPartialAggregationThroughExchange's PATTERN
private static final Pattern
.with(source().matching(exchange().capturedAs(EXCHANGE_NODE)));
only work with this .
Aggregate
--ExchageNode
----MarkDistinct
but when select from view(union all) ,the pattern not work.
Aggregate
--MarkDistinct
any suggestion? shoud i add ExchangeNode when MultipleDistinctAggregationToMarkDistinct worked. or and new rule like "PushPartialAggregationThroughMarkDistinct"
@jiangzhx I guess we can generate similar plan (like the one generated for table) for the above query if we replace
Set<Symbol> common = Sets.intersection(partitioningColumns, parent.partitioningColumns);
return common.isEmpty() ? this : partitioned(common).withNullsAndAnyReplicated(nullsAndAnyReplicated);
in PreferredProperties line no 373 with this
partitioned(partitioningColumns).withNullsAndAnyReplicated(nullsAndAnyReplicated);
It solves the problem of skewing but brings back the problem of shuffling the data.
@martint , @findepi Your insights on this ?
Plus there is an issue on hash value generated twice for (once after the tableScan and other is after the remote source) and I am currently working on it. Will raise a PR for the same by this week.
@Praveen2112
I merge your commits and the test is complete.
Your commits solved this Issue.
explain analyze detail:
groupby_view_explain.txt
The time spent querying from the view is basically the same as the time spent querying from the table.
Thank you!
Most helpful comment
Plus there is an issue on hash value generated twice for (once after the tableScan and other is after the remote source) and I am currently working on it. Will raise a PR for the same by this week.