Presto: Partial Aggregation not being pushed through UNION ALL

Created on 2 Apr 2019  ·  5Comments  ·  Source: prestosql/presto

presto version:0.201
the difference between table and view ,there is a “Aggregate(PARTIAL)” in table query

the sql to create view :

CREATE VIEW hive.db_benchmarksd0121.event_vd_test AS
SELECT * FROM  hive.db_benchmarksd0121.event
UNION ALL
SELECT * FROM  hive.db_benchmarksd0121.empty_event

performence:
with view:
0:23 [112M rows, 299MB] [4.98M rows/s, 13.3MB/s]
with table:
0:08 [112M rows, 299MB] [14.1M rows/s, 37.6MB/s]

explain analyze detail:
select_from_table_aggregate.log

select_from_view_aggregate.log

Most helpful comment

Plus there is an issue on hash value generated twice for (once after the tableScan and other is after the remote source) and I am currently working on it. Will raise a PR for the same by this week.

All 5 comments

I’m planning to work on this. Can anyone assign it to me please ?

hi,i found some clue, PushPartialAggregationThroughExchange's PATTERN
private static final Pattern PATTERN = aggregation()
.with(source().matching(exchange().capturedAs(EXCHANGE_NODE)));

only work with this .
Aggregate
--ExchageNode
----MarkDistinct

but when select from view(union all) ,the pattern not work.
Aggregate
--MarkDistinct

any suggestion? shoud i add ExchangeNode when MultipleDistinctAggregationToMarkDistinct worked. or and new rule like "PushPartialAggregationThroughMarkDistinct"

@jiangzhx I guess we can generate similar plan (like the one generated for table) for the above query if we replace

Set<Symbol> common = Sets.intersection(partitioningColumns, parent.partitioningColumns);
return common.isEmpty() ? this : partitioned(common).withNullsAndAnyReplicated(nullsAndAnyReplicated);

in PreferredProperties line no 373 with this

partitioned(partitioningColumns).withNullsAndAnyReplicated(nullsAndAnyReplicated);

It solves the problem of skewing but brings back the problem of shuffling the data.

@martint , @findepi Your insights on this ?

Plus there is an issue on hash value generated twice for (once after the tableScan and other is after the remote source) and I am currently working on it. Will raise a PR for the same by this week.

@Praveen2112
I merge your commits and the test is complete.
Your commits solved this Issue.
explain analyze detail:
groupby_view_explain.txt

groupyd_table_explain.txt

The time spent querying from the view is basically the same as the time spent querying from the table.

Thank you!

Was this page helpful?
0 / 5 - 0 ratings