Presto: Aggregation on PostgreSQL tables takes a huge amount of time

Created on 24 Sep 2019 · 4 comments · Source: prestodb/presto

Hello,

I'm pretty new to PrestoDB and could not find an answer in the docs or in other issues.

When I run a fairly basic query through Presto, such as

    select field1, max(field2) from table group by 1

on a table with several million rows, queries that take a few seconds when run directly on PostgreSQL take several minutes through PrestoDB. Checking the live query plan in Presto, I can see that Presto requests all the data from the table and only then performs the max aggregation itself.
Can't we force Presto to push the aggregation down, so that it requests the already-aggregated data directly from PostgreSQL?
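To illustrate why the plan described above is so much slower, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in for the PostgreSQL table (an assumption for illustration; the function names are hypothetical). It compares fetching every row and aggregating on the engine side against sending the GROUP BY to the source, and counts how many rows cross the wire in each case.

```python
import sqlite3


def make_source(n_rows: int) -> sqlite3.Connection:
    # In-memory SQLite database standing in for the PostgreSQL table (assumption)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field1 INTEGER, field2 INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i % 10, i) for i in range(n_rows)),  # 10 distinct field1 values
    )
    return conn


def aggregate_in_engine(conn: sqlite3.Connection):
    """No pushdown: fetch every row, then compute max(field2) per field1 locally."""
    rows = conn.execute("SELECT field1, field2 FROM t").fetchall()
    result = {}
    for key, value in rows:
        if key not in result or value > result[key]:
            result[key] = value
    return result, len(rows)


def aggregate_pushed_down(conn: sqlite3.Connection):
    """Pushdown: the source runs the GROUP BY and returns one row per group."""
    rows = conn.execute(
        "SELECT field1, max(field2) FROM t GROUP BY field1"
    ).fetchall()
    return dict(rows), len(rows)


conn = make_source(100_000)
local, fetched_local = aggregate_in_engine(conn)
pushed, fetched_pushed = aggregate_pushed_down(conn)
assert local == pushed  # same answer either way
print(fetched_local, fetched_pushed)  # prints: 100000 10
```

Both paths return the same result, but without pushdown every row has to be transferred and processed by the engine, whereas with pushdown only one row per group comes back. That gap grows with table size, which is consistent with the seconds-versus-minutes difference reported here.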

thx!
PS: I'm using PrestoDB 0.222 and didn't find any related fix in more recent versions.


thomasLeclaire



All 4 comments

This seems to be related to query plan pushdown. cc @highker @mbasmanova

shixuan-fan on 24 Sep 2019

@sachdevs will work on plan pushdown for JDBC connectors

highker on 24 Sep 2019


@thomasLeclaire Your observations are correct. At the moment, Presto is not able to push down complex operations such as aggregations or joins into the data source. Hence, it reads all the data and then aggregates on its own. @highker added infrastructure to enable pushdown of any part of the plan, but we are still missing support from the connectors. Specifically, the Postgres connector needs to be modified to add support for pushing down operations. Let us know if you are interested in working on that.
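As a rough illustration of what "support from the connectors" could involve, here is a minimal Python sketch of a connector deciding whether an aggregation can be pushed down and, if so, generating the SQL to send to PostgreSQL. Everything here is hypothetical: `AggregationNode`, `Aggregate`, `PUSHABLE_FUNCTIONS` and `try_push_down` are invented for this example and do not reflect Presto's actual (Java) plan classes or connector SPI.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified plan nodes; Presto's real plan representation differs.
@dataclass(frozen=True)
class Aggregate:
    function: str  # e.g. "max"
    column: str


@dataclass(frozen=True)
class AggregationNode:
    table: str
    group_by: tuple[str, ...]
    aggregates: tuple[Aggregate, ...]


# Functions this toy connector assumes PostgreSQL evaluates with the same
# semantics as Presto (an assumption; a real connector must verify this).
PUSHABLE_FUNCTIONS = {"min", "max", "sum", "count"}


def quote_ident(name: str) -> str:
    # Double-quote an identifier, doubling any embedded quotes (PostgreSQL style)
    return '"' + name.replace('"', '""') + '"'


def try_push_down(node: AggregationNode) -> Optional[str]:
    """Return SQL for the remote database, or None to keep aggregating in Presto."""
    if not node.aggregates and not node.group_by:
        return None  # nothing to push down
    if any(agg.function not in PUSHABLE_FUNCTIONS for agg in node.aggregates):
        return None  # unsupported function: fall back to engine-side aggregation
    keys = [quote_ident(c) for c in node.group_by]
    aggs = [f"{a.function}({quote_ident(a.column)})" for a in node.aggregates]
    sql = f"SELECT {', '.join(keys + aggs)} FROM {quote_ident(node.table)}"
    if keys:
        sql += f" GROUP BY {', '.join(keys)}"
    return sql


node = AggregationNode("table", ("field1",), (Aggregate("max", "field2"),))
print(try_push_down(node))
# prints: SELECT "field1", max("field2") FROM "table" GROUP BY "field1"
```

Returning `None` instead of raising keeps the fallback path intact: anything the connector does not explicitly support is still read row by row and aggregated by Presto, which is the current behavior. Quoting identifiers also matters in practice, since a table literally named `table` is only valid SQL when quoted.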