Hello,
I'm pretty new to PrestoDB, but I wasn't able to find an answer in the docs or in other issues.
When I run a fairly basic query through Presto, such as
SELECT field1, max(field2) FROM table GROUP BY 1
on a table with several million rows, some queries take a few seconds when run directly on PostgreSQL but several minutes through PrestoDB. Checking the Presto live plan, I see that Presto requests all the data from the table and only then performs the max aggregation.
Is there a way to force Presto to request the already-aggregated data directly from PostgreSQL?
Thanks!
PS: I'm using PrestoDB 0.222 and didn't find any related fix in more recent versions.
This seems to be related to query plan pushdown. cc @highker @mbasmanova
@sachdevs will work on plan pushdown for JDBC connectors
@thomasLeclaire Your observations are correct. At the moment, Presto is not able to push down complex operations such as aggregations or joins into the data source. Hence, it reads all the data and then aggregates on its own. @highker added infrastructure to enable pushdown of any part of the plan, but we are still missing support from the connectors. Specifically, the Postgres connector needs to be modified to add support for pushing down operations. Let us know if you are interested in working on that.
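To illustrate why aggregation pushdown matters for the query above, here is a minimal sketch. It is not Presto code: Python's built-in sqlite3 stands in for PostgreSQL, and the table `t`, its data, and the functions `without_pushdown` / `with_pushdown` are hypothetical. The first function mimics the current behavior (every row is transferred and the engine aggregates on its own side); the second sends the GROUP BY to the source, which returns only one row per group.

```python
import sqlite3


def make_source(n_rows: int) -> sqlite3.Connection:
    """Build an in-memory table standing in for the PostgreSQL table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field1 INTEGER, field2 INTEGER)")
    # 10 distinct groups: field1 = i % 10, field2 = i
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((i % 10, i) for i in range(n_rows)),
    )
    return conn


def without_pushdown(conn: sqlite3.Connection):
    """Read every row from the source, then compute max(field2) per field1 in the engine."""
    rows = conn.execute("SELECT field1, field2 FROM t").fetchall()
    result = {}
    for key, value in rows:
        if key not in result or value > result[key]:
            result[key] = value
    return result, len(rows)  # aggregated result, number of rows transferred


def with_pushdown(conn: sqlite3.Connection):
    """Let the source run the aggregation; only one row per group is transferred."""
    rows = conn.execute(
        "SELECT field1, MAX(field2) FROM t GROUP BY field1"
    ).fetchall()
    return dict(rows), len(rows)


conn = make_source(100_000)
full, transferred_full = without_pushdown(conn)
pushed, transferred_pushed = with_pushdown(conn)
assert full == pushed  # same answer either way
print(transferred_full, transferred_pushed)  # 100000 10
```

Both strategies give the same result, but without pushdown 100,000 rows cross the wire instead of 10, which is the effect described in the original report.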
I will update this issue once we publish the API for plan pushdown in connectors.