AWS Glue recently began supporting column statistics. See the Glue API documentation: https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/get-column-statistics-for-partition.html
Presto should now be able to support statistics when using Glue.
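For reference, here is a minimal sketch of what reading those statistics could look like from Python. It assumes a boto3-style Glue client whose `get_column_statistics_for_partition` method mirrors the CLI command linked above. The helper name `fetch_partition_column_stats` and the idea of passing the client in as a parameter are illustrative assumptions, not anything from Presto or the linked docs.

```python
from typing import Any, Dict, List


def fetch_partition_column_stats(
    glue_client: Any,
    database: str,
    table: str,
    partition_values: List[str],
    column_names: List[str],
) -> Dict[str, Dict[str, Any]]:
    """Return {column name: StatisticsData} for a single partition.

    `glue_client` is assumed to behave like boto3.client("glue").
    """
    response = glue_client.get_column_statistics_for_partition(
        DatabaseName=database,
        TableName=table,
        PartitionValues=partition_values,
        ColumnNames=column_names,
    )
    # Columns for which Glue reports a problem appear under "Errors" in the
    # response; this sketch ignores them, so they are absent from the result.
    return {
        entry["ColumnName"]: entry["StatisticsData"]
        for entry in response.get("ColumnStatisticsList", [])
    }
```

With a real client this would be called as `fetch_partition_column_stats(boto3.client("glue"), "my_db", "my_table", ["2020-01-01"], ["col_a"])`, where the database, table, partition value, and column names are placeholders.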
Yep, this would be a great feature for Presto.
It would be great to have statistics support for Glue.
I have looked into the codebase to see how to get the Glue partition statistics and found DisabledGlueColumnStatisticsProvider, which was added in https://github.com/prestosql/presto/commit/7fe264725dd20c551637f4d5a4803b7414382722. It looks like that was added so that GlueColumnStatisticsProvider could be extended to support Glue column statistics once they became available. Is this assumption correct?
Yes. Please use GlueColumnStatisticsProvider. Later we can remove that extension altogether.
Also noticed PrestoDB had a PR to support this:
https://github.com/prestodb/presto/pull/14947
The pull request that @ckdarby linked looks pretty complete (even if it hasn't been reviewed yet). Are there any major implementation differences between prestosql and prestodb that we should be aware of before using it as a reference?
@hsuchenc (the author of that PR) is from Amazon and I would be surprised if they didn't want to contribute Glue stats support here to Presto.
@hsuchenc, apparently there is much more interest in your PR here than elsewhere 😉
Are you planning on making a contribution?
My hands are pretty tied, so feel free to port it from PrestoDB to PrestoSQL.
Thank you.
On behalf of my company, I'm working on the implementation of this feature, using @hsuchenc's pull request as a reference.