my query is:
SELECT L.co1, L.col2, ..... L.col n
FROM
tab1 AS P
JOIN
(SELECT
DISTINCT C.col1, C.col2, I.col3, I.col4
FROM
tab2 AS I
JOIN
tab3 AS C
ON
I.col1 = C.col1
) AS L
ON
P.col1 = L.col1
where
P.col1 in (1,2,3,4,5, ... ) # long list very slow
GROUP BY
col1, col2, ...
The planning uses more than 10 minutes.
I attached the statistics of filterexpressionrewrite in jmx
after query:
node | failures | hits | time.count | time.max
"master","0","995","7914.0","5550840.655"
before query:
"master","0","994","7909.0","5550840.655"
I also try to reduce the list length and the planning becomes very slow.
I guess that the filter expression rewrite wastes most of the planning time. Is it correct?
Are you sure it's FilterExpressionRewrite? Apparently It was hit only once during query planning. Also time.max=5550840.655ns~5milliseconds, so it seems that FilterExpressionRewrite might not be a problem. Could you post jmx stats for other rules?
From what you posted it looks like this optimizer rule was applied 5 times with a one success. So even if all this rule application took time.max i.e. 5550840.655ns (5sec) it is just 25sec.
Does anything change when you disable iterative optimizer:
set session iterative_optimizer_enabled = false;
What connector do you use?
@sopel39 , Sure, I will post other jmx stats later. What does hit mean in the jmx stats here?
hit is the number of times rule was executed
@kokosing , I already disable it in config file like this:
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8000
query.max-memory=600GB
query.max-memory-per-node=32GB
discovery-server.enabled=true
discovery.uri=http://0.0.0.0:8000
node-scheduler.max-splits-per-node=256
node-scheduler.max-pending-splits-per-task=64
task.concurrency=16
experimental.iterative-optimizer-enabled=false
I am using hive connector (mysql behind).
is your table partitioned on P.col1? If so how many partitions do you have there?
@sopel39 , I see. Thanks. Therefore, I only need to see time.count and time.max and estimate the totol time with time.count * time.max. Is it right?
@kokosing . No, this table is partitioned by another column. The column used for filtering here is just a normal column.
Therefore, I only need to see time.count and time.max and estimate the totol time with time.count * time.max. Is it right?
That's one way to do it. You can also look at the rule execution time quantiles.
Therefore, I only need to see time.count and time.max and estimate the totol time with time.count * time.max. Is it right?
More accurate would be to use time.count * time.avg
Therefore, I only need to see time.count and time.max and estimate the totol time with time.count * time.max. Is it right?
Slight correction. You should use hits delta instead of time.count.
@kokosing @sopel39 Thanks. If I only run one query, should I use (count delta * time.avg) or (hits delta * time.ave) to estimate the time for one rule?
count delta
@findepi @kokosing
Thanks @findepi 's suggestion. I am using asyn-profiler.
The coordinator is doing the planning. I show some hottest profiling code below.
It seems that "visitInListExpression" is very hot because I have a very large list (5000 items) in where in.
Total: 17660184045 (4.52%) samples: 17655
[ 0] java.util.concurrent.ConcurrentLinkedQueue.offer
[ 1] java.util.concurrent.ConcurrentLinkedQueue.add
[ 2] com.google.common.cache.LocalCache$Segment.recordRead
[ 3] com.google.common.cache.LocalCache$Segment.get
[ 4] com.google.common.cache.LocalCache.get
[ 5] com.google.common.cache.LocalCache.getOrLoad
[ 6] com.google.common.cache.LocalCache$LocalLoadingCache.get
[ 7] com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked
[ 8] com.facebook.presto.metadata.FunctionRegistry.getSpecializedFunctionKey
[ 9] com.facebook.presto.metadata.FunctionRegistry.getScalarFunctionImplementation
[10] com.facebook.presto.metadata.FunctionRegistry.getCoercion
[11] com.facebook.presto.metadata.FunctionRegistry.getCoercion
[12] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitGenericLiteral
[13] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitGenericLiteral
[14] com.facebook.presto.sql.tree.GenericLiteral.accept
[15] com.facebook.presto.sql.tree.StackableAstVisitor.process
[16] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.process
[17] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.coerceToSingleType
[18] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitInListExpression
[19] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitInListExpression
[20] com.facebook.presto.sql.tree.InListExpression.accept
[21] com.facebook.presto.sql.tree.StackableAstVisitor.process
[22] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.process
[23] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitInPredicate
[24] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.visitInPredicate
[25] com.facebook.presto.sql.tree.InPredicate.accept
[26] com.facebook.presto.sql.tree.StackableAstVisitor.process
[27] com.facebook.presto.sql.analyzer.ExpressionAnalyzer$Visitor.process
[28] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyze
[29] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyzeExpressions
[30] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.analyzeExpressionsWithSymbols
[31] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.getExpressionTypes
[32] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.getExpressionTypes
[33] com.facebook.presto.sql.analyzer.ExpressionAnalyzer.getExpressionTypes
[34] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.shouldPrune
[35] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.lambda$planTableScan$4
[36] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter$$Lambda$1174.1942948442.test
[37] com.facebook.presto.hive.HivePartitionManager.parseValuesAndFilterPartition
[38] com.facebook.presto.hive.HivePartitionManager.getPartitions
[39] com.facebook.presto.hive.HiveMetadata.getTableLayouts
[40] com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableLayouts
[41] com.facebook.presto.metadata.MetadataManager.getLayouts
[42] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planTableScan
[43] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[44] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[45] com.facebook.presto.sql.planner.plan.FilterNode.accept
[46] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[47] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[48] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[49] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[50] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[51] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[52] com.facebook.presto.sql.planner.plan.JoinNode.accept
[53] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[54] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[55] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[56] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[57] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[58] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[59] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[60] com.facebook.presto.sql.planner.plan.AggregationNode.accept
[61] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[62] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[63] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[64] com.facebook.presto.sql.planner.plan.OutputNode.accept
[65] com.facebook.presto.sql.planner.optimizations.AddExchanges.optimize
[66] com.facebook.presto.sql.planner.LogicalPlanner.plan
[67] com.facebook.presto.sql.planner.LogicalPlanner.plan
[68] com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery
[69] com.facebook.presto.execution.SqlQueryExecution.analyzeQuery
[70] com.facebook.presto.execution.SqlQueryExecution.start
[71] com.facebook.presto.execution.QueuedExecution.lambda$start$1
[72] com.facebook.presto.execution.QueuedExecution$$Lambda$1043.1362802135.run
[73] java.util.concurrent.ThreadPoolExecutor.runWorker
[74] java.util.concurrent.ThreadPoolExecutor$Worker.run
[75] java.lang.Thread.run
AddExchanges$Rewriter.shouldPrune is on stack. This might be related to https://github.com/prestodb/presto/pull/9871 & https://github.com/prestodb/presto/pull/9879
Especially, the listing above looks like you don't have @martint's recent https://github.com/prestodb/presto/pull/9879 yet. With 0.195 just released, could you verify how this works for you?
@findepi ok, I will try.
Btw, my table does have many partitions. However, we don't have any expression over partition keys.
Do I need to enable hive.table-statistics-enabled if I want to apply the new release?
@hustnn you don't have to. I don't know if/how your problem relates to hive.table-statistics-enabled.
@findepi
My bad, some of my understanding may not be correct. I just start to use presto. Need more time to study its source code.
Thanks for your help. I will try new release first and let you know if it works or not.
@findepi
After upgrade to 0.195, the planning time is significantly reduced (from 12 minutes to 5 minutes).
I will attach the profiling result tomorrow.
@findepi @kokosing
The execution plan has changed a lot. The planning now needs 5 minutes (old version needs more than 10 minutes).
"visitInListExpression" is removed. Is it the reason the query finishes faster?
Total: 1311995960 (2.16%) samples: 1312
[ 0] com.google.common.collect.RegularImmutableMap.get
[ 1] com.google.common.collect.RegularImmutableMap.get
[ 2] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.type
[ 3] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.visitInPredicate
[ 4] com.facebook.presto.sql.tree.InPredicate.accept
[ 5] com.facebook.presto.sql.tree.AstVisitor.process
[ 6] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.visitLogicalBinaryExpression
[ 7] com.facebook.presto.sql.tree.LogicalBinaryExpression.accept
[ 8] com.facebook.presto.sql.tree.AstVisitor.process
[ 9] com.facebook.presto.sql.planner.ExpressionInterpreter.optimize
[10] com.facebook.presto.sql.planner.optimizations.AddExchanges$LayoutConstraintEvaluator.isCandidate
[11] com.facebook.presto.sql.planner.optimizations.AddExchanges$LayoutConstraintEvaluator.access$300
[12] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.lambda$planTableScan$7
[13] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter$$Lambda$1430.2072899892.test
[14] com.facebook.presto.hive.HivePartitionManager.parseValuesAndFilterPartition
[15] com.facebook.presto.hive.HivePartitionManager.getPartitions
[16] com.facebook.presto.hive.HiveMetadata.getTableLayouts
[17] com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableLayouts
[18] com.facebook.presto.metadata.MetadataManager.getLayouts
[19] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planTableScan
[20] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[21] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[22] com.facebook.presto.sql.planner.plan.FilterNode.accept
[23] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[24] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[25] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[26] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[27] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[28] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[29] com.facebook.presto.sql.planner.plan.JoinNode.accept
[30] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[31] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[32] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[33] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[34] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[35] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[36] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[37] com.facebook.presto.sql.planner.plan.AggregationNode.accept
[38] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[39] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[40] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[41] com.facebook.presto.sql.planner.plan.OutputNode.accept
[42] com.facebook.presto.sql.planner.optimizations.AddExchanges.optimize
[43] com.facebook.presto.sql.planner.LogicalPlanner.plan
[44] com.facebook.presto.sql.planner.LogicalPlanner.plan
[45] com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery
[46] com.facebook.presto.execution.SqlQueryExecution.analyzeQuery
[47] com.facebook.presto.execution.SqlQueryExecution.start
[48] com.facebook.presto.execution.QueuedExecution.lambda$start$1
[49] com.facebook.presto.execution.QueuedExecution$$Lambda$1260.1957001446.run
[50] java.util.concurrent.ThreadPoolExecutor.runWorker
[51] java.util.concurrent.ThreadPoolExecutor$Worker.run
[52] java.lang.Thread.run
https://github.com/prestodb/presto/issues/9494 i have remove this logic
https://github.com/prestodb/presto/issues/9476
the planning time is nearing the old version
Does your table have many partitions? This is a known issue when tables have many partitions and the expression in the where clause is complex. It鈥檚 due to the expression interpreter used during planning being very inefficient. I have some ideas on how to improve it. I鈥檒l look into it over the next few days.
@martint Yes, it is. The table has many partitions. I remember you mentioned it in 9879.
In the query you posted above, what's the type of tab1.col1, tab2.col1 and tab3.col1? Are the values in the IN list integers?
@martint
All this column are Integer. Only tab1.col1 are used in the "IN" clause.
Both tab1 and tab2 are tables with huge # of partitions.
Profiling result:
Total: 1311995960 (2.16%) samples: 1312
[ 0] com.google.common.collect.RegularImmutableMap.get
[ 1] com.google.common.collect.RegularImmutableMap.get
[ 2] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.type
[ 3] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.visitInPredicate
[ 4] com.facebook.presto.sql.tree.InPredicate.accept
[ 5] com.facebook.presto.sql.tree.AstVisitor.process
[ 6] com.facebook.presto.sql.planner.ExpressionInterpreter$Visitor.visitLogicalBinaryExpression
[ 7] com.facebook.presto.sql.tree.LogicalBinaryExpression.accept
[ 8] com.facebook.presto.sql.tree.AstVisitor.process
[ 9] com.facebook.presto.sql.planner.ExpressionInterpreter.optimize
[10] com.facebook.presto.sql.planner.optimizations.AddExchanges$LayoutConstraintEvaluator.isCandidate
[11] com.facebook.presto.sql.planner.optimizations.AddExchanges$LayoutConstraintEvaluator.access$300
[12] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.lambda$planTableScan$7
[13] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter$$Lambda$1430.2072899892.test
[14] com.facebook.presto.hive.HivePartitionManager.parseValuesAndFilterPartition
[15] com.facebook.presto.hive.HivePartitionManager.getPartitions
[16] com.facebook.presto.hive.HiveMetadata.getTableLayouts
[17] com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.getTableLayouts
[18] com.facebook.presto.metadata.MetadataManager.getLayouts
[19] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planTableScan
[20] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[21] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitFilter
[22] com.facebook.presto.sql.planner.plan.FilterNode.accept
[23] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[24] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[25] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[26] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[27] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[28] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitJoin
[29] com.facebook.presto.sql.planner.plan.JoinNode.accept
[30] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[31] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[32] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitProject
[33] com.facebook.presto.sql.planner.plan.ProjectNode.accept
[34] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[35] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[36] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitAggregation
[37] com.facebook.presto.sql.planner.plan.AggregationNode.accept
[38] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.planChild
[39] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[40] com.facebook.presto.sql.planner.optimizations.AddExchanges$Rewriter.visitOutput
[41] com.facebook.presto.sql.planner.plan.OutputNode.accept
[42] com.facebook.presto.sql.planner.optimizations.AddExchanges.optimize
[43] com.facebook.presto.sql.planner.LogicalPlanner.plan
[44] com.facebook.presto.sql.planner.LogicalPlanner.plan
[45] com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery
[46] com.facebook.presto.execution.SqlQueryExecution.analyzeQuery
[47] com.facebook.presto.execution.SqlQueryExecution.start
[48] com.facebook.presto.execution.QueuedExecution.lambda$start$1
[49] com.facebook.presto.execution.QueuedExecution$$Lambda$1260.1957001446.run
[50] java.util.concurrent.ThreadPoolExecutor.runWorker
[51] java.util.concurrent.ThreadPoolExecutor$Worker.run
[52] java.lang.Thread.run
@martint
Do you begin to work on this optimization? Any idea on it? Thanks.
@kokosing @martint Do you think that compiling partition filter function in AddExchanges.LayoutConstraintEvaluator would help?
We did profiling of another case with information schema and huge IN predicate and type/function resolution methods were consuming a lot of time. Those methods are called on each ExpressionInterpreter.optimize call. With compiled filter function this shouldn't be a problem (unless there are some casts involved on IN elements)
In many cases where other conjuncts are highly filtering this https://github.com/prestodb/presto/commit/7125e0ceb3e11e9288bfa017b39f13c5251b7665 could be already helpful.
IMO using compiled filter would be helpful. Notice that here you call predicate in row-by-row manner, not page-by-page, but this should not be a problem.
Is predicate compilation expensive operation? Maybe it is not worth to compile in case of low number of partitions? What do you think?
Do you think that compiling partition filter function in AddExchanges.LayoutConstraintEvaluator would help?
Maybe it is not worth to compile in case of low number of partitions?
We could eg JIT it after few calls.
Maybe after https://github.com/prestodb/presto/pull/10356 we could just avoid huge IN list creation in hive connector? There is not much value in it anyway.
I think compiling those expressions might be messy. It would require operating on symbols+values vs pages+channels. Also, there鈥檚 no way to deal with the behavior where some part of the expression may be left unresolved for a given binding.
I started looking into how to optimize the interpreter but didn鈥檛 get too far. There are plenty of opportunities, like caching the computed value of constant subtrees, caching other metadata that gets resolved on every invocation (resolving functions and operators), etc.
As discussed with @findepi compilation takes time too so for most queries it will be cost rather then benefit. Also there is tuple domain which can probably filter most of the partitions usually.
@hustnn You can try to workaround this problem by writing this query like this:
P.col1 in (values 1,2,3,4,5, ... ) # long list very slow
Notice the values keyword. It looks like it prevents building the whole tree, with operator for each literal in the in clause and is performing the filtering on presto side.
Most helpful comment
@hustnn You can try to workaround this problem by writing this query like this:
Notice the
valueskeyword. It looks like it prevents building the whole tree, with operator for each literal in theinclause and is performing the filtering on presto side.