In Presto 0.188 with iterative optimizer, we are observing "The optimizer exhausted the time limit" error for certain types of queries, which usually contain many operations (more than 2000 nodes) and table references.
We've also tried to increase experimental.iterative-optimizer-timeout from 3m to 10m, but it was not sufficient. Here is a copy of the stack trace:
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: com.facebook.presto.spi.PrestoException: The optimizer exhausted the time limit of 180000 ms
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreNode(IterativeOptimizer.java:118)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:93)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreChildren(IterativeOptimizer.java:182)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.exploreGroup(IterativeOptimizer.java:95)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.iterative.IterativeOptimizer.optimize(IterativeOptimizer.java:84)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:133)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.sql.planner.LogicalPlanner.plan(LogicalPlanner.java:122)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.execution.SqlQueryExecution.doAnalyzeQuery(SqlQueryExecution.java:366)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.execution.SqlQueryExecution.analyzeQuery(SqlQueryExecution.java:345)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.facebook.presto.execution.SqlQueryExecution.start(SqlQueryExecution.java:277)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at com.treasure_data.presto.server.TDQueryExecution.start(TDQueryExecution.java:118)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Nov 17 01:38:56 ip-172-18-132-23 presto[37566]: #011at java.lang.Thread.run(Thread.java:745)
We cannot expose this specific query (because this is our customer's query), but we are now trying to create an example that causes this long optimization time. By disabling iterative-optimizer, the query runs normally.
Can you please attach the profiller? The most likely IterativeOptimizer stuck in a cycle, so profiller log should point out which rules participate in cycle.
Just for test reason, can you please increase experimental.iterative-optimizer-timeout to something like an hour, to see if IterativeOptimizer ends eventually. That would mean there is no cycle, but there is something to optimize in optimizer.
@kokosing Yes. Some queries actually started by setting a longer timeout. So it's not a cycle issue. Our guess is excessive use of copyOf causes planning performance issue for large queries, or if a table has thousands of columns, even a simple select * from (table) query can cause this issue.
Yes, I have heard of this issue of excessive usage of copyOf.
I will try to reproduce this.
I played a bit with this and I did not found any easy solution, maybe it is because I had simple test cases. I saw couple of places which could be optimized, but it did not help much. Queries are planed within a seconds, it is not fast, but it is not a blocker.
@xerial Can you please play with https://github.com/prestodb/presto/pull/9376 in order to find a query which reproduces your issue?
FYI @Praveen2112
@xerial Could you post a query that causes this timeout in order to make it easier to reproduce?
Thanks for the PR. We also need to find a simpler query to reproduce this issue, but let me try.
@xerial, what do the JMX RuleStats counters say before and after the query runs? I.e., which rules contribute the most to execution time and which fire most often?
@martint I haven't checked that counter. Let me check it.
PickTableLayoutForPredicate looks dominant player. I suspect very large Hits or Time.Count
Name Value Type Display Name Update Interval Description
Failures 0 long Failures -1
Hits 1098242 long Hits -1
Time.Count 1098242.0 double Time.Count -1
Time.Max 6291.455 double Time.Max -1
Time.MaxError 0.004584599751238798 double Time.MaxError -1
Time.Min 100.352 double Time.Min -1
Time.P50 122.367 double Time.P50 -1
Time.P75 142.847 double Time.P75 -1
Time.P90 196.607 double Time.P90 -1
Time.P95 230.143 double Time.P95 -1
Time.P99 327.679 double Time.P99 -1
Time.Percentiles java.util.LinkedHashMap, size 100 java.util.Map Time.Percentiles -1
Time.Unit MICROSECONDS java.util.concurrent.TimeUnit Time.Unit -1
Attaching EXPLAIN result with set session iterative_optimizer_enabled = false
explain.txt
@miniway are you diagnosing the same problem as reported by @xerial or just a similar one?
PickTableLayout (the class containing PickTableLayoutForPredicate) is accused of being infinite-loop-prone (https://github.com/prestodb/presto/pull/8735#issuecomment-355852977)
Looking at the current code of PickTableLayoutForPredicate there was a chance that rule could loop forever. Here is a fix: https://github.com/prestodb/presto/pull/9705
So it has to be different issue as @xerial said that increasing timeout helped.
Anyway @xerial and @miniway can you please guys somehow check if #9705 fixes your issue?
It鈥檚 the same problem that @xerial reported. I鈥檓 going to test the fix. Thanks
I confirmed that @kokosing #9705 has solved the issue!
https://github.com/prestodb/presto/pull/9705 is merged. Let me close this issue.
Thanks, I thought it was going to be closed automatically as I used magic work Fixes in the PR comment.
Most helpful comment
I confirmed that @kokosing #9705 has solved the issue!