When running a query with spilling enabled, we get an error:
"Query 20170731_153056_00334_yexgm failed: Failed to read spilled pages"
We're running this on EMR 5.7.0, which ships Presto 0.170. What could the possible causes be? What information could I provide to help investigate this? All parameters are the defaults set by EMR 5.7.0, but I can provide specifics if needed.
Thank you
It would be useful to see the stack trace from the worker that failed to read the spilled pages. You can find the failed task in the web UI, then go to the worker that ran that task and get its logs.
@nezihyigitbasi here's the stack trace:
com.facebook.presto.spi.PrestoException: Failed to read spilled pages
at com.facebook.presto.spiller.BinaryFileSpiller.readPages(BinaryFileSpiller.java:120)
at com.facebook.presto.spiller.BinaryFileSpiller.lambda$getSpills$1(BinaryFileSpiller.java:108)
at java.util.stream.IntPipeline$4$1.accept(IntPipeline.java:250)
at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
at java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:693)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at com.facebook.presto.spiller.BinaryFileSpiller.getSpills(BinaryFileSpiller.java:109)
at com.facebook.presto.operator.aggregation.builder.SpillableHashAggregationBuilder.mergeFromDisk(SpillableHashAggregationBuilder.java:256)
at com.facebook.presto.operator.aggregation.builder.SpillableHashAggregationBuilder.buildResult(SpillableHashAggregationBuilder.java:187)
at com.facebook.presto.operator.HashAggregationOperator.getOutput(HashAggregationOperator.java:438)
at com.facebook.presto.operator.Driver.processInternal(Driver.java:303)
at com.facebook.presto.operator.Driver.lambda$processFor$6(Driver.java:234)
at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:537)
at com.facebook.presto.operator.Driver.processFor(Driver.java:229)
at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:622)
at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:624)
at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:776)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /tmp/presto/spills/presto-spill1467464677085147065/140.bin (Too many open files)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.facebook.presto.spiller.BinaryFileSpiller.readPages(BinaryFileSpiller.java:115)
... 23 more
It seems to be a "Too many open files" problem. I think this thread is most likely related: https://groups.google.com/forum/#!topic/presto-users/-N_-5-J0cSE. I'll try to figure out how to increase the limit on EMR and see whether the issue goes away. Thank you for your help! I'll close this if it works.
Yeah, that's the reason: "Too many open files". Bump up the maximum file descriptor limit as described in that e-mail thread, and please reopen if that doesn't solve your problem.
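The trace hints at why spilling in particular exhausts descriptors: `BinaryFileSpiller.getSpills` maps an `IntStream` range through `readPages` and collects the results, which suggests every spill file is opened before `SpillableHashAggregationBuilder.mergeFromDisk` starts merging. If that reading is right, open descriptors grow with the number of spill files (the failing one was `140.bin`). The sketch below is a hypothetical illustration of that pattern, not Presto's code; the class `SpillMergeSketch` and its methods are invented for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: NOT Presto's BinaryFileSpiller, only an illustration of
// the pattern visible in the stack trace (IntStream range -> readPages -> collect),
// where every spill file is opened before the merge begins.
public class SpillMergeSketch {

    // Opens one stream per spill file, all at once, so the number of
    // simultaneously open file descriptors equals the number of spill files.
    static List<InputStream> openAllSpills(List<Path> spillFiles) {
        return IntStream.range(0, spillFiles.size())
                .mapToObj(i -> open(spillFiles.get(i)))
                .collect(Collectors.toList());
    }

    private static InputStream open(Path file) {
        try {
            // Fails with a "Too many open files" error once the process limit is hit.
            return Files.newInputStream(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates `count` empty temporary files standing in for spill files.
    static List<Path> createTempSpills(int count) {
        List<Path> files = new ArrayList<>();
        try {
            Path dir = Files.createTempDirectory("spill-sketch");
            dir.toFile().deleteOnExit(); // registered first, so deleted last
            for (int i = 0; i < count; i++) {
                Path file = Files.createFile(dir.resolve(i + ".bin"));
                file.toFile().deleteOnExit();
                files.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    static void closeAll(List<InputStream> streams) {
        for (InputStream s : streams) {
            try {
                s.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 200;
        List<InputStream> streams = openAllSpills(createTempSpills(count));
        System.out.println("Open spill streams: " + streams.size());
        closeAll(streams);
    }
}
```

Passing a count larger than the shell's `ulimit -n` should reproduce a "Too many open files" failure. For brevity the sketch does not close already-opened streams when one `open` fails; a real implementation would.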
Adding
presto soft nofile 32768
presto hard nofile 65536
presto soft nproc 32768
presto hard nproc 65536
to /etc/security/limits.conf on every node solved our issue. On EMR, this can be done with a bootstrap action script.
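To confirm the new limit is in effect, one option is a small Java check run as the `presto` user in a fresh session. This is a sketch that assumes a HotSpot-based JVM, which exposes `com.sun.management.UnixOperatingSystemMXBean` on Unix-like systems; the class name `FdLimitCheck` is hypothetical. HotSpot may raise its soft limit toward the hard limit at startup, so the reported maximum can exceed the soft `nofile` value.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: prints the file descriptor limit and current usage as
// seen by this JVM. Assumes a HotSpot-based JVM on a Unix-like system.
public class FdLimitCheck {

    // Returns {open, max}, or null if the platform bean does not expose them.
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts unavailable on this JVM");
            return;
        }
        System.out.println("Open file descriptors: " + counts[0]);
        System.out.println("Max file descriptors:  " + counts[1]);
        // 32768 is the soft nofile value from the limits.conf lines above.
        if (counts[1] < 32768) {
            System.out.println("Warning: limit is below the 32768 soft nofile value");
        }
    }
}
```

Because `limits.conf` is applied through PAM at login, the result reflects the session the check runs in; a service started outside such a session may not see the same limits.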