When using AWS as the executor for a workflow that will run 50 samples in parallel, nextflow stalls at the 'warming up stage' and no jobs are submitted to AWS batch.
If the same workflow is run with 2 samples as input, the workflows runs successfully. If the workflow for 50 samples is now resubmitted then the workflow for the larger sample set completes successfully.
N E X T F L O W ~ version 19.01.0
Launching `assembly.nf` [jolly_chandrasekhar] - revision: ceb551d17c
[warm up] executor > local
Stalls at this stage
I can confirm this is consistent behaviour across multiple pipelines. 'Setting a spark' with a few files will the 'light the fire' for a larger batch
One thing to try is make sure the "desired vCPUs" is not set to 0 in the compute environment. I have found it is often stalled on 0.
Thanks @evanfloden I did have "desired vCPUs" is set to 0. I have set this to 4 and will check if this makes a difference
Unfortunately this isn't an option since setting desired vCPUs to 4 means that a machine with 4 CPUs will be constantly running. Batch actually changes the desired vCPU between the min and max CPUs value based on the number of jobs
I'm not sure that this is correct. I think you are describing min vCPUs.
https://docs.aws.amazon.com/batch/latest/userguide/compute_environment_parameters.html
Batch does indeed change the desired vCPUs dynamically.
I had observed that by setting the desired vCPUs to a number above zero "gets the fire started" so to say.
My previous comment was based on testing this on batch a couple of days ago.
Set desired vCPUs to 4 - a spot request fired up and EC2 instances were started up to 4 vCPUs. These were never destroyed UNTIL I reset desired vCPUs to 0.
Therefore EC2 instances were running 24/7 which is not desirable for a pay as you go model
Yikes! That is good to know.
I understand that the instances are only started when needed and are supposed to shut down when there are no more jobs in the queue.
I'll let you know if I find a more definitive answer in my testing.
I think that if you add desired vCPUS to 4 that that will become the default for the compute environment. Please let me know if you find different. I'm definitely an AWS noob
Trying this but I'm getting
$ aws s3 ls s3://nextflow-datasets/52-samples/fastqs
An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
@pditommaso
I've made the bucket listable now. It should be OK but let me know if not
Ok, now I can access it 👍
Beautiful! It hangs when resolving the job definition!
"AWSBatch-executor-9" #50 prio=5 os_prio=31 tid=0x00007fcf83490000 nid=0x14c03 waiting for monitor entry [0x0000700010fbc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.resolveJobDefinition(AwsBatchTaskHandler.groovy:351)
- waiting to lock <0x00000006c6508ac8> (a java.util.LinkedHashMap)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.getJobDefinition(AwsBatchTaskHandler.groovy:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.newSubmitRequest(AwsBatchTaskHandler.groovy:480)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
and
"AWSBatch-executor-8" #49 prio=5 os_prio=31 tid=0x00007fcf8348e800 nid=0xb303 waiting on condition [0x0000700010eb7000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000006ceb91568> (a nextflow.util.ThrottlingExecutor$PriorityTask)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at nextflow.util.ThrottlingExecutor.doInvoke1(ThrottlingExecutor.groovy:679)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:104)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:326)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1011)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:994)
at nextflow.util.ClientProxyThrottler.invokeMethod(ClientProxyThrottler.groovy:88)
at org.codehaus.groovy.runtime.callsite.PogoInterceptableSite.call(PogoInterceptableSite.java:47)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.createJobDef(AwsBatchTaskHandler.groovy:443)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:156)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.resolveJobDefinition(AwsBatchTaskHandler.groovy:361)
- locked <0x00000006c6508ac8> (a java.util.LinkedHashMap)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:156)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.getJobDefinition(AwsBatchTaskHandler.groovy:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:156)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.newSubmitRequest(AwsBatchTaskHandler.groovy:480)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:156)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:168)
at nextflow.cloud.aws.batch.AwsBatchTaskHandler.submit(AwsBatchTaskHandler.groovy:284)
at nextflow.processor.TaskPollingMonitor.submit(TaskPollingMonitor.groovy:206)
at nextflow.processor.ParallelPollingMonitor.submit0(ParallelPollingMonitor.groovy:64)
at nextflow.processor.ParallelPollingMonitor$1.invoke(ParallelPollingMonitor.groovy:73)
at nextflow.util.ThrottlingExecutor$Recoverable.call(ThrottlingExecutor.groovy:400)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Sounds promising. Does that help suggest a potential solution??
Deadlocks are my fav bugs ;) I'll keep you posted.
Strange that if you 'light the fire' with 2 samples the 52 then run fine
Squashed. In a nutshell aws api calls are submitted through a thread poll in order to throttle the submission rate, however a nested api requested inside a synchronised block along with the large number of requested was causing a starvation condition (exhausting all threads in the pool so the response was never returned ...).
At some point I need to review this logic, but the patch should solve the problem. Thanks for reporting the issue, it was a bad one. For the future, when the execution hangs use the jstack tool reporting the printed output.
(will uploaded a patched build in the following days)
Whoo excited for this! I'm running into the same issue where I prepared (40 parameter sets) x (1500 samples) = 60,000 jobs and nextflow run seems to not do anything:
(kmer-hashing)
✘ Wed 20 Mar - 19:07 ~/code/nf-kmer-similarity/from-csvs origin ☊ master ✔ 4☀
make run_aws
nextflow run main.nf \
-work-dir s3://olgabot-maca/nextflow-workdir-test/ \
-bucket-dir s3://olgabot-maca/nextflow-bucket-dir-test/ \
-with-trace -with-timeline -with-dag -with-report -latest -resume
N E X T F L O W ~ version 19.01.0
Launching `main.nf` [loving_wilson] - revision: 0399fe8141
[warm up] executor > awsbatch
But looking at htop, there's definitely something going on

Oddly enough, there is activity in the nextflow.log file as well:
Mar-22 11:57:59.333 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] sourmash_compute_sketch
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (queue) OPEN ; channel: -
port 4: (cntrl) - ; channel: $
[process] sourmash_compare_sketches
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (value) OPEN ; channel: __$fileinparam<3>
port 4: (cntrl) - ; channel: $
Mar-22 12:02:59.340 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] sourmash_compute_sketch
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (queue) OPEN ; channel: -
port 4: (cntrl) - ; channel: $
[process] sourmash_compare_sketches
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (value) OPEN ; channel: __$fileinparam<3>
port 4: (cntrl) - ; channel: $
Mar-22 12:07:59.347 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] sourmash_compute_sketch
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (queue) OPEN ; channel: -
port 4: (cntrl) - ; channel: $
[process] sourmash_compare_sketches
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (value) OPEN ; channel: __$fileinparam<3>
port 4: (cntrl) - ; channel: $
Mar-22 12:12:59.355 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] sourmash_compute_sketch
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (queue) OPEN ; channel: -
port 4: (cntrl) - ; channel: $
[process] sourmash_compare_sketches
status=ACTIVE
port 0: (value) OPEN ; channel: __$eachinparam<0>
port 1: (value) OPEN ; channel: __$eachinparam<1>
port 2: (value) OPEN ; channel: __$eachinparam<2>
port 3: (value) OPEN ; channel: __$fileinparam<3>
I'm (still) facing the same issue with version Version: 19.04.1 build 5072.
I started with 160 input files successfully. After increasing that to 100k the submission is stuck.
$ jstack 28201
2019-06-11 21:40:39
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.171-b11 mixed mode):
"Attach Listener" #20 daemon prio=9 os_prio=0 tid=0x00007f4490001000 nid=0x6ede waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Actor Thread 3" #19 daemon prio=5 os_prio=0 tid=0x00007f4450004000 nid=0x6e7b waiting on condition [0x00007f44cf9de000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c14d7d50> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"Task submitter" #18 daemon prio=5 os_prio=0 tid=0x00007f44ed20c000 nid=0x6e72 waiting on condition [0x00007f44cecd8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c16c20b0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at nextflow.processor.TaskPollingMonitor.awaitTasks(TaskPollingMonitor.groovy:369)
at nextflow.processor.TaskPollingMonitor.submitLoop(TaskPollingMonitor.groovy:393)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217)
at groovy.lang.MetaClassImpl.invokeMethodClosure(MetaClassImpl.java:1050)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1092)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:405)
at groovy.lang.Closure.call(Closure.java:399)
at groovy.lang.Closure.run(Closure.java:486)
at java.lang.Thread.run(Thread.java:748)
"Task monitor" #17 prio=5 os_prio=0 tid=0x00007f44ed20b000 nid=0x6e71 waiting on condition [0x00007f44cedd9000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c16c23b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
at nextflow.processor.TaskPollingMonitor$_await_closure5.doCall(TaskPollingMonitor.groovy:495)
at nextflow.processor.TaskPollingMonitor$_await_closure5.call(TaskPollingMonitor.groovy)
at nextflow.extension.Bolts.withLock(Bolts.groovy:333)
at nextflow.extension.Bolts.withLock(Bolts.groovy)
at nextflow.processor.TaskPollingMonitor.await(TaskPollingMonitor.groovy:494)
at nextflow.processor.TaskPollingMonitor.pollLoop(TaskPollingMonitor.groovy:424)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at org.codehaus.groovy.runtime.InvokerHelper.invokePogoMethod(InvokerHelper.java:1011)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethod(InvokerHelper.java:994)
at org.codehaus.groovy.runtime.InvokerHelper.invokeMethodSafe(InvokerHelper.java:97)
at nextflow.processor.TaskPollingMonitor$_start_closure2.doCall(TaskPollingMonitor.groovy:302)
at nextflow.processor.TaskPollingMonitor$_start_closure2.call(TaskPollingMonitor.groovy)
at groovy.lang.Closure.run(Closure.java:486)
at java.lang.Thread.run(Thread.java:748)
"Actor Thread 2" #16 daemon prio=5 os_prio=0 tid=0x00007f44ed1d0800 nid=0x6e70 runnable [0x00007f44ceeda000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
- locked <0x000000075cf07fc8> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
- locked <0x000000075cf088f0> (a sun.security.ssl.AppInputStream)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1285)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1101)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:758)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:732)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:714)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:674)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:656)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:520)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4705)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4652)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4646)
at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:870)
at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:104)
at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:117)
at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:643)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.Files.size(Files.java:2332)
at org.codehaus.groovy.runtime.NioGroovyMethods.size(NioGroovyMethods.java:94)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1217)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.runtime.metaclass.NextflowDelegatingMetaClass.invokeMethod(NextflowDelegatingMetaClass.java:74)
at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:119)
at _nf_script_6e0fac66$_run_closure1.doCall(_nf_script_6e0fac66:56)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:37)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:46)
at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
at groovy.lang.Closure.call(Closure.java:405)
at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"Actor Thread 1" #14 daemon prio=5 os_prio=0 tid=0x00007f44ed1a0800 nid=0x6e6e waiting on condition [0x00007f44cf2dd000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c183c130> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:458)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
at java.util.concurrent.SynchronousQueue.take(SynchronousQueue.java:924)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
"java-sdk-http-connection-reaper" #11 daemon prio=5 os_prio=0 tid=0x00007f44ece9b800 nid=0x6e6a waiting on condition [0x00007f44d41ef000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at com.amazonaws.http.IdleConnectionReaper.run(IdleConnectionReaper.java:188)
"Service Thread" #9 daemon prio=9 os_prio=0 tid=0x00007f44ec0d6000 nid=0x6e68 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0x00007f44ec0ca800 nid=0x6e67 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007f44ec0c8800 nid=0x6e66 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f44ec0c6800 nid=0x6e65 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f44ec0c3800 nid=0x6e64 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f44ec0c2000 nid=0x6e63 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f44ec08f000 nid=0x6e62 in Object.wait() [0x00007f44d6581000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005c000e060> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
- locked <0x00000005c000e060> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:212)
"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f44ec08a800 nid=0x6e61 in Object.wait() [0x00007f44d6682000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000005c001cb90> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:502)
at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
- locked <0x00000005c001cb90> (a java.lang.ref.Reference$Lock)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"main" #1 prio=5 os_prio=0 tid=0x00007f44ec00c000 nid=0x6e57 waiting on condition [0x00007f44f5eb8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000005c0873260> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at nextflow.util.Barrier$_awaitCompletion_closure3.doCall(Barrier.groovy:92)
at nextflow.util.Barrier$_awaitCompletion_closure3.call(Barrier.groovy)
at nextflow.extension.Bolts.withLock(Bolts.groovy:333)
at nextflow.extension.Bolts.withLock(Bolts.groovy)
at nextflow.util.Barrier.awaitCompletion(Barrier.groovy:89)
at nextflow.Session.await(Session.groovy:608)
at nextflow.script.ScriptRunner.terminate(ScriptRunner.groovy:387)
at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:172)
at nextflow.cli.CmdRun.run(CmdRun.groovy:245)
at nextflow.cli.Launcher.run(Launcher.groovy:453)
at nextflow.cli.Launcher.main(Launcher.groovy:630)
"VM Thread" os_prio=0 tid=0x00007f44ec082800 nid=0x6e60 runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f44ec021000 nid=0x6e58 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f44ec023000 nid=0x6e59 runnable
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f44ec025000 nid=0x6e5a runnable
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f44ec026800 nid=0x6e5b runnable
"GC task thread#4 (ParallelGC)" os_prio=0 tid=0x00007f44ec028800 nid=0x6e5c runnable
"GC task thread#5 (ParallelGC)" os_prio=0 tid=0x00007f44ec02a800 nid=0x6e5d runnable
"GC task thread#6 (ParallelGC)" os_prio=0 tid=0x00007f44ec02c000 nid=0x6e5e runnable
"GC task thread#7 (ParallelGC)" os_prio=0 tid=0x00007f44ec02e000 nid=0x6e5f runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f44ec0da800 nid=0x6e69 waiting on condition
JNI global references: 281
Jun-11 21:45:41.618 [Task monitor] TRACE n.processor.TaskPollingMonitor - Scheduler queue size: 0 (iteration: 66)
Jun-11 21:45:51.619 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] run
status=ACTIVE
port 0: (queue) OPEN ; channel: -
port 1: (cntrl) - ; channel: $
Jun-11 21:45:51.619 [Task monitor] TRACE n.processor.TaskPollingMonitor - Scheduler queue size: 0 (iteration: 67)
Jun-11 21:46:01.620 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] run
status=ACTIVE
port 0: (queue) OPEN ; channel: -
port 1: (cntrl) - ; channel: $
Most helpful comment
Squashed. In a nutshell aws api calls are submitted through a thread poll in order to throttle the submission rate, however a nested api requested inside a synchronised block along with the large number of requested was causing a starvation condition (exhausting all threads in the pool so the response was never returned ...).
At some point I need to review this logic, but the patch should solve the problem. Thanks for reporting the issue, it was a bad one. For the future, when the execution hangs use the
jstacktool reporting the printed output.(will uploaded a patched build in the following days)