Nextflow: Ignite error on AWS EC2

Created on 11 May 2018  ·  3 Comments  ·  Source: nextflow-io/nextflow

Following @pditommaso's screencast, I am using nextflow cloud create to set up a cluster on AWS EC2. So far so good: several worker nodes are instantiated along with the master node, and I can ssh into the master node without problems. However, when I run a test with ./nextflow run examples/blast.nf -with-docker, I get the following error from Ignite:

N E X T F L O W  ~  version 0.29.0
Pulling nextflow-io/examples ...
 downloaded from https://github.com/nextflow-io/examples.git
Launching `nextflow-io/examples` [curious_goldberg] - revision: 27afa1c086 [master]
[warm up] executor > ignite
ERROR ~ org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder.setShared(Z)V

 -- Check script 'blast.nf' at line: 52 or see '.nextflow.log' file for more details

The config used for setting up the cluster:

cloud {
  imageId = 'ami-054c4e0bad8549c37'   // clone of the AMI used in the screencast, made available in the local AWS region
  subnetId = 'subnet-57eba230'
  sharedStorageId = 'fs-d21be5eb'     // EFS volume
  sharedStorageMount = '/mnt/efs'
  instanceType = 't2.micro'
  userName = 'radsuchecki'
}

Log file from the master node: .nextflow.log

All 3 comments

It looks like the issue is that version 0.29.0 is using the wrong version of Ignite:

$ NXF_VER=0.29.0 NXF_MODE=ignite nextflow  info -d  | grep ignite
    NXF_MODE=ignite
    /Users/pditommaso/.nextflow/capsule/deps/io/nextflow/nxf-ignite/0.29.0/nxf-ignite-0.29.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-aws/2.4.0/ignite-aws-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-slf4j/2.4.0/ignite-slf4j-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-core/2.4.0/ignite-core-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/gridgain/ignite-shmem/1.0.0/ignite-shmem-1.0.0.jar

$ NXF_VER=0.29.1 NXF_MODE=ignite nextflow  info -d  | grep ignite
    NXF_MODE=ignite
    /Users/pditommaso/.nextflow/capsule/deps/io/nextflow/nxf-ignite/0.29.1/nxf-ignite-0.29.1.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-aws/1.6.0/ignite-aws-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-slf4j/1.6.0/ignite-slf4j-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-core/1.6.0/ignite-core-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/gridgain/ignite-shmem/1.0.0/ignite-shmem-1.0.0.jar

If you update to the latest (0.29.1) it should be fine.
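
To make the version difference easier to spot, the jar listing can be reduced to artifact names and versions. This is only a sketch: ignite_versions is a hypothetical helper, not part of Nextflow, and it assumes the jar file names follow the ignite-<module>-<version>.jar pattern seen in the listings above.

```shell
# Hypothetical helper: read `nextflow info -d` output on stdin and print
# one "artifact version" pair per bundled Ignite jar.
# Assumes file names of the form ignite-<module>-<version>.jar.
ignite_versions() {
  grep -oE 'ignite-[a-z0-9]+-[0-9][0-9.]*\.jar' \
    | sed -E 's/^(ignite-[a-z0-9]+)-([0-9.]+)\.jar$/\1 \2/' \
    | sort -u
}
```

It could be used as NXF_VER=0.29.1 NXF_MODE=ignite nextflow info -d | ignite_versions. The nxf-ignite jar is skipped on purpose, since it carries the Nextflow version rather than the Ignite one.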

This gets things going. However, I am not sure whether something is wrong with my setup: following on from the above, it appears the nodes are not joining the cluster. For example, from .node-nextflow.log on one of the worker nodes:

Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=aaf276db, name=nextflow, uptime=00:15:00:011]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=2]
    ^-- CPU [cur=0%, avg=0.22%, GC=0%]
    ^-- Heap [used=131MB, free=92.58%, comm=297MB]
    ^-- Non heap [used=55MB, free=-1%, comm=56MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
May-14 07:10:01.146 [scheduler-agent] DEBUG nextflow.scheduler.SchedulerAgent - === Waiting for master node to join..

even though

 [main] DEBUG nextflow.daemon.IgGridFactory - Apache Ignite config > joining IPs: 172.31.14.62, 172.31.3.109, 172.31.5.179, 172.31.0.253

That's not good. Open all ports for connections in the default security group.
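
One way to check whether blocked ports are the cause is to test, from a worker node, whether the other nodes accept TCP connections. The sketch below is an assumption-laden illustration: port_open and check_peers are hypothetical helpers, and ports 47500 (discovery) and 47100 (communication) are Apache Ignite's defaults, which may not match what Nextflow configures on this cluster.

```shell
# Hypothetical helper: succeeds if a TCP connection to host ($1) on port ($2)
# can be opened within 3 seconds. Relies on bash's /dev/tcp pseudo-device
# and the coreutils `timeout` command.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Hypothetical helper: report reachability of each peer IP on the assumed
# Ignite default ports (47500 discovery, 47100 communication).
check_peers() {
  for ip in "$@"; do
    for port in 47500 47100; do
      if port_open "$ip" "$port"; then
        echo "$ip:$port reachable"
      else
        echo "$ip:$port NOT reachable"
      fi
    done
  done
}
```

With the IPs from the "joining IPs" log line, this would be run as check_peers 172.31.14.62 172.31.3.109 172.31.5.179 172.31.0.253. Unreachable ports on the other nodes would point to the security group rules rather than to Nextflow itself.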
