Nextflow: Ignite error on AWS EC2

Created on 11 May 2018 · 3 comments · Source: nextflow-io/nextflow

Following @pditommaso's screencast, I am using nextflow cloud create to set up a cluster on AWS EC2. So far so good: several worker nodes are instantiated along with the master node, and I can SSH into the master node without problems. However, when I run a test with ./nextflow run examples/blast.nf -with-docker, I get the following error from Ignite:

N E X T F L O W  ~  version 0.29.0
Pulling nextflow-io/examples ...
 downloaded from https://github.com/nextflow-io/examples.git
Launching `nextflow-io/examples` [curious_goldberg] - revision: 27afa1c086 [master]
[warm up] executor > ignite
ERROR ~ org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder.setShared(Z)V

 -- Check script 'blast.nf' at line: 52 or see '.nextflow.log' file for more details

The config used for setting up the cluster:

cloud {
  imageId = 'ami-054c4e0bad8549c37' //a clone of the AMI used in the screencast, to have it available in local aws region
  subnetId = 'subnet-57eba230'  
  sharedStorageId = 'fs-d21be5eb' //EFS volume
  sharedStorageMount = '/mnt/efs'
  instanceType = 't2.micro'
  userName = 'radsuchecki'
}

Log file from the master node: .nextflow.log

rsuchecki on 11 May 2018

Most helpful comment

It looks like version 0.29.0 is using the wrong version of Ignite:

$ NXF_VER=0.29.0 NXF_MODE=ignite nextflow  info -d  | grep ignite
    NXF_MODE=ignite
    /Users/pditommaso/.nextflow/capsule/deps/io/nextflow/nxf-ignite/0.29.0/nxf-ignite-0.29.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-aws/2.4.0/ignite-aws-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-slf4j/2.4.0/ignite-slf4j-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-core/2.4.0/ignite-core-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/gridgain/ignite-shmem/1.0.0/ignite-shmem-1.0.0.jar

$ NXF_VER=0.29.1 NXF_MODE=ignite nextflow  info -d  | grep ignite
    NXF_MODE=ignite
    /Users/pditommaso/.nextflow/capsule/deps/io/nextflow/nxf-ignite/0.29.1/nxf-ignite-0.29.1.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-aws/1.6.0/ignite-aws-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-slf4j/1.6.0/ignite-slf4j-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-core/1.6.0/ignite-core-1.6.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/gridgain/ignite-shmem/1.0.0/ignite-shmem-1.0.0.jar

If you update to the latest version (0.29.1), it should be fine.

pditommaso on 11 May 2018

🎉1 👍1
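
For anyone wanting to check which Ignite version a given Nextflow release resolves, the grep output above can be summarised with a few lines of Python. This is only a sketch: the function name ignite_versions is made up for illustration, and it assumes the jar paths follow the Maven-style layout shown in the listing (artifact directory, then version directory).

```python
import re

# Matches Maven-style jar paths such as
#   .../org/apache/ignite/ignite-core/2.4.0/ignite-core-2.4.0.jar
# and captures the artifact name and the version directory.
JAR_RE = re.compile(r"/org/apache/ignite/(?P<artifact>[\w-]+)/(?P<version>[\d.]+)/")


def ignite_versions(info_output):
    """Map each org.apache.ignite artifact to the version found in its jar path."""
    versions = {}
    for line in info_output.splitlines():
        match = JAR_RE.search(line)
        if match:
            versions[match.group("artifact")] = match.group("version")
    return versions


# Output of `nextflow info -d | grep ignite` for 0.29.0, copied from the comment above.
SAMPLE_0_29_0 = """
    NXF_MODE=ignite
    /Users/pditommaso/.nextflow/capsule/deps/io/nextflow/nxf-ignite/0.29.0/nxf-ignite-0.29.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-aws/2.4.0/ignite-aws-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-slf4j/2.4.0/ignite-slf4j-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/apache/ignite/ignite-core/2.4.0/ignite-core-2.4.0.jar
    /Users/pditommaso/.nextflow/capsule/deps/org/gridgain/ignite-shmem/1.0.0/ignite-shmem-1.0.0.jar
"""

print(ignite_versions(SAMPLE_0_29_0))
```

For the 0.29.0 listing this reports 2.4.0 for ignite-aws, ignite-slf4j and ignite-core. The nxf-ignite and ignite-shmem jars are skipped because they sit outside org/apache/ignite.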

That gets things going. However, I am not sure whether something is wrong with my setup: following on from the above, the nodes do not appear to be joining the cluster. For example, from .node-nextflow.log on one of the worker nodes:

Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=aaf276db, name=nextflow, uptime=00:15:00:011]
    ^-- H/N/C [hosts=1, nodes=1, CPUs=2]
    ^-- CPU [cur=0%, avg=0.22%, GC=0%]
    ^-- Heap [used=131MB, free=92.58%, comm=297MB]
    ^-- Non heap [used=55MB, free=-1%, comm=56MB]
    ^-- Public thread pool [active=0, idle=16, qSize=0]
    ^-- System thread pool [active=0, idle=16, qSize=0]
    ^-- Outbound messages queue [size=0]
May-14 07:10:01.146 [scheduler-agent] DEBUG nextflow.scheduler.SchedulerAgent - === Waiting for master node to join..

even though

 [main] DEBUG nextflow.daemon.IgGridFactory - Apache Ignite config > joining IPs: 172.31.14.62, 172.31.3.109, 172.31.5.179, 172.31.0.253

rsuchecki on 14 May 2018

That's not good. Open all ports for connections between the nodes in the default security group.

pditommaso on 14 May 2018

👍1
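
One way to test whether the nodes can actually reach each other is a plain TCP connection check, run from one node against the IPs listed in the "joining IPs" log line. The sketch below is not part of Nextflow. The helper names are hypothetical, and the ports are an assumption: 47500 and 47100 are Apache Ignite's stock defaults for discovery and communication, and the Nextflow daemon may be configured differently.

```python
import socket

# Assumption: Apache Ignite's default ports (47500 for discovery,
# 47100 for communication). Adjust if the cluster uses other ports.
DEFAULT_PORTS = (47500, 47100)


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def unreachable(hosts, ports=DEFAULT_PORTS, timeout=2.0):
    """List every (host, port) pair that could not be reached."""
    return [
        (host, port)
        for host in hosts
        for port in ports
        if not can_connect(host, port, timeout)
    ]
```

Calling unreachable(["172.31.14.62", "172.31.3.109", "172.31.5.179", "172.31.0.253"]) from a worker node returns the pairs that failed. A failure does not say whether the security group is blocking traffic or nothing is listening on that port, so it is only a starting point for debugging.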
