AWS Batch is available to virtually everyone, and it is not only more convenient than local execution on a dedicated EC2 instance - it's also far more cost-efficient. Of all the ways to run Nextflow non-locally, one can argue (or easily guess) that AWS Batch will become the dominant option.
The documentation on using AWS Batch with Nextflow is minimal, and blogs like this suggest that most users do indeed have trouble figuring out the configuration steps on the AWS end of things. Granted, the Nextflow-end configuration is super simple, but for wide adoption users must be able to set up both Nextflow and AWS.
There are some details about a CloudFormation template here:
https://docs.opendata.aws/genomics-workflows/orchestration/nextflow/nextflow-overview/
This is awesome. Is it referenced from Nextflow's docs and I just missed it? Thanks for pointing me to this resource!
Not that I know of, but it may be. I saw it on Twitter and heard about it via a contact at AWS.
Sweet! So the "resources" stack worked, but the "All-in-one" stack did not and was rolled back. Is any Nextflow dev curating this resource?
I think you can report it in this GitHub repo
OK, done. In general, are there plans to further streamline this specific backend support? This is one of the areas where getting feature-parity with WDL would help tremendously (I have heard AWS consultants specifically recommend Cromwell, since they mainly look at the execution side of things, not necessarily the flexibility of the language). A quality AWS genomics workflow in the referenced GitHub repo is a great and necessary first step - just need some further polishing. The setup steps and the general architecture seem a little more complicated than Cromwell's, though. Also, Cromwell's architecture, with a server running and managing requests via REST, seems elegant and enabling.
This is one of the areas where getting feature-parity with WDL would help tremendously
To what extent is Cromwell better than NF for Batch execution?
A quality AWS genomics workflow in the referenced GitHub repo is a great and necessary first step - just need some further polishing
I'm not sure I understand which repo you are referring to.
1) Documentation. The repo I was referring to is the one you linked to above, here. I think this can be improved by:
At the end of the day, most suggestions here relate to a more turn-key setup (I think there is ample room to streamline it) and clearer docs. This makes more of a difference than it may seem for busy users who want to spend their time on bioinformatics and not on DevOps :-)
I completely agree, though I have no control over those docs. It would make sense to leave the same comment in that repo.
OK, I left a note asking 1) whether they are open to feedback and 2) what the membership rules are for that repo (I can probably help with some docs, but it would be great if the developers of the tool being documented also got access ;-) to correct inaccuracies).
Thanks