Nextflow: Customized AWS Batch Tagging

Created on 27 Nov 2018 · 8 Comments · Source: nextflow-io/nextflow

Customized AWS Batch Tagging

When using the awsbatch executor, it would be nice to have created instances tagged automatically. Automated tagging would enable better cost exploration and resource monitoring.

Usage scenario

With awsbatch as the executor, the project ID and run number would be useful for exploring the costs of a given run. Currently, tags can only be set per compute environment. A compute environment that is shared between parallel runs and/or queues therefore cannot provide run-specific tags. Tagging every instance with custom tags would be very useful.

Suggested implementation

The only way I can think of would be to use the awscli to tag the current instance every time an instance is spun up. The tags could be passed on the command line, similar to profiles. Key-value pairs would have to be grouped together somehow.

nextflow run main.nf -profile awsbatch,custom -tags project.XYZ,samples.42
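
One way the proposed `-tags project.XYZ,samples.42` argument could be split into key-value pairs and turned into a tagging call is sketched below. Note that `-tags` is only the option proposed in this issue, not an existing Nextflow flag, and the helper names `parse_tags` and `build_tag_command` are hypothetical. The sketch only builds the argument list for `aws ec2 create-tags`; looking up the instance ID and actually running the command on the instance are left out.

```python
def parse_tags(spec):
    """Split a 'key.value,key.value' string into a dict of tags.

    The 'key.value' grouping follows the example in this proposal. Only the
    first dot separates key from value, so values may contain dots.
    """
    tags = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        key, sep, value = item.partition(".")
        if not sep or not key:
            raise ValueError(f"expected 'key.value', got {item!r}")
        tags[key] = value
    return tags


def build_tag_command(instance_id, tags):
    """Build the argument list for 'aws ec2 create-tags' (not executed here)."""
    cmd = ["aws", "ec2", "create-tags", "--resources", instance_id, "--tags"]
    cmd += [f"Key={key},Value={value}" for key, value in tags.items()]
    return cmd


tags = parse_tags("project.XYZ,samples.42")
# The instance ID below is a placeholder, not a real resource.
command = build_tag_command("i-0123456789abcdef0", tags)
print(" ".join(command))
```

Returning a list rather than a shell string means the result could be handed to `subprocess.run` without extra quoting, should the command be executed from a bootstrap script.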
Labels: platform/aws-batch, platform/aws-ec2

Most helpful comment

Yes, please! I suggest tagging the instance with ALL of the workflow metadata that is available at initialization time. The AWS user would also be helpful :)

Name | Description
-- | --
scriptId | Project main script unique hash ID.
scriptName | Project main script file name.
scriptFile | Project main script file path.
repository | Project repository Git remote URL.
commitId | Git commit ID of the executed workflow repository.
revision | Git branch/tag of the executed workflow repository.
projectDir | Directory where the workflow project is stored in the computer.
launchDir | Directory where the workflow execution has been launched.
workDir | Workflow working directory.
homeDir | User system home directory.
userName | User system account name.
configFiles | Configuration files used for the workflow execution.
container | Docker image used to run workflow tasks. When more than one image is used it returns a map object containing [process name, image name] pair entries.
containerEngine | Returns the name of the container engine (e.g. docker or singularity) or null if no container engine is enabled.
commandLine | Command line as entered by the user to launch the workflow execution.
profile | Used configuration profile.
runName | Mnemonic name assigned to this execution instance.
sessionId | Unique identifier (UUID) associated to current execution.
resume | Returns true whenever the current instance is resumed from a previous execution.
start | Timestamp of workflow at execution start.
manifest | Entries of the workflow manifest.
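
As a rough illustration of this suggestion, the sketch below maps such metadata onto instance tags. It assumes the metadata is already available as a plain dictionary (the values shown are made-up placeholders) and applies the EC2 tag limits of 128 characters per key, 256 characters per value, and 50 tags per resource. The `nextflow:` key prefix and the function name `metadata_to_tags` are hypothetical choices, not part of Nextflow.

```python
import re

MAX_TAGS = 50          # EC2 limit on tags per resource
MAX_KEY_LEN = 128      # EC2 limit on tag key length
MAX_VALUE_LEN = 256    # EC2 limit on tag value length

# Approximation of the characters generally allowed in tags across AWS
# services: letters, digits, spaces, and + - = . _ : / @
_DISALLOWED = re.compile(r"[^\w +\-=.:/@]")


def metadata_to_tags(metadata, prefix="nextflow:"):
    """Convert workflow metadata into a list of {'Key', 'Value'} tag dicts."""
    tags = []
    for name, value in metadata.items():
        if value is None:
            continue  # skip unset fields, e.g. no container engine enabled
        key = _DISALLOWED.sub("_", prefix + name)[:MAX_KEY_LEN]
        text = _DISALLOWED.sub("_", str(value))[:MAX_VALUE_LEN]
        tags.append({"Key": key, "Value": text})
    # Anything beyond the per-resource limit is dropped in this sketch; tags
    # already present on the instance are not accounted for.
    return tags[:MAX_TAGS]


# Placeholder metadata for illustration only.
metadata = {
    "runName": "example_run",
    "profile": "awsbatch,custom",
    "containerEngine": None,
}
for tag in metadata_to_tags(metadata):
    print(tag)
```

Long fields such as commandLine or configFiles are the ones most likely to be truncated or altered by the sanitizing step, so short identifiers like runName and sessionId are probably the most reliable keys for cost grouping.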

All 8 comments

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Any updates on plans for this, or ideas for workarounds? This would be critical functionality for monitoring and cost tracking.

I would also like to push this issue. We run a core facility that uses Nextflow with AWS Batch and love the functionality, but it would be really nice to have tagging so we can track billing etc.

I would also like to see this happen.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This is something we would very much like as well. We use the same infrastructure to run pipelines for multiple clients, and would like to be able to assign the actual running costs to each client using the Cost Explorer. That requires tags on all the resources created for a job.

Related #1764
