Nextflow: Allow the definition of process resources using a custom category annotation

Created on 26 Feb 2018  路  25Comments  路  Source: nextflow-io/nextflow

It would be nice to have a way to classify processes according to their resources allocation for reducing the size of config file.
For example:
processes 1, 2,3,4,5 can be grouped as huge_demanding_ processes and processes 6,7,8,9,10 as little_processes. In this way you need to specify only two configurations within the config.file

Most helpful comment

Yes, the point is to decouple the resource annotations from the process to make it more flexible and less verbose. What about this:

  • add the ability to annotate a process with one or more labels directive;
  • add a new label selector in the config file, eg:

    process {
        withLabel: big {
            cpus = 8 
            memory = 10.GB
            queue = 'long-queue'
        }
        withLabel: small {
            cpus = 1 
            memory = 2.GB    
            queue = 'short-queue'
        }
        withName: YourProcessName {
            cpus = 4 
            container = 'image/name'
        }
    }
    
  • the withLabel category allows the assigned of the grouped resources to all process
    declaring the specified label;

  • the withName category allows the assignment of the grouped resources/directives to the process having the named specified;
  • the withName category overrides the withLabel assignment;
  • ~each process defines an implicit label as the process name;~
  • deprecated the current process.$name.xxx syntax.

This should address would also address the issue #621.

All 25 comments

Very good idea!

This could also be extended to containers... imagine that I have a container for processes 1,2,3 and another for 4,5,6 etc...

Yes, the point is to decouple the resource annotations from the process to make it more flexible and less verbose. What about this:

  • add the ability to annotate a process with one or more labels directive;
  • add a new label selector in the config file, eg:

    process {
        withLabel: big {
            cpus = 8 
            memory = 10.GB
            queue = 'long-queue'
        }
        withLabel: small {
            cpus = 1 
            memory = 2.GB    
            queue = 'short-queue'
        }
        withName: YourProcessName {
            cpus = 4 
            container = 'image/name'
        }
    }
    
  • the withLabel category allows the assigned of the grouped resources to all process
    declaring the specified label;

  • the withName category allows the assignment of the grouped resources/directives to the process having the named specified;
  • the withName category overrides the withLabel assignment;
  • ~each process defines an implicit label as the process name;~
  • deprecated the current process.$name.xxx syntax.

This should address would also address the issue #621.

A first implementation of new process configuration via label annotations is ready to tested.

This version allow a process to define one or more labels eg:

process foo {
  label 'gpu'

  """
  your_command
  """
}

Then in the configuration you can use the the withLabel or withName categories eg:

process {
  withLabel: gpu {
     queue = 'the-gpu-queue'
     memory = 10.GB
  }
}

See also the previous comment for other details. You can give a try using the version 0.29.0-SNAPSHOT, eg:

```
NXF_VER=0.29.0-SNAPSHOT nextflow run .. etc
````

@ewels @MaxUlysse @lucacozzuto @msmootsgi You feedback is welcome.

Great, I'll look into it ASAP

Same here :-)

Nice! A simple question: why not to use the same tag: (i.e. label) for both process definitions (in both main and configuration file)?

process foo {
label 'gpu'

"""
your_command
"""
}

process {
label: gpu {
queue = 'the-gpu-queue'
memory = 10.GB
}
}

This will be easier to remember.

That's how it is supposed to work.

Ahh, no sorry, I've misread.

Actually I've tried also that, but at the I've found more readable withLabel and withName.

Well, let's ask also to the other guys :)

Don't really mind.
It should probably be easier with both name being the same tag.
But as one is defining the label and the other one using it, it really make sense not to use the same tag.
I'm for withLabel

Things to do to complete this issue:

  • [ ] apply declared tag to executed K8s pod/jobs
  • [ ] include the process labels in the workflow execution report and trace file
  • [ ] should withLabel: setting overwrite specific directives defined in the main process script?
  • [ ] when printing configuration in flat format, category should be wrapped by quotes eg.
    process.'withLabel:foo'.xxx = 1 process.'withName:bar'.yyy = 2
  • [ ] add integration tests

Great initiative!

deprecated the current process.$name.xxx syntax.

Are we talking about breaking backwards compatibility here? For the record, I also really like the current setup - I think it's very clear. The above would be a great addition, but personally I'd like to continue using the current syntax as well if possible.

No, no breaking existing pipeline, but give a window to allow users to migrate to the new syntax eg

process {
  withName: foo { xxx = 1 }
}

The problem with process.$name.xxx syntax is that it's incredible confusing for newcomers, because people tend to confuse the $ to escape the process name with $ to interpolate config variables.

Ok, gotcha 馃憤 Sounds good!

Nice!

Question about the new syntax. Say I've got a label defined like this in nextflow.config:

process {
  withName: normal {
     cpus 1
     queue 'normal'
     ...
  }
}

And then I use the label in a process. Will I be able to override values set in the label in the process itself? For example, override the number of CPUs requested like this:

process somethingHard {
    label 'normal'
    cpus 4
    input:
    ...
}

Currently it overrides the cpus setting defined in the process (main script). Do you think it should not?

This could just be how I write pipelines, but I put default values (and labels in the future) in nextflow.config and then process specific setting in the process in main.nf. That makes me think label values should be overridden _by the process_.

I find these override hierarchies kind of tricky to get right, so I think it's worth some thought. And I make no claims here - I haven't given this a ton of thought!

I tend to agree, that it could be useful it can be quite tricky to remember all these rules. Maybe just keep it simple, that setting defined in the config just overwrites the ones in the process is the a better choice. The withName category would be give a chance to specify settings in a punctual manner.

Anyhow I will keep that as an open topic.

As long as the rules are clearly stated, I'm OK with this.

I am not sure whether we might also think about more specific settings overwriting general ones. I mean that configuration values defined at process level (e.g. withName) would overwrite settings defined in labels, irrespective of where they have been set (main pipeline script vs nextflow.config). Maybe it is a more linear approach and not so hard to remember.

Also, what about processes with multiple labels that touch common options? I think it would be better to avoid it, but in case it won't be possible what would be the order of precedence?

I am not sure whether we might also think about more specific settings overwriting general ones.

If I have understood well, that's already possible. Take in consideration the following config:

process {
  cpus = 2 
  withLabel: big {
    cpus = 8 
  }
  withName: foo {
    cpus = 4 
  }
}

The above snippet defines:

  • 2 as the default number of cpus for all process (unless something else is declared in the main script).
  • 8 cpus for all processes annotated with the big label (overriding any setting in the main script)
  • 4 cpus for the process named foo

Also, what about processes with multiple labels that touch common options?

Currently the last won.

OK, another round adding the support for regulation expression and not operator! This allows to define label/process selectors with regex patterns, eg:

process {
  withLabel: 'foo.*' { cpus = 8 }
  withName: '!bar' { memory = 32.GB }
}

Please give it a try using the following command

CAPSULE_RESET=1 NXF_VER=0.29.0-SNAPSHOT nextflow info # ONLY THE FIRST TIME
NXF_VER=0.29.0-SNAPSHOT nextflow run ...etc

You can find the documentation at this links:
https://github.com/nextflow-io/nextflow/blob/master/docs/process.rst#label
https://github.com/nextflow-io/nextflow/blob/master/docs/config.rst#process-selectors

Available in version 0.29.0-RC1. Please give a try to existing configuration files.

Note: release 0.29.x will support both .$ and the new notation to allow the migration to the new system.
Release 0.30.x will deprecated the .$ showing a warning message, and it will be removed in a future release.

Was this page helpful?
0 / 5 - 0 ratings