It would be nice to have a way to classify processes according to their resources allocation for reducing the size of config file.
For example:
processes 1, 2,3,4,5 can be grouped as huge_demanding_ processes and processes 6,7,8,9,10 as little_processes. In this way you need to specify only two configurations within the config.file
Very good idea!
This could also be extended to containers... imagine that I have a container for processes 1,2,3 and another for 4,5,6 etc...
Yes, the point is to decouple the resource annotations from the process to make it more flexible and less verbose. What about this:
labels directive;add a new label selector in the config file, eg:
process {
withLabel: big {
cpus = 8
memory = 10.GB
queue = 'long-queue'
}
withLabel: small {
cpus = 1
memory = 2.GB
queue = 'short-queue'
}
withName: YourProcessName {
cpus = 4
container = 'image/name'
}
}
the withLabel category allows the assigned of the grouped resources to all process
declaring the specified label;
withName category allows the assignment of the grouped resources/directives to the process having the named specified; withName category overrides the withLabel assignment; process.$name.xxx syntax. This should address would also address the issue #621.
A first implementation of new process configuration via label annotations is ready to tested.
This version allow a process to define one or more labels eg:
process foo {
label 'gpu'
"""
your_command
"""
}
Then in the configuration you can use the the withLabel or withName categories eg:
process {
withLabel: gpu {
queue = 'the-gpu-queue'
memory = 10.GB
}
}
See also the previous comment for other details. You can give a try using the version 0.29.0-SNAPSHOT, eg:
```
NXF_VER=0.29.0-SNAPSHOT nextflow run .. etc
````
@ewels @MaxUlysse @lucacozzuto @msmootsgi You feedback is welcome.
Great, I'll look into it ASAP
Same here :-)
Nice! A simple question: why not to use the same tag: (i.e. label) for both process definitions (in both main and configuration file)?
process foo {
label 'gpu'
"""
your_command
"""
}
process {
label: gpu {
queue = 'the-gpu-queue'
memory = 10.GB
}
}
This will be easier to remember.
That's how it is supposed to work.
Ahh, no sorry, I've misread.
Actually I've tried also that, but at the I've found more readable withLabel and withName.
Well, let's ask also to the other guys :)
Don't really mind.
It should probably be easier with both name being the same tag.
But as one is defining the label and the other one using it, it really make sense not to use the same tag.
I'm for withLabel
Things to do to complete this issue:
withLabel: setting overwrite specific directives defined in the main process script?
process.'withLabel:foo'.xxx = 1
process.'withName:bar'.yyy = 2
Great initiative!
deprecated the current
process.$name.xxxsyntax.
Are we talking about breaking backwards compatibility here? For the record, I also really like the current setup - I think it's very clear. The above would be a great addition, but personally I'd like to continue using the current syntax as well if possible.
No, no breaking existing pipeline, but give a window to allow users to migrate to the new syntax eg
process {
withName: foo { xxx = 1 }
}
The problem with process.$name.xxx syntax is that it's incredible confusing for newcomers, because people tend to confuse the $ to escape the process name with $ to interpolate config variables.
Ok, gotcha 馃憤 Sounds good!
Nice!
Question about the new syntax. Say I've got a label defined like this in nextflow.config:
process {
withName: normal {
cpus 1
queue 'normal'
...
}
}
And then I use the label in a process. Will I be able to override values set in the label in the process itself? For example, override the number of CPUs requested like this:
process somethingHard {
label 'normal'
cpus 4
input:
...
}
Currently it overrides the cpus setting defined in the process (main script). Do you think it should not?
This could just be how I write pipelines, but I put default values (and labels in the future) in nextflow.config and then process specific setting in the process in main.nf. That makes me think label values should be overridden _by the process_.
I find these override hierarchies kind of tricky to get right, so I think it's worth some thought. And I make no claims here - I haven't given this a ton of thought!
I tend to agree, that it could be useful it can be quite tricky to remember all these rules. Maybe just keep it simple, that setting defined in the config just overwrites the ones in the process is the a better choice. The withName category would be give a chance to specify settings in a punctual manner.
Anyhow I will keep that as an open topic.
As long as the rules are clearly stated, I'm OK with this.
I am not sure whether we might also think about more specific settings overwriting general ones. I mean that configuration values defined at process level (e.g. withName) would overwrite settings defined in labels, irrespective of where they have been set (main pipeline script vs nextflow.config). Maybe it is a more linear approach and not so hard to remember.
Also, what about processes with multiple labels that touch common options? I think it would be better to avoid it, but in case it won't be possible what would be the order of precedence?
I am not sure whether we might also think about more specific settings overwriting general ones.
If I have understood well, that's already possible. Take in consideration the following config:
process {
cpus = 2
withLabel: big {
cpus = 8
}
withName: foo {
cpus = 4
}
}
The above snippet defines:
big label (overriding any setting in the main script)fooAlso, what about processes with multiple labels that touch common options?
Currently the last won.
OK, another round adding the support for regulation expression and not operator! This allows to define label/process selectors with regex patterns, eg:
process {
withLabel: 'foo.*' { cpus = 8 }
withName: '!bar' { memory = 32.GB }
}
Please give it a try using the following command
CAPSULE_RESET=1 NXF_VER=0.29.0-SNAPSHOT nextflow info # ONLY THE FIRST TIME
NXF_VER=0.29.0-SNAPSHOT nextflow run ...etc
You can find the documentation at this links:
https://github.com/nextflow-io/nextflow/blob/master/docs/process.rst#label
https://github.com/nextflow-io/nextflow/blob/master/docs/config.rst#process-selectors
Available in version 0.29.0-RC1. Please give a try to existing configuration files.
Note: release 0.29.x will support both .$ and the new notation to allow the migration to the new system.
Release 0.30.x will deprecated the .$ showing a warning message, and it will be removed in a future release.
Most helpful comment
Yes, the point is to decouple the resource annotations from the
processto make it more flexible and less verbose. What about this:labels directive;add a new label selector in the config file, eg:
the
withLabelcategory allows the assigned of the grouped resources to all processdeclaring the specified label;
withNamecategory allows the assignment of the grouped resources/directives to the process having the named specified;withNamecategory overrides thewithLabelassignment;process.$name.xxxsyntax.This should address would also address the issue #621.