Nextflow: GNU command line interface

Created on 19 Dec 2017 · 22 Comments · Source: nextflow-io/nextflow

Nextflow currently uses a single - prefix for command-level options and a -- prefix for the definition of user workflow parameters.

Users frequently do not realise the difference and use the -- prefix for nextflow options, which are instead interpreted as workflow parameters and therefore ignored, causing confusion.

For example, people tend to use --resume instead of -resume, causing the pipeline to restart from scratch.

The goal of this issue is to re-organise the nextflow command line interface using the GNU convention i.e. using a single - for short options (one char) and -- for long options (2 or more chars) since this is the most common convention.

To distinguish nextflow options from the user workflow parameters, the run command line needs to be interpreted positionally, i.e. all options between the run command and the project/script to be executed are interpreted as run options, and everything following the project/script name is interpreted as user workflow parameters. For example:

nextflow run --revision v1.1 --resume <project name> --foo x --bar y 
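A minimal Python sketch of this positional rule, for illustration only; it is not Nextflow's implementation. The helper `split_run_args` and the set `FLAG_OPTIONS` are hypothetical, and the sketch assumes that every run option other than the listed flags takes exactly one value, so that the first bare token can be recognised as the project name (`rnaseq-nf` stands in for `<project name>`).

```python
# Hypothetical illustration; these names are not part of Nextflow.
FLAG_OPTIONS = {"--resume"}  # assumed: run options that take no value


def split_run_args(argv):
    """Split the tokens after `nextflow run` into (run_options, project, params)."""
    run_options = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if not token.startswith("-"):
            # The first bare token is the project; the rest are user parameters.
            return run_options, token, argv[i + 1:]
        run_options.append(token)
        if token not in FLAG_OPTIONS and i + 1 < len(argv):
            # Assumed: any other option consumes the next token as its value.
            run_options.append(argv[i + 1])
            i += 1
        i += 1
    raise ValueError("no project/script name found")


opts, project, params = split_run_args(
    ["--revision", "v1.1", "--resume", "rnaseq-nf", "--foo", "x", "--bar", "y"]
)
print(opts)     # ['--revision', 'v1.1', '--resume']
print(project)  # rnaseq-nf
print(params)   # ['--foo', 'x', '--bar', 'y']
```

Note that the split only works because the parser knows in advance which options are flags; an option with an optional value would make the project position ambiguous.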

Whenever possible the new CLI should be backward compatible with the current CLI, automatically replacing any single - prefix with a -- prefix and showing a warning message to the user.
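The backward-compatible rewrite could look roughly like the following Python sketch. The function `normalize_legacy_option` is a hypothetical name, and the rule it applies (a single dash followed by two or more characters is a legacy long option) is an assumption taken from the description above, not from the actual implementation.

```python
import warnings


def normalize_legacy_option(token):
    """Rewrite a legacy single-dash long option (-resume) as --resume, with a warning."""
    is_legacy_long = (
        token.startswith("-")
        and not token.startswith("--")
        and len(token) > 2  # "-r" is a one-character short option and stays as-is
    )
    if not is_legacy_long:
        return token
    converted = "-" + token
    warnings.warn(f"option {token} is deprecated, use {converted} instead")
    return converted


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = [
        normalize_legacy_option(t)
        for t in ["-resume", "-r", "v1.1", "--work-dir"]
    ]

print(result)       # ['--resume', '-r', 'v1.1', '--work-dir']
print(len(caught))  # 1
```

A real implementation would apply this only to tokens in option position, since an option value such as `-10` would otherwise be rewritten as well.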

A good candidate library on which to base the new CLI is Picocli.


All 22 comments

Dear Paolo

Not a volunteer, but a suggestion of a coding style approach to dealing with this problem until this is done (which I think would be a good idea).

It's fairly easy to integrate into a pipeline -- just have a list of allowed parameters which get checked. Even for a complex pipeline it shouldn't take more than 5 minutes to do. A similar approach can be used for providing command line help.

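    // Parameters this pipeline accepts; "help" is listed so that --help passes the check.
    allowed_params = ["start", "cutoff", "theta", "foo", "help"]

    // Abort on any parameter that is not in the list (e.g. a mistyped --resume).
    params.each { entry ->
      if (!allowed_params.contains(entry.key)) {
          println("The parameter <${entry.key}> is not known")
          System.exit(2)
      }
    }

    // Optional descriptions keyed by parameter name; empty placeholder here.
    params_help = [:]

    if (params.help) {
        params.each { entry ->
          println "Parameter: <${entry.key}>    \t Default: ${entry.value}"
          help = params_help.get(entry.key)
          if (help) {
              println "    $help"
          }
      }
      System.exit(0)
    }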

FWIW, I only ever use nextflow's native command line arguments on the actual command line. All other parameters get specified with -params-file and a YAML file. This has two advantages: it's easy to track what you did and it's easy to standardize the execution of arbitrary pipelines in our cloud infrastructure. The obvious disadvantage is that it takes effort to create the YAML file.

The only time I use command line params is when I'm working with dummy code and testing ideas.

Thanks for your feedback. However, it's quite handy to have the ability to pass parameters as CLI options. It should be preserved.

See also #129

#129 can be solved using positional parameters

_2.4. Mixing Options and Positional Parameters_ See doc

The new command line is mostly implemented, however a decision needs to be taken on how to handle implicit option values and user parameters. There are mainly two choices:

1) Use a strictly positional command line, think of docker run [options] <image> [params]. In the case of nextflow it would be nextflow run [options] <pipeline> [pipeline parameters], e.g.:

```
nextflow run --revision foo --work-dir /usr/workspace rnaseq-nf --reads /some/path

```

The problem with this approach is that it does not allow implicit values for command line options. For example, it won't be possible to just say `--resume`; it would be mandatory to provide a value for the resume option (because otherwise it could consume the pipeline name as the resume value).

2) Another option is to separate NF command line options from pipeline parameters using a -- separator. This would allow a non-positional command line (as it is now) and therefore the use of implicit option values. For example:

    nextflow run rnaseq-nf --revision foo --resume -- --reads /somedata 
    

Even though this is a bit more verbose, I'm thinking it may be the better solution, also because it imposes a clear separation between NF options and script options, helping to avoid problems like #624.
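For illustration, the `--` separator rule can be sketched in a few lines of Python. `split_on_separator` is a hypothetical helper, not part of Nextflow; it only shows that with an explicit separator no knowledge of individual options is needed to tell the two groups apart.

```python
def split_on_separator(argv):
    """Split argv at the first bare "--" into (nextflow_args, pipeline_params)."""
    if "--" in argv:
        cut = argv.index("--")
        return argv[:cut], argv[cut + 1:]
    # No separator: everything belongs to nextflow itself.
    return argv, []


nf_args, pipeline_params = split_on_separator(
    ["run", "rnaseq-nf", "--revision", "foo", "--resume", "--", "--reads", "/somedata"]
)
print(nf_args)          # ['run', 'rnaseq-nf', '--revision', 'foo', '--resume']
print(pipeline_params)  # ['--reads', '/somedata']
```

Because `--resume` sits before the separator, it can safely take an implicit value: nothing on the pipeline side can be mistaken for its argument.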

@emi80 @msmootsgi @ewels What do you think?

I don't mean to further confuse matters, but have you considered user parameters similar to how Java handles extra parameters like -Xmx10G? So nextflow would define all its normal command line params and then any user defined params would be preceded with -P, so for params.name within your script you'd have -Pname=whatever on the command line? That should make it very clear which params are which.
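A rough Python sketch of the `-P` idea, assuming the `-Pname=value` form described above. The helper `extract_user_params` is hypothetical and `-Preads=/some/path` is an invented second parameter used only to show that several can be collected.

```python
def extract_user_params(argv):
    """Separate -Pname=value tokens from the remaining arguments."""
    user_params, remaining = {}, []
    for token in argv:
        if token.startswith("-P") and "=" in token:
            # Drop the "-P" prefix, then split on the first "=" only.
            name, value = token[2:].split("=", 1)
            user_params[name] = value
        else:
            remaining.append(token)
    return user_params, remaining


user_params, remaining = extract_user_params(
    ["run", "-resume", "rnaseq-nf", "-Pname=whatever", "-Preads=/some/path"]
)
print(user_params)  # {'name': 'whatever', 'reads': '/some/path'}
print(remaining)    # ['run', '-resume', 'rnaseq-nf']
```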

I think the second option will be better: more readable and more powerful in the future.

Can you not just have a list of protected nextflow vocabulary? E.g. exit with an error if anyone tries to use params.resume? Then try to parse the command line options with these first and anything that's left must be a params option. Then you can be super flexible with how they're specified.
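One way to picture the protected-vocabulary idea is the Python sketch below. The set `RESERVED` is an assumed subset of core option names, and both functions are hypothetical; option values are left out to keep the sketch short.

```python
RESERVED = {"resume", "revision", "work-dir"}  # assumed subset of core option names


def check_pipeline_params(declared):
    """Fail early if a pipeline declares a parameter that shadows a core option."""
    clash = RESERVED & set(declared)
    if clash:
        raise ValueError(f"params use reserved names: {sorted(clash)}")


def classify_flags(flags):
    """Sort flag names into (core_options, pipeline_params), whatever the dash count."""
    core, pipeline = [], []
    for flag in flags:
        name = flag.lstrip("-")
        (core if name in RESERVED else pipeline).append(name)
    return core, pipeline


check_pipeline_params(["reads", "outdir"])  # passes silently
core, pipeline = classify_flags(["--resume", "-revision", "--reads"])
print(core)      # ['resume', 'revision']
print(pipeline)  # ['reads']
```

Under this scheme `--resume` and `-resume` mean the same thing, which removes the original source of confusion at the cost of reserving names.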

@ewels I like the idea, but it limits adding options/parameters to core nextflow in the future since any new nextflow options might stomp on something I added to my pipeline.

Yup, true. But how many new command line options have been added recently? If nextflow throws an error when params.newcommand is mentioned anywhere then it'll be easy to fix and hopefully not more than an inconvenience.

I just worry that this issue is meant to reduce confusion for users, but the two proposals will still be confusing, just in different ways.

I think I would lean towards a variant of the first option. In a way it's there already. With the cloud command, the "-c" can occur before or after the "cloud" with different meanings.

nextflow -c scott.aws -c run10.config cloud create scottcluster -c 5

Thank you all for the feedback. Let me clarify that this change won't break, at least initially, the existing command line. Any option longer than one character and prefixed with - will be automatically converted to the equivalent one prefixed with --, warning the user of the new notation.

Regarding the proposal to use a special prefix for script parameters, e.g. -Pfoo=1, it could be a good compromise, tho IMO it makes the command line much less readable. For this reason I prefer one of the two solutions proposed above.

I would avoid the idea of using a reserved vocabulary for NF options and another for user scripts, for the reason explained by Mike. It's true that new options are not added frequently, but just opening the door to the possibility of breaking a user pipeline by adding a new option is not a good idea and it would be badly perceived by users. -1

Then what @shaze is mentioning is exactly my original idea. The problem is that I've realised that this approach limits the ability to have command line options with a default/implicit value. E.g. consider that in your example you want the value for the -c option to be optional, i.e. you can write either -c value or just -c, which will take a default value. The problem is that the command line parser won't be able to recognise the absence of a value, e.g.:

nextflow -c scott.aws -c cloud create scottcluster -c 5

In the above example the word cloud will be used as value of the -c option, therefore using this approach command line option values are always required e.g.:

nextflow -c scott.aws -c - cloud create scottcluster -c 5
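The ambiguity can be reproduced with a small hand-written parser in Python. This is a sketch under an assumed greedy rule (an option takes the next token as its value unless that token starts with a dash); `parse_optional_value` and the `<default>` placeholder are hypothetical and do not reflect how Nextflow or Picocli parse arguments.

```python
def parse_optional_value(argv, option="-c", default="<default>"):
    """Greedy rule: the option takes the next token unless it starts with '-'."""
    values, rest = [], []
    i = 0
    while i < len(argv):
        if argv[i] == option:
            has_value = i + 1 < len(argv) and not argv[i + 1].startswith("-")
            values.append(argv[i + 1] if has_value else default)
            i += 2 if has_value else 1
        else:
            rest.append(argv[i])
            i += 1
    return values, rest


# The second -c was meant to use its default, but it swallows the command name.
values, rest = parse_optional_value(
    ["-c", "scott.aws", "-c", "cloud", "create", "scottcluster"]
)
print(values)  # ['scott.aws', 'cloud']
print(rest)    # ['create', 'scottcluster']
```

The parser has no way to know that `cloud` was intended as the command rather than a config file name, which is why a strictly positional command line forces option values to be mandatory.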

I normally strongly advocate the common -x --long-x-option approach and cringe a bit when encountering -long. However in the case of NF the well-defined use of -long vs --long clearly delineates NF opts and pipeline params. It also allows the user not to think about correctly positioning opts/params. Short opts are great for bashing away in an interactive session, trying things out - this is not what NF is about, right?

As for the example with erroneous use of --resume vs -resume, wouldn't that be better addressed by stopping the execution if any undefined pipeline opt is encountered?

Finally, the two approaches proposed by @pditommaso are reasonable, with the second one making it easier for the user to focus on either NF or pipeline params as needed.

> As for the example with erroneous use of --resume vs -resume, wouldn't that be better addressed by stopping the execution if any undefined pipeline opt is encountered?

Agreed - in fact, this could probably resolve many of the problems and questions that we see. Especially if that error message checked for similar core parameters and gave a hint.
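A sketch of such a check in Python, using the standard library's `difflib.get_close_matches` to suggest a similarly named core option. `CORE_OPTIONS` is an assumed subset of Nextflow's run options and `check_param` is a hypothetical helper; the message wording is invented for the example.

```python
import difflib

CORE_OPTIONS = ["resume", "revision", "work-dir", "params-file"]  # assumed subset


def check_param(name, declared_params):
    """Return None if the parameter is declared, else an error message with a hint."""
    if name in declared_params:
        return None
    message = f"Unknown pipeline parameter --{name}"
    similar = difflib.get_close_matches(name, CORE_OPTIONS, n=1)
    if similar:
        message += f"; did you mean the Nextflow option -{similar[0]}?"
    return message


declared = {"reads", "outdir"}
print(check_param("reads", declared))
# None
print(check_param("resume", declared))
# Unknown pipeline parameter --resume; did you mean the Nextflow option -resume?
```

This requires the pipeline to declare its parameters up front, which is the same prerequisite as the allowed-parameter list suggested earlier in the thread.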

I think the first idea was to throw a WARN telling the user about the new "policy" of --long-option.
But IDK how it is these days.

> As for the example with erroneous use of --resume vs -resume, wouldn't that be better addressed by stopping the execution if any undefined pipeline opt is encountered?

This should be doable, I will give it a try.

FWIW, I think the second option in https://github.com/nextflow-io/nextflow/issues/556#issuecomment-371577103 (using -- to separate nextflow from user params arguments) is simple and avoids lots of headaches. Also, this convention will be known to many programmers that also use the popular tool git, as that tool follows that convention as well.

Github says I just unassigned @edgano. I'm not sure how I did that...

> this convention will be known to many programmers

FWIW I don't think I was aware of this convention before this thread. Also, many of the people who launch our pipelines aren't programmers; they're biologists who have recently learned to use the command line. I'm a little anxious that such a convention could cause more headaches rather than fewer. I still think that having very simplistic flag handling with a protected vocab is the most user-friendly (though I agree, perhaps not the most developer-friendly).

sadness consumes me 😞

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
