Nextflow: Importing Parameters in Nextflow using json / yaml file

Created on 29 Jul 2016 · 25 Comments · Source: nextflow-io/nextflow

Until now, parameters in most of the Nextflow examples have been passed as command line arguments, which is a good solution if you run the pipelines manually. To feed parameters to Nextflow dynamically, it would be helpful if an upstream system could send a JSON/YAML file as an argument, which could then be used directly to build the pipeline's parameter list.

While building a sequencing workflow with Nextflow, I found there is no way to feed a Nextflow process with parameters from a JSON/YAML file. As discussed on Gitter, here is the solution @pditommaso suggested:

parameter_json = file(params.in)
new groovy.json.JsonSlurper().parseText(parameter_json.text).each { k, v -> params[k] = v }

println params.alpha
println params.beta 

The above pipeline is run on the command line as:

nextflow main.nf --in sample.json

This issue is meant to start a discussion on how to implement JSON/YAML parameter files, with a standard command line argument for specifying such a file.

All 25 comments

I agree with your proposal to add a command line option that allows parameters to be loaded from an external file. Groovy has excellent support for JSON, whereas YAML would require a separate library. For this reason I propose starting with JSON.

It should be relatively easy to add a -params-file command line option that reads the JSON file and populates the parameters map in a manner similar to the snippet above.

The class involved is CmdRun, to which the option to specify the JSON file and the code to parse it should be added.

My guess is that the method getParsedParams should do that by checking whether -params-file has been specified.

Caveats:

  1. A user may specify parameters using the JSON file _and_ as command line options. The ones on the command line have priority, i.e. they override the ones in the JSON file.
  2. The params in the JSON file have priority over those defined in the nextflow.config file.
  3. Though JSON supports primitive types, string values need to be parsed to primitive ones for consistency with the current mechanism. See getParsedParams.
  4. When - is given as the file name (i.e. -params-file -), the JSON structure should be read from stdin.
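
The first three caveats can be sketched in plain Groovy. This is only an illustration under stated assumptions: mergeParams and parseValue are hypothetical helper names that do not exist in Nextflow, and the real getParsedParams logic may well differ.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper (not Nextflow code): convert a string to a primitive
// when it looks like one, and leave every other value untouched.
def parseValue(value) {
    if (!(value instanceof String)) return value
    if (value.equalsIgnoreCase('true')) return true
    if (value.equalsIgnoreCase('false')) return false
    if (value.isInteger()) return value.toInteger()
    if (value.isDouble()) return value.toDouble()
    return value
}

// Hypothetical helper (not Nextflow code): merge the three parameter sources.
// With Map '+', entries from the right-hand map win, so the priority is
// nextflow.config < params file < command line.
def mergeParams(Map configParams, String jsonText, Map cliParams) {
    def fileParams = new JsonSlurper().parseText(jsonText) as Map
    def merged = configParams + fileParams + cliParams
    return merged.collectEntries { k, v -> [(k): parseValue(v)] }
}

// Example: beta comes from the file, gamma from the command line
def example = mergeParams(
    [alpha: 1, beta: 2],
    '{"beta": "20", "gamma": 30, "flag": "true"}',
    [gamma: '300'])
println example
```

In this sketch the command line value for gamma replaces the one from the file, and the string values "20", "300" and "true" end up as an integer, an integer and a boolean respectively.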

Suggested actions:

  • Fork the nextflow repo
  • Create a branch on your fork to implement this feature
  • Add some tests in the CmdRunTest class
  • Commit _without_ using any issue handle in the commit message, e.g. (#208)
  • Create a pull request for that.

@jgrzebyta You may be interested in this.

We do something similar with yaml in our pipelines. In addition to the above it would be nice to have an option that allows us to get the actual object parsed rather than key/value pairs. Our config file has all sorts of data types, which feed different input channels. This option would give the pipeline control over how the object gets destructured.
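
A minimal sketch of that idea, using JSON so it runs with Groovy alone. The config content and the names configText, sampleIds and allReads are made up for illustration; nothing here is an existing Nextflow option.

```groovy
import groovy.json.JsonSlurper

// Hypothetical config content; in a pipeline this text would come from the params file
def configText = '''
{
  "threads": 4,
  "samples": [
    {"id": "s1", "reads": ["s1_1.fq", "s1_2.fq"]},
    {"id": "s2", "reads": ["s2_1.fq"]}
  ]
}
'''

// Keep the parsed structure as one object instead of copying each key into params
def config = new JsonSlurper().parseText(configText)

// The script decides how to destructure it, e.g. one list per intended input channel
def sampleIds = config.samples.collect { it.id }
def allReads = config.samples.collectMany { it.reads }

println "threads: ${config.threads}"
println "samples: ${sampleIds}"
println "reads: ${allReads}"
```

Each of the resulting lists could then feed its own channel, which is the kind of control over destructuring described above.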

@pditommaso I would be interested indeed and also might add my contribution to the code but not earlier than the end of Aug.

I suggest another caveat:
2a. Parameters given on the command line override those defined in the -params-file file.

@jgrzebyta Any progress on this? If you have a fork you are working on, I can help; if not, I can start working on it.

For anyone, like me, who comes looking for a way to import YAML parameters, with SnakeYAML you can use:

import org.yaml.snakeyaml.Yaml

parameter_yaml = new FileInputStream(new File(params.in))
new Yaml().load(parameter_yaml).each { k, v -> params[k] = v }

Hope this helps.

I've uploaded a new snapshot implementing this feature for both JSON and YAML files. You may want to give it a try by setting the variable NXF_VER=0.23.4-SNAPSHOT. Feedback is welcome.

YAML-processing works well for me. Thanks!

I think the paramsFile parameter should persist, even after the file has been parsed. This allows the script to pass the path to the parameters file along to processes, if needed. At the moment, it seems to be removed, even when an overriding value isn't present in the YAML file.

What do you mean? Where should it persist?

params.paramsFile

Because it's not a script parameter but a run command line option. It's not supposed to be in the params map.

Is that a problem?

Is there any way to access it afterwards? For example, if you want to pass the path to the parameters file to a process rather than writing a new parameters file?

It might, however, be better practice to write a new parameters file with just those parameters that are needed for each task. But there probably should be the option to do either.

So you would like to use the same config file in the command executed by one or more processes. I'm not sure that's good practice. In principle, process commands should not have dependencies other than the declared inputs. Breaking this contract may introduce limitations when executing your script with containers or in an environment that doesn't provide a shared file system (think of the cloud).

Included in version 0.24.0

Is this new -params-file parameter documented anywhere? I can see it in the source code, but if I use it while executing nextflow it seems to ignore it.

Unfortunately, command line options are not yet documented. However, you can create a YAML/JSON file like the following:

{"foo":1, "bar":2}

Then, if you specify this file with the -params-file option, you will be able to access those values in your script like any other parameter, e.g.:

println params.foo
println params.bar

Thank you, it seems to be working. What I didn't realise is that arrays don't work as a parameter type in the params file. In that case, I believe the parameters are silently not set.

Can you provide an example and possibly open a separate issue?

Could you provide an example of a file that the -params-file parameter works with? Thanks!

?

There doesn't seem to be any documentation available for how to access elements of a yaml file outside of the simplistic example in your comment above.

Hi mukundvarma

JSON is parsed with JsonSlurper and YAML is parsed with SnakeYAML; you may find useful documentation on their pages.

Here's a YAML example that may be helpful.

In params.yaml:

testBoolean: True
testString: "some text"
testInteger: 42
testList:
- "item1"
- "item2"
- "item3"
objectList:
  - name: "Benny"
    age: 53
  - name: "Betty"
    age: 53

In testScript.nf:

#!/usr/bin/env nextflow

println("testBoolean: " + params.testBoolean)
println("testString: " + params.testString)
println("testInteger: " + params.testInteger)

itemChannel = Channel.from(params.testList)
itemChannel.println()

objectChannel = Channel.from(params.objectList)

process objectProcess {
    input:
    val person from objectChannel

    output:
    stdout personDetailsChannel

    shell:
    '''
    echo "This year, !{person.name} is !{person.age} years old"
    '''
}

personDetailsChannel.println()

Then run with:

chmod +x testScript.nf
./testScript.nf -params-file params.yaml

Hope this helps.

Is this only for loading pipeline params? Is there a way to use JSON as an input Channel?
