Until now, the parameters in most of the Nextflow examples have been passed as command line arguments, which is a good solution if you run the pipelines manually. To feed parameters to Nextflow dynamically, it would be helpful if an upstream system could send a JSON/YAML file as an argument, which could then be used directly to build the parameter list for the pipeline.
While building a sequencing workflow with Nextflow, I found there is no built-in way to feed a Nextflow process with parameters from a JSON/YAML file. As discussed on Gitter, here is the solution @pditommaso suggested:
parameter_json = file(params.in)
new groovy.json.JsonSlurper().parseText(parameter_json.text).each { k, v -> params[k] = v }
println params.alpha
println params.beta
The above pipeline is run on the command line as:
nextflow main.nf --in sample.json
This issue is meant to start a discussion on how to implement a JSON/YAML parameter file and a standard command line argument for specifying it.
I agree with your proposal to add a command line option that allows parameters to be loaded from an external file. Groovy has excellent support for JSON, whereas YAML would require a separate library. For this reason I would propose starting with JSON.
It should be relatively easy to add a -params-file command line option that reads the JSON file and populates the parameters map in a manner similar to the snippet above.
The class involved is CmdRun, to which the option to specify the JSON file and the code to parse it should be added.
My guess is that the method getParsedParams should do that by checking whether -params-file has been specified.
Caveats:
nextflow.config file
When -params-file - is specified, the JSON structure should be read from stdin
Suggested actions:
#208)
@jgrzebyta You may be interested in this.
We do something similar with yaml in our pipelines. In addition to the above it would be nice to have an option that allows us to get the actual object parsed rather than key/value pairs. Our config file has all sorts of data types, which feed different input channels. This option would give the pipeline control over how the object gets destructured.
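One way to read this request, sketched in plain Groovy rather than Nextflow: keep the parsed document under a single key and let the script decide how to take it apart. The names scriptParams and config, and the sample JSON content, are made up for illustration and are not part of any Nextflow API.

```groovy
import groovy.json.JsonSlurper

// Hypothetical config content, inlined so the sketch runs as plain Groovy
def text = '{"samples": [{"id": "s1", "lanes": [1, 2]}, {"id": "s2", "lanes": [3]}], "threads": 4}'

// Parse once and keep the whole structure instead of copying key/value pairs
def parsed = new JsonSlurper().parseText(text)

// Stand-in for the params map (an assumption; Nextflow provides its own)
def scriptParams = [:]
scriptParams.config = parsed

// The script controls how the object is destructured
def sampleIds = scriptParams.config.samples.collect { it.id }
def allLanes = scriptParams.config.samples.collectMany { it.lanes }

println sampleIds
println allLanes
```

Each derived list could then feed a different input channel, which is the kind of control described above.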
@pditommaso I would indeed be interested and might also contribute to the code, but not earlier than the end of Aug.
I suggest another caveat:
2a. parameters given on the command line override those defined in the -params-file file
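A minimal sketch of the precedence in caveat 2a, in plain Groovy: load the file values first, then apply the command line values on top so they win. The helper mergeParams and the variables fromFile and fromCli are hypothetical names for illustration; this is not how CmdRun is actually implemented.

```groovy
import groovy.json.JsonSlurper

// Hypothetical helper: file values go in first, command line values overwrite them
Map mergeParams(Map fromFile, Map fromCli) {
    def merged = [:]
    merged.putAll(fromFile)
    merged.putAll(fromCli)
    return merged
}

// Stand-in for the content of a params file
def fromFile = new JsonSlurper().parseText('{"alpha": 1, "beta": 2}') as Map

// Stand-in for values given as --beta override on the command line
def fromCli = [beta: 'override']

def merged = mergeParams(fromFile, fromCli)
println merged
```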
@jgrzebyta any progress on this? If you have a fork you are working on, I can help; if not, I can start working on it.
For anyone, like me, who comes looking for a way to import YAML parameters, with SnakeYAML you can use:
import org.yaml.snakeyaml.Yaml
parameter_yaml = new FileInputStream(new File(params.in))
new Yaml().load(parameter_yaml).each { k, v -> params[k] = v }
Hope this helps.
I've uploaded a new snapshot implementing this feature for both JSON and YAML files. You may want to give it a try by defining the variable NXF_VER=0.23.4-SNAPSHOT. Feedback is welcome.
YAML-processing works well for me. Thanks!
I think the paramsFile parameter should persist, even after the file has been parsed. This allows the script to pass the path to the parameters file along to processes, if needed. At the moment, it seems to be removed, even when an overriding value isn't present in the YAML file.
What do you mean? Where should it persist?
params.paramsFile
Because it's not a script parameter but a run command line option. It's not supposed to be in the params map.
Is that a problem?
Is there any way to access it afterwards? For example, if you want to pass the path to the parameters file to a process rather than writing a new parameters file?
It might, however, be better practice to write a new parameters file with just the parameters needed for each task. But there probably should be the option to do either.
Because you would like to use the same config file in the command executed by one or more processes. Not sure that's a good practice. In principle, process commands should not have dependencies other than the declared inputs. Breaking this contract may introduce limitations when executing your script with containers or in an environment that doesn't provide a shared file system (think the cloud).
Included in version 0.24.0
Is this new -params-file parameter documented anywhere? I can see it in the source code, but if I use it while executing nextflow it seems to ignore it.
Unfortunately, command line options are not yet documented. However, you can create a YAML/JSON file like the following:
{"foo":1, "bar":2}
Then, if you specify this file with the -params-file option, you will be able to access those values in your script like any other parameter, e.g.:
println params.foo
println params.bar
Thank you, it seems to be working. What I didn't realise is that arrays don't work as a parameter type in the params-file. The parameters are silently not set, in that case, I believe.
Can you provide an example and possibly open a separate issue?
Could you provide an example of a file that the -params-file parameter works with? Thanks!
?
There doesn't seem to be any documentation available for how to access elements of a yaml file outside of the simplistic example in your comment above.
Hi mukundvarma
JSON is parsed with JsonSlurper and YAML is parsed with SnakeYAML; you may find useful documentation on their pages.
Here's a YAML example that may be helpful.
In params.yaml:
testBoolean: True
testString: "some text"
testInteger: 42
testList:
  - "item1"
  - "item2"
  - "item3"
objectList:
  - name: "Benny"
    age: 53
  - name: "Betty"
    age: 53
In testScript.nf:
#!/usr/bin/env nextflow
println("testBoolean: " + params.testBoolean)
println("testString: " + params.testString)
println("testInteger: " + params.testInteger)
itemChannel = Channel.from(params.testList)
itemChannel.println()
objectChannel = Channel.from(params.objectList)
process objectProcess {
    input:
    val person from objectChannel
    output:
    stdout personDetailsChannel
    shell:
    '''
    echo "This year, !{person.name} is !{person.age} years old"
    '''
}
personDetailsChannel.println()
Then run with:
chmod +x testScript.nf
./testScript.nf -params-file params.yaml
Hope this helps.
Is this only for loading pipeline params? Is there a way to use JSON as an input Channel?