Nextflow: Allow the output of process env variables

Created on 31 Aug 2015 · 19 Comments · Source: nextflow-io/nextflow

It would be great if users could set new variables from inside the script. For example:

process example {
  input:
  val species_name

  output:
  set taxid, accession, file('genome.fasta') into labeled_files

  """
  taxid=`get_taxid $species_name`
  accession=`get_accession $species_name`
  download_genome \$accession > genome.fasta
  """
}

At the moment, we can extract at most one variable by using stdout:

process example {
  input:
  val species_name

  output:
  set stdout, file('genome.fasta') into labeled_files

  """
  taxid=`get_taxid $species_name`
  accession=`get_accession $species_name`
  download_genome \$accession > genome.fasta
  printf "\$taxid"
  """
}

It's possible that this is sufficient for most people, but the request is here just in case others would find the bash→groovy variable extraction helpful.

All 19 comments

I agree this would be great!

An option could be to add support for env outputs, in order to capture the values of environment variables in the context of the Bash script. For example:

process example {
  input:
  val species_name

  output:
  set env(taxid), file('genome.fasta') into labeled_files

  """
  taxid=`get_taxid $species_name`
  accession=`get_accession $species_name`
  download_genome \$accession > genome.fasta
  """
} 

Indeed, it sounds like an elegant solution to me.

env(taxid) is a great idea

On Tue, 1 Sep 2015 04:55 Matthieu Foll wrote:

Indeed, it sounds like an elegant solution to me.

I'm bringing up this issue again because I have a similar need. Was this enhancement implemented in a recent release?

Unfortunately not yet.

An approach I've used in the past is to print JSON or YAML from the process and capture that string in stdout. Then use JsonSlurper (or yaml analog) in the resulting output channel to parse that string into a groovy object. This object can be used directly or otherwise manipulated or decomposed into channel values with the usual suspect operators.

I'm not sure I understand. Could you provide an example?

Here's a basic example:

import groovy.json.JsonSlurper

Channel.from(3,4,5).into{ data }

process emitJson {

    input:
    val x from data

    output:
    stdout into out

    """
    #!/usr/bin/env python
    import json

    result = []
    for i in range(1,$x):
        result.append({'a': i*$x, 'b': i*i*$x})

    # I print a list of objects as a JSON string
    print(json.dumps(result))
    """
}

slurp = new JsonSlurper()

out
    .flatMap{ x -> slurp.parseText(x) }
    .view()

Now try replacing the flatMap at the end with just a view to see the difference:

out.view()

This is smart, but ideally it should be possible to capture one or more variables without any external parsing.

A way could be to dump the process environment into a file, then have NF parse that file to fetch the variable(s) specified in the process output.

What I don't like about this is that it requires yet another file to be created for each process. Moreover, it would only work for Bash tasks. Thus I'm not so convinced about implementing it.
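
A rough sketch of the file-based idea, in plain shell and outside Nextflow. Everything here is an assumption for illustration: the `task.env` file name, the `run_task` and `read_var` helpers, and the placeholder values standing in for `get_taxid` and `get_accession`. It only shows the dump-then-parse round trip, not how Nextflow does or would do it.

```shell
#!/usr/bin/env bash
# Sketch only: the file name and helper names are hypothetical, not Nextflow internals.

workdir=$(mktemp -d)
env_file="$workdir/task.env"

# Task side: run the user script, then dump only the requested variables.
run_task() {
  taxid="9606"                 # placeholder for `get_taxid $species_name`
  accession="GCF_000001405"    # placeholder for `get_accession $species_name`
  for name in taxid accession; do
    eval "value=\${$name}"     # look up the variable whose name is stored in $name
    printf '%s=%s\n' "$name" "$value"
  done > "$env_file"
}

# Engine side: look up one variable by name in the dumped file.
read_var() {
  sed -n "s/^$1=//p" "$env_file"
}

# Run the task in a subshell so its variables do not leak into this shell,
# which mimics the task running as a separate process.
( run_task )

captured_taxid=$(read_var taxid)
captured_accession=$(read_var accession)
echo "taxid=$captured_taxid accession=$captured_accession"
```

Note that this simple `name=value` format assumes single-line values; anything containing newlines would need quoting or a different encoding. It also illustrates the objection above: the dump step only makes sense when the task itself is a shell script.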

Maybe a better way to handle this problem would be to have NF open a TCP socket that can be used to fetch the script's environment variables without having to pass through a file, e.g.:

env > /dev/tcp/<nextflow host address>/<port>

But it would still only work for Bash scripts.

It would be useful yes.
Here I'm trying to extract the ID of a sample from its BAM file, all the while indexing it.

process IndexBAM {
   input:
  set val(status), file(bam) from ch_unindexed_bam

  output:
  set val(status), stdout, file(bam), file("*.bai") into ch_indexed_bam

  script:
"""
# Extract ID from BAM header
idSample=`samtools view -H ${bam} | grep "^@RG" | grep -oP 'ID:\\K(\\S+)'`
# Index BAM
filename=${bam}
samtools index ${bam} \${filename%.bam}.bai
# Send idSample to stdout to be captured by nextflow
printf "\$idSample"
"""
}

Here I used the trick suggested by OP (thanks @robsyme ). But there could be other features I'd like to extract!
@msmootsgi Nice idea too!

After having spent a week at Pawsey in Perth, in honour of Robert, who was likely the first Australian Nextflow user, I've finally decided to implement this feature. The relaxed atmosphere at this latitude has contributed. Copying @SvenDowideit, who was also requesting this feature.

Ha. Thanks Paolo :)

Hi

I tried the env output proposed above, but it did not work. I don't know why, but it says:
invalid set output parameter declaration -- item nextflow.script.tokenEnvCall(nextflow.script.TokenVar(e))

I have included the variable in the shell script:
"""
export e="sugar"

"""
Any ideas, please?

Thanks
Hamza

This has not been released yet; it only works if you build from source.

Oh I see, many thanks.

Hamza

This feature will be really helpful. Is it coming in v20.01.0?

Yes
