Nextflow: Allow a process to access input file metadata

Created on 8 Jun 2018 · 8 Comments · Source: nextflow-io/nextflow

It should be possible to access input file metadata such as size, lastModified, etc.

This would ease the definition of dynamic resource allocations. For example:

in_ch = Channel.fromPath('hello.txt')
process foo {
  memory { x.size() < 1_000_000 ? 2.GB : 8.GB }
  input: file x from in_ch
  script:
  """
  command --input $x
  """
}

Currently it's not possible to access process input file metadata. A possible workaround is to fetch that information outside the process; see here for an example.
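A minimal sketch of that kind of workaround, assuming the DSL1 syntax used elsewhere in this thread. It is not the linked example: the variable name `sz` and the `cat` command are illustrative assumptions. The file size is computed in a `map` operator outside the process and passed in as an ordinary value, which a dynamic directive can then read:

```groovy
#!/usr/bin/env nextflow

// Pair each file with its size in bytes, computed outside the process.
in_ch = Channel
    .fromPath('hello.txt')
    .map { f -> [ f, f.size() ] }

process foo {
  // `sz` is a plain value input, so the dynamic directive can use it.
  memory { sz < 1_000_000 ? 2.GB : 8.GB }

  input:
  set file(x), val(sz) from in_ch

  script:
  """
  cat $x
  """
}
```

The trade-off is that the size travels through the channel as a separate value, so every process that needs it has to declare the extra input.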


All 8 comments

Are you sure? I obviously did not make it clear, but I tested that and it appears to work.

Given a slightly modified nf script:

#!/usr/bin/env nextflow 

input_ch = Channel.fromPath('hello.txt')

process foo {
  tag {x.size() < 1_000_000 ? 2.GB : 8.GB}
  input:
    file(x) from input_ch
  script:
  println(x.size())
  """
  cat $x
  """
}

The file size is 416 bytes:

ll hello.txt 
-rw-rw-r-- 1 rad rad 416 Jun  8 12:22 hello.txt
nextflow dynamic-res.nf 
N E X T F L O W  ~  version 0.29.0
Launching `dynamic-res.nf` [thirsty_torricelli] - revision: a00ae2f982
[warm up] executor > local
416
[a1/50c6ee] Submitted process > foo (2 GB)

That's a lucky case. I guess it doesn't work if the file is not in the launch directory.

I just looked this up in the documentation and found this ticket. I have a couple of use cases where a process depends quite heavily on input size, and this would make it possible to use the size as another parameter for resource allocation :-)

I just came up against this problem too... (a conditional process according to file size)

Have a look at a possible workaround.

Nice, thanks! Yes, I have it working quite nicely now: https://github.com/nf-core/rnaseq/pull/37 - suggestions for improvement welcome!

This will be included in the next version.

Bonus feature: it's now possible to compare numeric values with memory and duration units. This allows writing expressions such as file.size() > 1.GB.
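As a rough illustration, the comparison could be used in a dynamic directive like this. This is a sketch only, assuming a Nextflow release that includes the change and the DSL1 syntax from the original report. The 16.GB and 4.GB values and the `cat` command are arbitrary placeholders:

```groovy
#!/usr/bin/env nextflow

in_ch = Channel.fromPath('hello.txt')

process foo {
  // Compare the input file size (a number of bytes) directly with a memory unit.
  memory { x.size() > 1.GB ? 16.GB : 4.GB }

  input:
  file x from in_ch

  script:
  """
  cat $x
  """
}
```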

Awesome work @pditommaso! This is really neat :-)
