Nextflow: Allow the setting of the bootDiskSize with Google pipelines

Created on 8 Oct 2019 · 19 Comments · Source: nextflow-io/nextflow

Bug report

In process declaration, the disk directive does not work.
When I run this workflow:

// A process needs a name; "example" is a placeholder added for illustration.
process example {
    disk '30 GB'
    container 'python:3'

    script:
    """
    sleep 5
    echo Done!
    """
}

I get an instance with a 10 GB boot (local) disk and a 500 GB pipeline-worker persistent disk.

The Docker image I use is 7 GB, and the workflow runs out of disk space because the Docker pull cannot complete on the 10 GB boot disk.

Is this issue going to be handled any time soon? I increased my persistent disk quota to accommodate 500 GB for each instance, but there is no way to use a large Docker image while the local disk is limited to 10 GB.

Please help, or update. If this is not going to change then I need to change the entire design of my pipeline.

platform/google-pipelines

Source

daudn

Most helpful comment

The new google-lifesciences executor allows you to specify the boot disk size using the following config settings:

process.executor = 'google-lifesciences'
google.lifeSciences.bootDiskSize = 50.GB

You can try it with the latest snapshot:

NXF_VER=19.12.0-SNAPSHOT nextflow run .. <usual cli params>

pditommaso on 10 Dec 2019

👍2
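
For context, here is a minimal sketch of how those two settings might sit in a complete nextflow.config. The project ID, region, and bucket name are hypothetical placeholders that do not come from this thread, and the disk and container values are borrowed from the example process in the bug report. Treat it as an illustration under those assumptions, not a tested configuration.

```groovy
// nextflow.config (sketch; placeholder values are marked as hypothetical)

process.executor  = 'google-lifesciences'
process.container = 'python:3'   // image from the example in the bug report
process.disk      = '30 GB'      // scratch (working) disk, separate from the boot disk

google.project = 'my-project-id' // hypothetical project ID
google.region  = 'europe-west2'  // hypothetical region

// Boot disk: must be large enough to hold the pulled container image
google.lifeSciences.bootDiskSize = 50.GB

// The Google executors expect the work directory to be in a Cloud Storage bucket
workDir = 'gs://my-bucket/work'  // hypothetical bucket
```

The boot disk is where the container image is pulled, so it is the setting that matters for a large image; process.disk only sizes the scratch disk used by the task.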

All 19 comments

I'm quite sure it's not that difficult to allow customizability of these features (the option already exists but does not work). Is this issue going to be handled any time soon?

Good, looking forward to receiving a code contribution from you.

pditommaso on 9 Oct 2019

I'm sorry if that sounded offensive. I just meant if the option is there, that means the developers already thought of/are developing the feature.

I would love to contribute however I'm sure this is between GCP and Nextflow. In GCP we can customize disk size, and since Nextflow uses GCP, that functionality is there and just needs to be implemented.

Once again, I apologize for wording it the way I did. I was frustrated, since Nextflow is supposed to be the answer to all my problems.

daudn on 9 Oct 2019

No problem, I appreciate the enthusiasm. It should be possible, but we have other priorities. The code is available so that anybody can use it, learn from it, and propose improvements.

pditommaso on 9 Oct 2019

Pipelines API definitely has this functionality.

{
  "minimumCpuCores": number,
  "preemptible": boolean,
  "minimumRamGb": number,
  "disks": [
    {
      object (Disk)
    }
  ],
  "zones": [
    string
  ],
  "bootDiskSizeGb": number,
  "noAddress": boolean,
  "acceleratorType": string,
  "acceleratorCount": string
}

The size of the boot disk. Defaults to 10 (GB).

daudn on 10 Oct 2019

In nextflow/modules/nf-google/src/main/nextflow/cloud/google/pipelines/GooglePipelinesTaskHandler.groovy if we edit the function createPipelineRequest(), could we not edit the boot disk size from there? Like:

    // ... inside createPipelineRequest() ...
    def req = new GooglePipelinesSubmitRequest()
    req.machineType = machineType
    req.project = pipelineConfiguration.project
    req.zone = pipelineConfiguration.zone
    req.region = pipelineConfiguration.region
    req.diskName = diskName
    req.diskSizeGb = task.config.disk?.giga
    // Proposed change: set the boot disk size in GB (hard-coded here for illustration)
    req.bootDiskSizeGb = 30
    req.preemptible = pipelineConfiguration.preemptible
    req.taskName = "nf-$task.hash"
    req.containerImage = task.container
    req.fileCopyImage = fileCopyImage
    req.stagingScript = stagingScript
    req.mainScript = mainScript
    req.unstagingScript = unstaging.join("; ").trim()
    req.sharedMount = sharedMount
    req.accelerator = task.config.getAccelerator()
    return req
}

daudn on 11 Oct 2019

It should work.

pditommaso on 11 Oct 2019

If I were to make changes locally, how could I then install that version of Nextflow? Basically, how do you package Nextflow? I can try getting it to work, and if it does, I can fork the repository and push the change to a feature branch.

daudn on 11 Oct 2019

make compile 
./launch.sh

https://github.com/nextflow-io/nextflow#build-from-source

pditommaso on 11 Oct 2019

daudnadeem:nextflow daudn$ ./launch.sh 
Picked up _JAVA_OPTIONS: -Xverify:none
Error: Could not find or load main class nextflow.cli.Launcher

daudn on 14 Oct 2019

I just realized that ./launch.sh is meant to be used in place of nextflow, as in nextflow run ...

Anyway, I don't think I can get it to work. My Groovy skills are insufficient.

Until this feature is available, I will look for another solution. If the feature is added, I'll come back to implementing the workflow through Nextflow.

daudn on 14 Oct 2019

To be clear, the only change needed here is for the boot disk. The scratch disk is already configurable via the process.disk directive in the latest edge release.

I can take a look at implementing this; however, I can't provide a reliable timeframe right now.

@pditommaso Any thoughts on which directive to use for this? There appears to be a cloud.bootStorageSize in use for AWS. Would you like to use that here as well?

@daudn In the meantime, can you comment on what takes up most of the space in your image? 7 GB seems like a lot.

mozack on 14 Oct 2019

@mozack, @pditommaso

Actually, I first implemented a workflow in Kubernetes, then one using RabbitMQ with manual autoscaling of VM instances. Eventually I had a look at Nextflow, which seems like the answer if only I can get around the limitation of the 10 GB disk (which is why I'm quite persistent about this feature).

The Docker image is so large because it has a third-party tool installed for HLA typing. This tool takes up 6 GB after installation (I've tried cleaning up the image); its size is out of my hands, I'm afraid.

When I was doing this process manually, I had prebuilt (VMI) images on Google Cloud. And whenever a job was to be processed, the instance that booted up already had the previous 'layers' of the docker image and so it was a much quicker process to get the :latest version.

The manual workflow I built is definitely efficient, but it has many moving parts and therefore a higher chance of failing, and it would also be more difficult to maintain in the long run than Nextflow.

Would really appreciate it if you guys could allow customisation of local disk storage.

daudn on 15 Oct 2019

@mozack The idea is to deprecate the cloud context at some point, so I think a google.pipelines.bootDisk option could be added to configure it.

pditommaso on 15 Oct 2019

@mozack any update?

daudn on 22 Oct 2019

@daudn We are discussing this with the Google team, though there is no ETA at this time. Stay tuned.

pditommaso on 22 Oct 2019

👍1

Any updates? Had a look at the release but it doesn't seem to include this.

daudn on 3 Dec 2019

It will be included in the next stable release, 20.01.0.

pditommaso on 3 Dec 2019

@pditommaso thank you for the update, looking forward to the release!

daudn on 3 Dec 2019

