Nextflow: Allow the setting of the bootDiskSize with Google pipelines

Created on 8 Oct 2019 · 19 Comments · Source: nextflow-io/nextflow

Bug report

In a process declaration, the disk directive does not work.
When I run this workflow:

// 'diskTest' is a placeholder name; a process declaration requires one
process diskTest {
    disk '30 GB'
    container 'python:3'

    script:
    """
    sleep 5
    echo Done!
    """
}

I get an instance with 10 GB (local) and 500GB pipeline-worker persistent disks.

The Docker image I have is 7 GB, and the workflow runs out of disk space because it cannot complete the Docker pull.

Is this issue going to be handled any time soon? I increased my persistent disk quota to accommodate 500 GB for each instance, but there is no way around the large Docker image, since the local disk is only 10 GB.

Please help, or update. If this is not going to change then I need to change the entire design of my pipeline.

platform/google-pipelines

Most helpful comment

The new google-lifesciences executor allows the boot disk size to be specified using the following config settings:

process.executor = 'google-lifesciences'
google.lifeSciences.bootDiskSize = 50.GB

You can try it with the latest snapshot:

NXF_VER=19.12.0-SNAPSHOT nextflow run .. <usual cli params>
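For context, here is a minimal sketch of how the two settings above might sit in a complete nextflow.config. The project ID, zone, and bucket name are placeholders invented for illustration, not values from this thread; only the executor name and google.lifeSciences.bootDiskSize come from the comment above.

```groovy
// nextflow.config (sketch; every value marked "placeholder" is an assumption)
workDir = 'gs://my-bucket/work'            // placeholder bucket for the work directory

process.executor  = 'google-lifesciences'
process.container = 'python:3'

google.project = 'my-project-id'           // placeholder project ID
google.zone    = 'us-central1-a'           // placeholder zone

// Boot disk must be large enough to hold the pulled Docker image
google.lifeSciences.bootDiskSize = 50.GB
```

With a 7 GB image like the one described in the report, a 50 GB boot disk should leave enough room for the pull to complete.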

All 19 comments

I'm quite sure it's not that difficult to allow customization of these features (the option already exists but does not work). Is this issue going to be handled any time soon?

Good, looking forward to receiving a code contribution from you.

I'm sorry if that sounded offensive. I just meant if the option is there, that means the developers already thought of/are developing the feature.

I would love to contribute however I'm sure this is between GCP and Nextflow. In GCP we can customize disk size, and since Nextflow uses GCP, that functionality is there and just needs to be implemented.

Once again, I apologize for wording it the way I did, frustrated since Nextflow is supposed to be the answer to all my problems.

No problem, I appreciate the enthusiasm. It should be possible, but we have other priorities. The code is available so that anybody can use it, learn from it, and propose improvements.

Pipelines API definitely has this functionality.

{
  "minimumCpuCores": number,
  "preemptible": boolean,
  "minimumRamGb": number,
  "disks": [
    {
      object (Disk)
    }
  ],
  "zones": [
    string
  ],
  "bootDiskSizeGb": number,
  "noAddress": boolean,
  "acceleratorType": string,
  "acceleratorCount": string
}

The size of the boot disk. Defaults to 10 (GB).

In nextflow/modules/nf-google/src/main/nextflow/cloud/google/pipelines/GooglePipelinesTaskHandler.groovy if we edit the function createPipelineRequest(), could we not edit the boot disk size from there? Like:

        def req = new GooglePipelinesSubmitRequest()
        req.machineType = machineType
        req.project = pipelineConfiguration.project
        req.zone = pipelineConfiguration.zone
        req.region = pipelineConfiguration.region
        req.diskName = diskName
        req.diskSizeGb = task.config.disk?.giga
        // Proposed change: set the boot disk size here (hard-coded to 30 GB)
        req.bootDiskSizeGb = 30
        req.preemptible = pipelineConfiguration.preemptible
        req.taskName = "nf-$task.hash"
        req.containerImage = task.container
        req.fileCopyImage = fileCopyImage
        req.stagingScript = stagingScript
        req.mainScript = mainScript
        req.unstagingScript = unstaging.join("; ").trim()
        req.sharedMount = sharedMount
        req.accelerator = task.config.getAccelerator()
        return req
    }

It should work.
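Rather than hard-coding 30, the value could come from configuration, falling back to the Pipelines API default of 10 GB when unset. The following is only a self-contained sketch of that pattern: SubmitRequest and PipelineConfig are hypothetical stand-ins written for this example, not the actual Nextflow classes.

```groovy
// Hypothetical stand-in for the executor configuration (not the Nextflow class)
class PipelineConfig {
    static final int DEFAULT_BOOT_DISK_GB = 10   // Pipelines API default quoted above
    Integer bootDiskSizeGb                       // null when the user did not set it
}

// Hypothetical stand-in for the submit request (not the Nextflow class)
class SubmitRequest {
    Integer diskSizeGb       // scratch disk
    Integer bootDiskSizeGb   // boot disk
}

SubmitRequest createRequest(PipelineConfig cfg, Integer scratchGb) {
    def req = new SubmitRequest()
    req.diskSizeGb = scratchGb
    // Elvis operator: use the configured size, else the default.
    // Note that null and 0 both fall back to the default under Groovy truth.
    req.bootDiskSizeGb = cfg.bootDiskSizeGb ?: PipelineConfig.DEFAULT_BOOT_DISK_GB
    return req
}

def custom = createRequest(new PipelineConfig(bootDiskSizeGb: 30), 500)
def fallback = createRequest(new PipelineConfig(), 500)
println "custom=${custom.bootDiskSizeGb} GB, fallback=${fallback.bootDiskSizeGb} GB"
```

Keeping the default in one place means an unset option preserves the current 10 GB behaviour.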

If I were to make changes "locally", how could I then install that version of Nextflow? Basically, how do you package Nextflow? I can try getting it to work; if it does, I can fork and upload to a feature branch.

make compile 
./launch.sh 

https://github.com/nextflow-io/nextflow#build-from-source

daudnadeem:nextflow daudn$ ./launch.sh 
Picked up _JAVA_OPTIONS: -Xverify:none
Error: Could not find or load main class nextflow.cli.Launcher

I just understood that ./launch.sh is to be used as nextflow run ...

Anyway. I don't think I can get it to work. My groovy skills are insufficient.

Until this feature is available, I will look for another solution. If the feature is added, I'll come back to implementing the workflow through Nextflow.

To be clear, the only change needed here is for the boot disk. The scratch disk is already configurable via the process.disk directive in the latest edge release.
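As a rough illustration of the scratch disk setting mentioned here, the disk directive can be set for all processes in nextflow.config and overridden for an individual one. This is a sketch only: the process name hlaTyping and the sizes are made up for the example.

```groovy
// nextflow.config (sketch; process name and sizes are illustrative assumptions)
process {
    executor = 'google-pipelines'
    disk = '30 GB'                 // scratch disk for every process

    // Per-process override; 'hlaTyping' is a hypothetical process name
    withName: 'hlaTyping' {
        disk = '100 GB'
    }
}
```

This affects only the scratch disk; the boot disk that holds the pulled Docker image is sized separately.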

I can take a look at implementing this, however I can't provide a reliable timeframe right now.

@pditommaso Any thoughts on which directive to use for this? There appears to be a cloud.bootStorageSize in use for AWS. Would you like to use that here as well?

@daudn In the meantime, can you comment on what takes up most of the space in your image? 7 GB seems like a lot.

@mozack, @pditommaso

Actually, I've implemented a workflow in Kubernetes, and then one using RabbitMQ with manual autoscaling of VM instances. Eventually I had a look at Nextflow, which seems like the answer if only I can get around the limitation of the 10 GB disk (which is why I'm quite persistent on this feature).

The Docker image is so large because it has a third-party tool installed to do HLA typing. This tool takes up 6 GB of space after being installed (I've tried cleaning up the image). The size of the third-party tool is not in my hands, I'm afraid.

When I was doing this process manually, I had prebuilt (VMI) images on Google Cloud. And whenever a job was to be processed, the instance that booted up already had the previous 'layers' of the docker image and so it was a much quicker process to get the :latest version.

The manual workflow I built is def efficient, but it has many moving parts and so a higher chance for it to fail, and would also be difficult to maintain in the long run as compared to Nextflow.

Would really appreciate it if you guys could allow customisation of local disk storage.

@mozack The idea is to deprecate the cloud context at some point. Therefore, I think a google.pipelines.bootDisk option could be added to configure it.

@mozack any update?

@daudn we are discussing with the google team regarding this, tho no ETA at this time. stay tuned.

Any updates? Had a look at the release but it doesn't seem to include this.

It will be included in the next stable release 20.01.0

@pditommaso thank you for the update, looking forward to the release!
