Colabtools: Feature Request: Colab Pro to increase storage as well.

Created on 10 Feb 2020 · 23 Comments · Source: googlecolab/colabtools

  • Describe the current behavior:

    Colab has reduced storage on GPU instances from 350GB to just 64GB and increased storage on CPU instances to 100GB. As a result, some storage-heavy use cases can no longer run on Colab.

    Colab Pro has now launched with almost everything doubled: GPU usage, run time (12->24h), RAM (13->26GB). The exception is storage, which is kept at 100GB/64GB (same as before).

  • Describe the expected behavior:

    For Colab Pro, increase the storage to 200GB (doubling it as well), or to 350GB (the same as before), or even to 700GB (double the earlier limit).

  • The web browser you are using (Chrome, Firefox, Safari, etc.):

    Firefox.

  • Link to self-contained notebook that reproduces this issue
    (click the Share button, then Get Shareable Link):

    Any notebook can show the storage limit.
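
For reference, a minimal sketch of how a notebook cell might report the disk size of the current runtime. It uses only Python's standard library; the helper name `disk_report` is made up for illustration, and the numbers it prints depend on the runtime type.

```python
import shutil


def disk_report(path="/"):
    """Return total, used, and free space (decimal GB) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    gb = 1e9  # decimal gigabytes
    return {
        "total_gb": round(usage.total / gb, 1),
        "used_gb": round(usage.used / gb, 1),
        "free_gb": round(usage.free / gb, 1),
    }


if __name__ == "__main__":
    # "/" is assumed to be the runtime's main disk.
    print(disk_report("/"))
```

Running `!df -h /` in a cell should give a comparable figure.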

triaged

All 23 comments

I agree as well. There should be more storage. I basically use Colab Pro for practicing with Kaggle competitions. Some of the datasets are very big. We have this extra RAM and GPU but can't do much with it if we don't have enough storage to process the data.

Strongly agree

Yes. Please fix the storage issue.

Agreed! I'm willing to pay more if necessary to off-set the cost!

Please increase the storage for GPU.

Agree. Please fix it. I can download a 27 GB dataset but cannot unzip it. Google Drive is not an option because it is too slow with many files: with around 114 thousand images, one epoch takes more than 24 hours, whereas on Kaggle the same code processes one epoch over the same number of images in less than an hour.
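
One possible workaround when an archive downloads but does not fit once unzipped is sketched below, under assumptions: it checks the archive's recorded uncompressed size against free disk space, and otherwise reads members straight from the zip without extracting them. The function names and the 5% safety margin are hypothetical, and whether reading from the archive is fast enough for training has not been measured here.

```python
import shutil
import zipfile


def uncompressed_size(zip_path):
    """Sum the uncompressed sizes recorded in the archive's central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def fits_on_disk(zip_path, dest_dir, margin=0.05):
    """Return True if the extracted files should fit in dest_dir's free space.

    `margin` is an arbitrary safety factor (5% by default, an assumption).
    """
    needed = uncompressed_size(zip_path) * (1 + margin)
    return needed <= shutil.disk_usage(dest_dir).free


def iter_members(zip_path):
    """Yield (name, bytes) for each file in the archive without writing to disk."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)
```

If `fits_on_disk` returns False, a data loader could decode images from `iter_members` (or from `ZipFile.open` by name) instead of extracting the whole dataset.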

Just adding my two cents in:

My tf.records data is approx. 16 GB, stored in Google Drive. After training for 2 hours, the "quota" of the file is exceeded in Google Drive and I can't access it from Colab. (I can't download it from Google Drive either.)

Therefore, I'm in the same boat and would like to see a sensible storage increase for Colab Pro to 150 - 200 GB.

Same problem here; I went for Google Colab Pro and found out very quickly that I can't use it for deep learning in Kaggle competitions. Google Drive, although it is tightly integrated, is not a good option at all, as it throttles frequent resource access.

Same issue here. Using a Drive mount eventually runs into an I/O error and significantly slows training. Extra storage could even be an option in the runtime settings, like high RAM, so that the default doesn't waste resources when it isn't needed.

Agreed. Even with the MSCOCO dataset I faced storage issues when using all 400k images. 64 GB is far too little for any serious work. I would rather pay $12 a month and have more storage than be stuck with a tiny disk.

Yes, I feel more comfortable working with Colab. The only problem is that its disk space is too small for me, even with Colab Pro.

I have 25 million images, around 500 GB. How can I use Colab with that?

Colab Pro now has twice the storage just like I requested. Thank you.

@korakot is this really solved?

It should be; with Colab Pro one gets 225 GB in total.

So if I pay for Colab Pro, will I get more than 200 GB of GPU runtime storage? I don't want to pay for nothing :D.
@korakot

I found that currently, with Pro active, the standard runtime gives you 225 GB, the GPU runtime gives you 147 GB, and the TPU runtime gives you 107 GB.

I've been using the GPU on the Pro version frequently, and the 147 GB is sometimes still not enough. @a-akram-98 the total will be 147 GB (as @cibic89 mentioned above), which leaves you about 110 GB after the system files on the GPU setting.

@jonykarki I saw some tutorials on the web saying that if I have a dataset that exceeds the disk space, I can save it on a PAID Google Drive and train the model directly from the Drive.

That's what I thought. Google Drive severely throttles frequent access (typically to many small files), which very quickly cripples deep learning on GPUs. This is even worse with TPUs, but then again one should load all the data into RAM to fully utilise the benefits of that technology.
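
Because the throttling described here mainly affects many small reads, one commonly suggested mitigation is to bundle the files into a single archive, copy it from Drive in one sequential read, and extract it on the runtime's local disk. The sketch below illustrates that idea only; the function names are hypothetical, any Drive and local paths are the caller's assumption, and the dataset still has to fit on the local disk.

```python
import shutil
import zipfile
from pathlib import Path


def pack_dir(src_dir, archive_path):
    """Bundle every file under src_dir into one uncompressed zip.

    ZIP_STORED skips compression, since images are usually compressed already.
    The archive should be written outside src_dir.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for f in sorted(src.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(src).as_posix())
    return archive_path


def stage_locally(archive_path, local_dir):
    """Copy the archive to local disk in one read, extract it, and drop the copy."""
    local = Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    local_copy = local / Path(archive_path).name
    shutil.copyfile(archive_path, local_copy)
    with zipfile.ZipFile(local_copy) as zf:
        zf.extractall(local / "data")
    local_copy.unlink()  # free the space taken by the archive copy
    return local / "data"
```

Note that the archive copy and the extracted files briefly coexist on the local disk, so this needs roughly twice the dataset size in free space during staging.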

Exactly! Using GDrive slows things down.

So there is actually no solution until Google increases the storage or adds an extra paid service.

It's probably engineered that way. Fast.ai with GPU instances such as the NVIDIA 1080 Ti/2080 Ti (11 GB) or Titan RTX (24 GB) is the next most cost-effective infrastructure approach. If you are doing transfer learning, you might get away with a 1070/1080/2070/2080 with 8 GB of VRAM, but that is fast becoming insufficient for cutting-edge models in competitive settings such as Kaggle competitions. See a GPU deep learning training performance comparison here: https://i0.wp.com/timdettmers.com/wp-content/uploads/2018/08/performance_TPU_RTX_GPUs.png, and more details with batch sizes here: https://lambdalabs.com/blog/choosing-a-gpu-for-deep-learning/
