Colabtools: Too little disk space on GPU machine

Created on 20 Dec 2019 · 16 Comments · Source: googlecolab/colabtools

In the last two days, all GPU notebooks began to provide 30 GB of disk space instead of 340 GB. 30 GB is not enough even to download the MS COCO dataset.


All 16 comments

Yeah, 360 GB for GPU (not the T4 GPU) would be really helpful :p

Agree. Not enough space with GPU.

Agree. Not enough space with GPU; that's really not good if someone wants to run real experiments.

This is what I have observed with various accelerators. I have also asked an SO question:

Hardware accelerator: None

!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         108G   28G   75G  28% /

Hardware accelerator: GPU

!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay          69G   32G   34G  49% /

Hardware accelerator: TPU

!df -h .
Filesystem      Size  Used Avail Use% Mounted on
overlay         108G   28G   75G  28% /
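The same check can be done from Python, which is handy when a notebook should verify free space before starting a large download. This is a minimal sketch using the standard library's shutil.disk_usage; the helper name disk_report and the 40 GiB threshold in the example are hypothetical choices for illustration, not Colab limits. Values are in GiB, which matches the "G" units that df -h prints.

```python
import shutil


def disk_report(path: str = "/") -> dict:
    """Return total/used/free space (in GiB) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    gib = 1024 ** 3
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
    }


if __name__ == "__main__":
    report = disk_report("/")
    print(
        f"total {report['total_gib']:.1f} GiB, "
        f"used {report['used_gib']:.1f} GiB, "
        f"free {report['free_gib']:.1f} GiB"
    )
    # Hypothetical threshold: warn before attempting a large dataset download.
    if report["free_gib"] < 40:
        print("Warning: probably not enough free space for a large dataset")
```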

Yeah, 64 GB is not enough space to run real experiments. For the last 4 days I have been trying to optimize my code to fit in this little space.

I have the same issue. 👍 What's ironic about this is that one proposed solution is to mount Google Drive, so I bought 200 GB of Google Drive storage. However, disk size is still an issue. Apparently, once Google Drive is mounted, it starts caching files in /root/.config/Google/DriveFS/[uniqueid]/content_cache. The cache has no size limit: it never deletes or replaces anything, it just accumulates until it takes up the whole disk and the code crashes. :(
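To watch this happen, one can measure how large the cache directory grows while a training job reads from Drive. The sketch below is an assumption-laden helper, not a Colab API: the glob wildcard stands in for the [uniqueid] path component from the comment above, and the DriveFS location may differ between Colab versions. dir_size_bytes and drivefs_cache_sizes are hypothetical names.

```python
import glob
import os


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def drivefs_cache_sizes(
    pattern: str = "/root/.config/Google/DriveFS/*/content_cache",
) -> dict:
    """Map each directory matching `pattern` (assumed DriveFS cache path) to its size in bytes."""
    return {path: dir_size_bytes(path) for path in glob.glob(pattern)}


if __name__ == "__main__":
    for path, size in drivefs_cache_sizes().items():
        print(f"{path}: {size / 1024 ** 3:.2f} GiB")
```

Running this periodically (for example, between epochs) would show whether the cache keeps growing as more files are read.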

Any update on this issue?


So if I have unlimited Google Drive storage, can I store data beyond 107 GB?


No, it won't work because, as I said, the driver that mounts Google Drive has a cache that is not really a cache; it is a copy. Every file you read is copied to the local disk, and that fills the disk.


I've made an issue of this: https://github.com/googlecolab/colabtools/issues/960

Great. Let us know if they respond or make any changes.

Hey guys, I figured out how to get the whole COCO-2017 dataset into Colab with Google Drive. Basically, I broke train2017 and test2017 down into subdirectories with a maximum of 5000 files each (I noticed Colab could only read somewhere around 15k files from a directory, so 5000 seemed a safe bet). Here is the code for that: https://github.com/sawyermade/detectron2_pkgs/tree/master/dataset_download
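The splitting step can be sketched in a few lines of Python. This is not the code from the linked repository; it is a minimal, hypothetical helper (split_into_subdirs and its parameters are illustrative names) that moves the files of one flat directory into numbered subdirectories holding at most max_files each. The thread uses 5000 here and, later, 999 as the cap.

```python
import os
import shutil


def split_into_subdirs(src_dir: str, dst_dir: str, max_files: int = 5000) -> int:
    """Move the files in `src_dir` into numbered subdirectories of `dst_dir`
    (0000, 0001, ...), each holding at most `max_files` files.
    Returns the number of subdirectories created."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    # Only regular files; sorting makes the file-to-subdirectory mapping deterministic.
    names = sorted(
        n for n in os.listdir(src_dir) if os.path.isfile(os.path.join(src_dir, n))
    )
    num_subdirs = 0
    for start in range(0, len(names), max_files):
        subdir = os.path.join(dst_dir, f"{num_subdirs:04d}")
        os.makedirs(subdir, exist_ok=True)
        for name in names[start:start + max_files]:
            shutil.move(os.path.join(src_dir, name), os.path.join(subdir, name))
        num_subdirs += 1
    return num_subdirs
```

Because the files are moved rather than copied, this should be run on a local copy of the dataset before uploading the split layout to Drive.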

Then I used rclone to upload the whole damn dataset to Google Drive and shared it so that anyone with the link can view it: https://drive.google.com/drive/folders/1EVsLBRwT2njNWOrmBAhDHvvB8qrd9pXT?usp=sharing

Once you have the share in your Google Drive, create a shortcut to it so Colab can access it. Then I just create symbolic links in the local directory: 118,287 for train and 40,670 for test. So far, it is working like a charm. I even save all my output to Google Drive so training can be resumed after the 12-hour kick. Here is the notebook for that: https://colab.research.google.com/drive/1OVStblo4Q3rz49Pe9-CJcUGkkCDLcMqP
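The symbolic-link step can be sketched as follows, assuming the split layout described earlier in this comment (numbered subdirectories, each holding image files whose names are unique across the whole split). link_flat is a hypothetical helper, not the code from the linked notebook. It gives training code one flat directory of links while the actual bytes stay on Drive.

```python
import os


def link_flat(split_root: str, link_dir: str) -> int:
    """For every file one level down in the subdirectories of `split_root`,
    create a symbolic link with the same name in `link_dir`.
    Returns the number of links newly created."""
    os.makedirs(link_dir, exist_ok=True)
    created = 0
    for subdir in sorted(os.listdir(split_root)):
        sub_path = os.path.join(split_root, subdir)
        if not os.path.isdir(sub_path):
            continue
        for name in sorted(os.listdir(sub_path)):
            # Absolute target so the link resolves regardless of the working directory.
            target = os.path.abspath(os.path.join(sub_path, name))
            link = os.path.join(link_dir, name)
            # Skip existing entries so the function can be re-run safely.
            if not os.path.lexists(link):
                os.symlink(target, link)
                created += 1
    return created
```

A call such as link_flat("/content/drive/MyDrive/coco2017/train2017", "/content/coco/train2017") would be typical, but both paths are hypothetical: the Drive mount point depends on how Drive was mounted and where the shortcut was placed.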

I am training a Mask R-CNN now and will report results when it finishes, but it's looking pretty damn good so far.

Please fix the storage issue. Colab has become useless for medium-sized datasets. Sometimes Colab with Google Drive attached shows an input/output error even when I have fewer than 5000 files, and the main drawback is that it reads files very slowly, about 20 times slower than when I did the same thing on Kaggle.
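To put numbers on slow reads like these, one option is to time reading a sample of files from a Drive-mounted directory and from a local one. The sketch below is a hypothetical helper (read_throughput is not part of Colab). The OS page cache can make repeated reads of the same files look much faster, so only the first pass over a fresh sample is a fair comparison.

```python
import os
import time


def read_throughput(directory: str, limit: int = 200) -> tuple:
    """Read up to `limit` regular files from `directory`.
    Returns (files_read, MiB_per_second)."""
    names = sorted(os.listdir(directory))[:limit]
    total_bytes = 0
    files_read = 0
    start = time.perf_counter()
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            total_bytes += len(f.read())
        files_read += 1
    elapsed = time.perf_counter() - start
    mib_per_s = (total_bytes / 1024 ** 2) / elapsed if elapsed > 0 else float("inf")
    return files_read, mib_per_s
```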

Actually, I still ended up having severe problems with 5000 files too. I switched to subdirectories of 999 files, and that worked perfectly. I have trained 3 models so far with detectron2, in around 15 hours each. That's almost as fast as my desktop.
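Lowering the cap changes how many subdirectories each split needs. Using the file counts given in this thread (118,287 train and 40,670 test images), a ceiling division gives the count; subdirs_needed is a hypothetical helper name.

```python
def subdirs_needed(num_files: int, max_files: int) -> int:
    """Smallest number of subdirectories holding `num_files` with at most `max_files` each."""
    # Integer ceiling division, avoiding floating-point rounding.
    return (num_files + max_files - 1) // max_files


for split, n in [("train2017", 118287), ("test2017", 40670)]:
    print(split, subdirs_needed(n, 5000), subdirs_needed(n, 999))
# prints:
# train2017 24 119
# test2017 9 41
```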


How will you delete those files from your Google Drive? I'm still stuck on this problem, and now I can't delete those files from the trash one by one.

Why would you delete them?

(Reply by email to vivek gangwar's comment of Wed, May 6, 2020, 1:46 PM.)

