Colabtools: Files in Colab are not in sync with Google Drive

Created on 29 Sep 2018 · 19 Comments · Source: googlecolab/colabtools

It appears that some big files I can see from Colab are not synced to the mounted Google Drive until I restart the Colab runtime.

  1. Mount Google Drive in a Colab notebook.
  2. Create a big file on the mounted Google Drive. In my case, I created a 5 GB tar file from images on the host, writing the tar file directly to the mounted Drive.
  3. Check with "ls" that the file is visible from Colab.
  4. Open Google Drive in a browser: I CANNOT see the file, and the Drive storage usage shows no change. The file is still not visible after 30 minutes.
  5. Restart the Colab runtime.
  6. Check Google Drive in the browser again: now I can see the 5 GB file.
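
Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration, not the reporter's actual commands: make_tar is a hypothetical helper, and the paths in the trailing comments (including the name of the Drive folder under the mount point) are assumptions that may differ in your runtime.

```python
import os
import tarfile


def make_tar(src_dir, tar_path):
    """Pack src_dir into an uncompressed tar at tar_path; return its size in bytes."""
    # Use the directory's own name as the top-level entry inside the archive.
    top_level = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(tar_path, "w") as tar:
        tar.add(src_dir, arcname=top_level)
    return os.path.getsize(tar_path)


# Inside a Colab notebook (hypothetical paths, shown as comments because
# google.colab is only importable in a Colab runtime):
#   from google.colab import drive
#   drive.mount('/content/drive')                                     # step 1
#   make_tar('/content/images', '/content/drive/MyDrive/images.tar')  # step 2
#   os.listdir('/content/drive/MyDrive')                              # step 3
```

The file showing up in step 3 only proves that it exists in the VM's view of the mounted folder, which is exactly the situation described in step 4.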

Most helpful comment

Data written to Google Drive is first written to the disk attached to the VM, so the maximum size is limited by that.
The google.colab.drive module has a recently-added flush_and_unmount() function that you can use to sync data written to the local VM's disk cache of your Drive-mounted folder back to Google Drive, after which a "Reset all runtimes" (from the Runtime menu) will get you a fresh VM.
We're working on making all this more transparent.
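
A minimal sketch of the two points in this comment, assuming a standard Colab runtime. fits_on_vm_disk and flush_drive are hypothetical helper names introduced for illustration; only shutil.disk_usage and drive.flush_and_unmount() are real calls. The free-space check is an approximation, since the comment only says that the VM disk bounds the size.

```python
import shutil


def fits_on_vm_disk(nbytes, path="/"):
    """Return True if nbytes fits in the free space of the disk holding path."""
    return shutil.disk_usage(path).free >= nbytes


def flush_drive():
    """Flush cached Drive writes and unmount; return False outside Colab."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.flush_and_unmount()  # syncs the VM's disk cache back to Drive, then unmounts
    return True
```

For example, checking fits_on_vm_disk(5 * 1024**3) before writing a 5 GB tar file gives an early warning if the VM disk is too small, and calling flush_drive() afterwards asks Colab to push the cached data to Drive.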

All 19 comments

I have a similar error, and even restarting the runtime doesn't help.

EDIT: the files eventually appear in Drive, so some sync is happening. Maybe add a sync check function?

I'm facing a similar issue.

Hi, facing a similar issue here. I can see files I've created against the mounted drive in the Colab file browser. But when I view them in the Google Drive web app they don't appear there.

Indeed, synchronization between the colab VM and the Drive backend happens asynchronously, so large writes on one system can take a while to show up in the other.

I had the same problem. It's been one hour, and Colab and Google Drive still don't show the same files.

Same problem, but my files do not get synced at all. I am training a BERT model and I want my checkpoints to be stored in Drive. All other files and commands like !mkdir or !mv work without problems. It seems to have started after Colab updated from TensorFlow 1.13.1 to 1.14.0rc; before that it worked fine. Downgrading to tensorflow-gpu==1.13.1 did not solve the problem, though.

Facing a similar issue: files are not getting synced from Colab to Google Drive. I can see the files in the Colab file browser, but they are not visible in the Google Drive web GUI.

One workaround did work for me, though. Use it with caution and try it on a test job first.
I am running a training job for 85,000 steps, and a checkpoint is created every 5,000 steps. When I manually stop the job, I can immediately see the files in Google Drive. Then I restart the job, and it automatically picks up where it left off.

Same problem, but even folders created with !mkdir don't synchronize. I can also see them through Colab, but not in the Google Drive GUI.

Edit: I solved it by restarting the notebook and forcing the remount of Google Drive:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

I also noticed that modifying Drive folders in the GUI while Drive is mounted seems to trigger the bug.

Hi,

I faced a similar issue and solved it by renaming the folder from checkpoints to ckpt. I tried renaming it back and forth: whenever it's named checkpoints, the files inside the folder do not appear.

@steviejsutanto Yes, this is a known issue; see #621.

Hi, the flush_and_unmount() function syncs the files but also unmounts the drive. Is there any other way to periodically synchronize TensorBoard event files and checkpoints from Colab to Google Drive without unmounting?

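flush_and_unmount() also unmounts the drive. Is it a good approach to call drive.mount('/content/gdrive', force_remount=True) in a loop instead? This worked for me, but I don't know whether it is a good approach.
What I did:

from google.colab import drive

n = 1  # number of remount attempts
for i in range(n):
  try:
    # Force a fresh mount of Drive at this mount point
    drive.mount('/content/drive', force_remount=True)
  except Exception:
    pass  # ignore a failed attempt and move on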
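
One possible variant of this idea, sketched under assumptions: flush first, then remount, every few training steps. flush_and_remount and train_with_periodic_sync are hypothetical helpers, and the drive module is passed in as an argument so the sketch can be read and tried outside Colab. Nothing in this thread confirms that the pattern is recommended, and flush_and_unmount() may take a long time on large writes.

```python
def flush_and_remount(drive, mountpoint="/content/drive"):
    """Flush cached writes to Drive, then mount again so work can continue."""
    drive.flush_and_unmount()                    # sync cached writes, then unmount
    drive.mount(mountpoint, force_remount=True)  # re-attach the Drive folder


def train_with_periodic_sync(drive, steps, sync_every, train_step):
    """Run train_step(step) for each step, syncing every sync_every steps."""
    for step in range(1, steps + 1):
        train_step(step)
        if step % sync_every == 0:
            flush_and_remount(drive)
```

In a notebook this would be called as train_with_periodic_sync(drive, 85000, 5000, my_step) after from google.colab import drive, where my_step is your own function. The remount may ask for authorization again, depending on the runtime.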

Why not add drive.flush() without the unmount? It would be nice to be able to watch tensorboard while things are running.

flush_and_unmount() has been running for 15 minutes now. Is there any alternative that syncs files between Colab and Google Drive?

I have found that the reason one can't mount one's own Google Drive for these things is a race condition with Google. First it was suggested to change the mount location from /content/gdrive to /content/something else, but this didn't fix it. What I ended up doing was manually copying the files that are copied to Google Drive and then installing the Google Drive desktop application. In Windows 10, I then went to the folder, now located on Google Drive, disabled file permission inheritance, and manually granted full control rights on the folder to the Users group and the Authenticated Users group. This seems to have fixed it for me. At other times I have noticed with these Colabs (not this one in particular) that some of the components used, like the trained models, are missing from the repository, as if they had been removed. The only solution for this is to look around for other sources of these files. This includes searching Google, looking at the git checkout level to find branches besides master, and looking for projects that cloned the project on GitHub to see if they still include the files. Hope this helps!

Any update on this? I am trying to run Oscar on Colab, but I need to make changes to the .py files that are stored on Drive. I execute these files using !python file.py in Colab. But it is not feasible to wait for synchronization every time I add a print("Hello World") to the Python files. A forced synchronization would come in handy; otherwise Colab is just a plain Jupyter notebook in the cloud and we can't exploit it properly.

Also, I am using Colab Pro.

Why not add drive.flush() without the unmount? It would be nice to be able to watch tensorboard while things are running.

Agreed, sometimes we just want to flush without unmounting.

from google.colab import drive
drive.flush_and_unmount()

This worked for me, thanks.
