Models: TPU Training Error: File system scheme '[local]' not implemented

Created on 11 Mar 2020 · 7 Comments · Source: tensorflow/models

Hi. I am trying to execute training using Google TPU.

It says: Error recorded from training_loop: File system scheme '[local]' not implemented (file: '/tmp/tmpUUrctt/model.ckpt-0_temp_3546763fe0ab4b32a5353eb8f190192c')

I searched and found that this is a common error. The recommended solution is: All input files and the model directory must use a cloud storage bucket path (gs://bucket-name/...), and this bucket must be accessible from the TPU server.

I have double-checked and confirmed the following:

  • Cloud storage API is already enabled.
  • The GCP project has been granted the "Storage Legacy Bucket Owner" role.

Could I be missing anything? Any help is much appreciated. Thank you.

support

All 7 comments

Hi, which models are you using?

Here are some tutorials: https://cloud.google.com/tpu/docs/tutorials/resnet-2.x

Adding a few Cloud TPU team members to take a look.
@allenwang28 @gagika

It looks like model_dir is set to a local directory in /tmp/, can you double check to make sure that it is a path to a GCS bucket?

@saberkun I am using the ssd_mobilenet_v1_0.75 model.

@allenwang28 The training procedure created a new directory in my bucket "model_dir" before the error occurred. I assume that means the path is already correct.

It's possible that the directory is being created within data_dir, which is a GCS path. Both data_dir and model_dir should be set to GCS bucket paths.
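One way to rule this out before launching a job is a quick syntax check on every directory argument. The following is only a minimal sketch using the Python standard library: the helper names (is_gcs_path, find_local_paths) and the bucket name are hypothetical, and the /tmp path simply mirrors the one in the error message. It checks the form of the path only, not whether the bucket exists or is accessible from the TPU server.

```python
from urllib.parse import urlparse


def is_gcs_path(path: str) -> bool:
    """Return True if path has the form gs://bucket-name/... (syntax check only)."""
    parsed = urlparse(path)
    return parsed.scheme == "gs" and bool(parsed.netloc)


def find_local_paths(**dirs: str) -> list:
    """Return the names of the directory arguments that are not GCS paths."""
    return [name for name, path in dirs.items() if not is_gcs_path(path)]


# Hypothetical values: "my-bucket" is a placeholder, and the /tmp path
# mirrors the temporary directory reported in the error message.
bad = find_local_paths(
    data_dir="gs://my-bucket/data",
    model_dir="/tmp/tmpUUrctt",
)
print(bad)  # ['model_dir']
```

If the returned list is non-empty, the named directories still point at the local file system and would likely trigger the same "File system scheme '[local]' not implemented" error on a Cloud TPU.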

Hi.

I was able to execute on Cloud v3 TPUs using local files. An example here: https://github.com/sayakpaul/Generating-categories-from-arXiv-paper-titles/blob/master/TPU_Experimentation.ipynb.

Does this GCS requirement apply to TensorBoard's log dir too? i.e. tf.summary.create_file_writer(logdir...)

Yes, for Cloud TPU usage, model_dir (which is the parent directory for most models' logdir) must be a GCS bucket path. As a sanity check, I ran an experiment where data_dir and model_dir were GCS buckets but logdir was replaced with a local directory, and it failed with this same error.
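Since logdir normally lives under model_dir, one simple way to keep it on GCS is to derive it from model_dir instead of setting it separately. This is a hedged sketch, not the approach any particular model uses: summary_logdir, the "train" subdirectory, and the bucket name are all hypothetical, and the resulting string is what you would then pass as logdir to tf.summary.create_file_writer.

```python
import posixpath


def summary_logdir(model_dir: str, run_name: str = "train") -> str:
    """Build a summary log directory underneath a GCS model_dir."""
    if not model_dir.startswith("gs://"):
        raise ValueError(
            "model_dir must be a gs:// path for Cloud TPU, got: %r" % model_dir
        )
    # posixpath keeps forward slashes regardless of the host OS.
    return posixpath.join(model_dir, run_name)


# Hypothetical bucket name, for illustration only.
logdir = summary_logdir("gs://my-bucket/model_dir")
print(logdir)  # gs://my-bucket/model_dir/train
```

Failing fast with a ValueError here surfaces a local path before the TPU job starts, rather than partway into the training loop.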
