Bert: Bert_model.ckpt not found with run_squad.py on TPU

Created on 11 Nov 2018 · 6 Comments · Source: google-research/bert

After running the following command for about 5 minutes on a Cloud TPU, I get this error: Unsuccessful TensorSliceReader constructor: Failed to get matching files

The command is as follows:
python run_squad.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v1.1.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v1.1.json \
--train_batch_size=24 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=gs://data_for_squad1/Squad1/ \
--use_tpu=True \
--tpu_name=$TPU_NAME

The BERT_BASE_DIR (./largebert) has the following files:
bert_config.json bert_model.ckpt.data-00000-of-00001 bert_model.ckpt.index bert_model.ckpt.meta vocab.txt

Here is the detailed Traceback:

self._traceback = tf_stack.extract_stack()
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2830, in _train_on_tpu_system
scaffold = _get_scaffold(captured_scaffold_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2953, in _get_scaffold
scaffold = scaffold_fn()
File "run_squad.py", line 584, in tpu_scaffold
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
_init_from_checkpoint, ckpt_dir_or_file, assignment_map)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/distribute.py", line 1053, in merge_call
return self._merge_call(merge_fn, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/distribute.py", line 1061, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 231, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 355, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 309, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./largebert/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: './largebert/bert_model.ckpt')
[[node checkpoint_initializer_370 (defined at run_squad.py:584) = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_370/tensor_names, checkpoint_initializer/shape_and_slices)]]

Been trying to troubleshoot for a while, not sure where the problem lies. Any help would be appreciated.

All 6 comments

The error says that the local file system scheme is not supported: the TPU worker cannot read files from your VM's local disk. Your config, vocab and init_checkpoint should also point to your Google Cloud Storage bucket.

For example:

python bert/run_squad.py \
--vocab_file=gs://{your bucket name}/vocab.txt \
--bert_config_file=gs://{your bucket name}/bert_config.json \
--init_checkpoint=gs://{your bucket name}/bert_model.ckpt \
--do_train=False \
--train_file=train-v1.1.json \
--do_predict=True \
--predict_file=dev-v1.1.json \
--train_batch_size=24 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=gs://{your bucket name}/squad_large/ \
--use_tpu=True \
--tpu_name=grpc://{tpu_name}
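
A quick way to catch this before launching is to check the path flags for a gs:// prefix. The sketch below is only an illustration of the advice above: `find_local_paths` and the `GCS_FLAGS` tuple are hypothetical names, not part of run_squad.py, and the choice of which flags to check is an assumption.

```python
# Flags that, following the advice above, should point at Cloud Storage
# on a TPU run. This set is an assumption for illustration.
GCS_FLAGS = ("vocab_file", "bert_config_file", "init_checkpoint", "output_dir")


def find_local_paths(flags, use_tpu=True):
    """Return the names of checked flags whose value is not a gs:// URL."""
    if not use_tpu:
        return []  # local paths are fine when not running on a TPU
    return [
        name
        for name in GCS_FLAGS
        if name in flags and not flags[name].startswith("gs://")
    ]


# Values from the original question, with BERT_BASE_DIR=./largebert.
original = {
    "vocab_file": "./largebert/vocab.txt",
    "bert_config_file": "./largebert/bert_config.json",
    "init_checkpoint": "./largebert/bert_model.ckpt",
    "output_dir": "gs://data_for_squad1/Squad1/",
}
print(find_local_paths(original))
```

For the original command this flags the vocab, config and checkpoint paths, while the output directory already passes.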

When I ran the command below in a VM instance with a TPU:

python /home/schen/bert/run_squad.py \
--vocab_file=gs://{bucket_name}/uncased_L-12_H-768_A-12/vocab.txt \
--bert_config_file=gs://{bucket_name}/uncased_L-12_H-768_A-12/bert_config.json \
--init_checkpoint=gs://{bucket_name}/uncased_L-12_H-768_A-12/bert_model.ckpt \
--do_train=True \
--do_predict=True \
--train_file=/home/schen/squad/train-v1.1.json \
--predict_file=/home/schen/squad/dev-v1.1.json \
--train_batch_size=32 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=gs://{bucket_name}/squad_base/ \
--use_tpu=True \
--tpu_name=ai

or replaced the last flag with either

--tpu_name=grpc://ai

or

--tpu_name=grpc://{tpu_ip}:8470

I got the following error:

INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"errors": [
{
"domain": "global",
"reason": "forbidden",
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
],
"code": 403,
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
}
'
when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12
[[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

Caused by op u'checkpoint_initializer_139', defined at:
File "/home/schen/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/schen/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
saving_listeners=saving_listeners
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn
features, labels, mode, config)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2493, in _model_fn
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2821, in _train_on_tpu_system
scaffold = _get_scaffold(captured_scaffold_fn)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2944, in _get_scaffold
scaffold = scaffold_fn()
File "/home/schen/bert/run_squad.py", line 627, in tpu_scaffold
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
_init_from_checkpoint, ckpt_dir_or_file, assignment_map)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1040, in merge_call
return self._merge_call(merge_fn, *args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1048, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 231, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 355, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 309, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"errors": [
{
"domain": "global",
"reason": "forbidden",
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
],
"code": 403,
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
}
'
when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12
[[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/home/schen/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/schen/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2400, in train
rendezvous.raise_errors()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
saving_listeners=saving_listeners
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1215, in _train_model_default
saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1406, in _train_with_estimator_spec
log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
return self._sess_creator.create_session()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 800, in create_session
self.tf_sess = self._session_creator.create_session()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 566, in create_session
init_fn=self._scaffold.init_fn)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 287, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 887, in run
run_metadata_ptr)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1110, in _run
feed_dict_tensor, options, run_metadata)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
run_metadata)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"errors": [
{
"domain": "global",
"reason": "forbidden",
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
],
"code": 403,
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
}
'
when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12
[[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

Caused by op u'checkpoint_initializer_139', defined at:
File "/home/schen/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/schen/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2394, in train
saving_listeners=saving_listeners
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1181, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1211, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2186, in _call_model_fn
features, labels, mode, config)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1169, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2493, in _model_fn
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2821, in _train_on_tpu_system
scaffold = _get_scaffold(captured_scaffold_fn)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2944, in _get_scaffold
scaffold = scaffold_fn()
File "/home/schen/bert/run_squad.py", line 627, in tpu_scaffold
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 187, in init_from_checkpoint
_init_from_checkpoint, ckpt_dir_or_file, assignment_map)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1040, in merge_call
return self._merge_call(merge_fn, *args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/distribute.py", line 1048, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 231, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 355, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/training/checkpoint_utils.py", line 309, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3272, in create_op
op_def=op_def)
File "/home/schen/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1768, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://object.propel.ai/bert/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
"error": {
"errors": [
{
"domain": "global",
"reason": "forbidden",
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
],
"code": 403,
"message": "my_account_email does not have storage.objects.list access to object.propel.ai."
}
}
'
when reading gs://object.propel.ai/bert/uncased_L-12_H-768_A-12
[[{{node checkpoint_initializer_139}} = RestoreV2[dtypes=[DT_FLOAT], _device="/job:worker/replica:0/task:0/device:CPU:0"](checkpoint_initializer/prefix, checkpoint_initializer_139/tensor_names, checkpoint_initializer/shape_and_slices)]]

System information

What is the top-level directory of the model you are using: google-research/bert

Here is the link: https://github.com/google-research/bert

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): my laptop is Mac OS High Sierra (version 10.13.6).
The VM instance is Linux 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_6
TensorFlow installed from (source or binary): python -m pip install tensorflow==1.11
TensorFlow version (use command below): 1.11.0, after running
python -c "import tensorflow as tf; print(tf.__version__)"

If I removed the last two flags and did not run on the TPU, it worked properly. However, I really want to use the TPU to speed up the computation. I have been stuck on this TPU issue for a long time. When I ran another demo script, bert/run_classifier.py, I got the same error. It's really frustrating. Any help would be appreciated!

Getting the admin to authorize permissions for both my VM account and TPU account solved the issue.
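
To work out which account needs the grant, note that the 403 body already names the account, the missing permission and the bucket. A small sketch, assuming the message keeps the wording shown in the log above; `parse_denied` is a hypothetical helper name, not a TensorFlow or Cloud Storage API.

```python
import re

# Pattern follows the wording of the 403 message quoted in this thread.
_DENIED = re.compile(
    r"(?P<principal>\S+) does not have (?P<permission>[\w.]+) "
    r"access to (?P<bucket>\S+?)\.?$"
)


def parse_denied(message):
    """Extract (principal, permission, bucket) from the message, or None."""
    match = _DENIED.search(message.strip())
    if match is None:
        return None
    return (
        match.group("principal"),
        match.group("permission"),
        match.group("bucket"),
    )


msg = "my_account_email does not have storage.objects.list access to object.propel.ai."
print(parse_denied(msg))
```

Since the fix here involved both the VM account and the TPU account, the same check may need repeating if the error comes back naming a different account.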

@webstruck So I cannot reference the bert model with gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12 ?

The error I am having is

tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12/bert_model.ckpt: Permission denied: Error executing an HTTP request: HTTP response code 403 with body '{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "forbidden",
    "message": "[email protected] does not have storage.objects.list access to bert_models."
   }
  ],
  "code": 403,
  "message": "[email protected] does not have storage.objects.list access to bert_models."
 }
}
'
         when reading gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12

which is generated by

export BERT_BASE_DIR=gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12
export SQUAD_11_EN_DIR=gs://<my_bucket>/squad1.1
export TPU_NAME=<my_tpu>

python run_squad.py \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --do_train=True \
  --train_file=$SQUAD_11_EN_DIR/train-v1.1.json \
  --do_predict=True \
  --predict_file=$SQUAD_11_EN_DIR/dev-v1.1.json \
  --train_batch_size=8 \
  --learning_rate=3e-5 \
  --num_train_epochs=2.0 \
  --max_seq_length=384 \
  --doc_stride=128 \
  --output_dir=gs://bert_deep_finder/output/ \
  --use_tpu=True \
  --tpu_name=$TPU_NAME

So, should I upload the model into my bucket? I cannot use the one in gs://bert_models/2018_10_18/uncased_L-12_H-768_A-12?

Thanks

Yes, download the model and upload it to your own Google Cloud Storage bucket, then set the path as an environment variable or just use the absolute path.
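
Keep in mind that bert_model.ckpt is a checkpoint prefix rather than a single file, so all of its pieces have to be uploaded together. The sketch below only prints the copy commands and runs nothing. It assumes the file listing from the original question; `upload_commands` and the bucket name `my-bucket` are placeholders.

```python
# File names taken from the directory listing in the original question.
MODEL_FILES = [
    "bert_config.json",
    "bert_model.ckpt.data-00000-of-00001",
    "bert_model.ckpt.index",
    "bert_model.ckpt.meta",
    "vocab.txt",
]


def upload_commands(local_dir, bucket_dir):
    """Build one `gsutil cp` command per model file (nothing is executed)."""
    local_dir = local_dir.rstrip("/")
    bucket_dir = bucket_dir.rstrip("/")
    return [
        "gsutil cp {}/{} {}/{}".format(local_dir, name, bucket_dir, name)
        for name in MODEL_FILES
    ]


for command in upload_commands("./largebert", "gs://my-bucket/largebert"):
    print(command)
```

After the upload, --init_checkpoint would point at the prefix, gs://my-bucket/largebert/bert_model.ckpt, without the .index or .data suffix.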


Sorry for asking, but then how can I use the models that are already online in the bert_models/ storage bucket? I suppose there must be a way since it's mentioned in the Fine-tuning with Cloud TPUs section of the repo.

Edit:
Could it be that my Cloud TPU is not in the same region as the bert_models/ bucket?
