I am running the following command in Google Colab on a TPU. I am using the pre-trained model as well as the dataset.
Command line:
!python /content/bert/run_squad.py \
--vocab_file=$BERT_LARGE_DIR/vocab.txt \
--bert_config_file=$BERT_LARGE_DIR/bert_config.json \
--init_checkpoint=$BERT_LARGE_DIR/bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v2.0.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v2.0.json \
--train_batch_size=24 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=/content/squad_large/ \
--use_tpu=True \
--tpu_name=$TPU_NAME \
--version_2_with_negative=True
However, I run into the following error every time, and I am not sure what I am doing wrong.
`Deprecated in favor of operator or tf.math.divide.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:TPU job name worker
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Error recorded from training_loop: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[node checkpoint_initializer_307 (defined at /content/bert/run_squad.py:627) ]]
Caused by op 'checkpoint_initializer_307', defined at:
File "/content/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/content/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn
config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2558, in _model_fn
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2896, in _train_on_tpu_system
scaffold = _get_scaffold(captured_scaffold_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 3021, in _get_scaffold
scaffold = scaffold_fn()
File "/content/bert/run_squad.py", line 627, in tpu_scaffold
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 190, in init_from_checkpoint
_init_from_checkpoint, args=(ckpt_dir_or_file, assignment_map))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1516, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1524, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 234, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 358, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 312, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[node checkpoint_initializer_307 (defined at /content/bert/run_squad.py:627) ]]
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[{{node checkpoint_initializer_307}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/content/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2457, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1158, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1403, in _train_with_estimator_spec
log_step_count_steps=log_step_count_steps) as mon_sess:
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 508, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 934, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1122, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1127, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 805, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 571, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/session_manager.py", line 287, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[node checkpoint_initializer_307 (defined at /content/bert/run_squad.py:627) ]]
Caused by op 'checkpoint_initializer_307', defined at:
File "/content/bert/run_squad.py", line 1283, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/content/bert/run_squad.py", line 1215, in main
estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn
config)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2558, in _model_fn
_train_on_tpu_system(ctx, model_fn_wrapper, dequeue_fn))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2896, in _train_on_tpu_system
scaffold = _get_scaffold(captured_scaffold_fn)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 3021, in _get_scaffold
scaffold = scaffold_fn()
File "/content/bert/run_squad.py", line 627, in tpu_scaffold
tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 190, in init_from_checkpoint
_init_from_checkpoint, args=(ckpt_dir_or_file, assignment_map))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1516, in merge_call
return self._merge_call(merge_fn, args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1524, in _merge_call
return merge_fn(self._distribution_strategy, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 234, in _init_from_checkpoint
_set_variable_or_list_initializer(var, ckpt_file, tensor_name_in_ckpt)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 358, in _set_variable_or_list_initializer
_set_checkpoint_initializer(variable_or_list, ckpt_file, tensor_name, "")
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/checkpoint_utils.py", line 312, in _set_checkpoint_initializer
ckpt_file, [tensor_name], [slice_spec], [base_type], name=name)[0]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1572, in restore_v2
name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to get matching files on /content/uncased_L-24_H-1024_A-16/bert_model.ckpt: Unimplemented: File system scheme '[local]' not implemented (file: '/content/uncased_L-24_H-1024_A-16/bert_model.ckpt')
[[node checkpoint_initializer_307 (defined at /content/bert/run_squad.py:627) ]]
`
The pre-trained model contains bert_model.ckpt.meta, bert_model.ckpt.data-00000-of-00001, and bert_model.ckpt.index.
run_squad.py calls tf.train.init_from_checkpoint to restore the model checkpoint, but it doesn't recognize bert_model.ckpt. It doesn't even seem to read the three ckpt files listed above. In my opinion, that is the cause of the error.
Please let me know how to resolve this.
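For context, `bert_model.ckpt` is a checkpoint prefix rather than a single file, so having no file with exactly that name is expected and is not in itself the problem. The sketch below uses only the standard library and empty placeholder files to show how a prefix maps to the three files listed above; the helper name `checkpoint_files` is hypothetical and is not part of TensorFlow or BERT.

```python
import glob
import os
import tempfile


def checkpoint_files(prefix):
    """Return the sorted basenames of all files that share a checkpoint prefix."""
    pattern = glob.escape(prefix) + ".*"
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


# Demo with empty placeholder files named like the ones in the pre-trained model.
with tempfile.TemporaryDirectory() as model_dir:
    for suffix in (".meta", ".index", ".data-00000-of-00001"):
        open(os.path.join(model_dir, "bert_model.ckpt" + suffix), "w").close()
    found = checkpoint_files(os.path.join(model_dir, "bert_model.ckpt"))

print(found)
```

Passing the prefix to `--init_checkpoint`, as in the command above, is therefore the right form; the error message points at the file system scheme, not at the file names.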
Local file system is not supported on TPU. You need to use Google Storage.
I have stored the files in Google Storage, but that doesn't resolve the problem. This might be helpful to read: https://stackoverflow.com/questions/41265035/tensorflow-why-there-are-3-files-after-saving-the-model
Your --output_dir needs to be a folder in Google Cloud Storage as well.
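A small pre-flight check along these lines can catch local paths before the job starts. This is only a sketch: the helper `non_gcs_flags` and the bucket name `my-bucket` are hypothetical, and it assumes that `--init_checkpoint` and `--output_dir` are the flags the TPU workers access directly, which matches the paths named in the comments and the error above.

```python
def non_gcs_flags(flags):
    """Return the names of flags whose value is not a gs:// path."""
    return sorted(name for name, value in flags.items()
                  if not value.startswith("gs://"))


# Values taken from the failing run: both paths are local to the Colab VM.
failing = {
    "init_checkpoint": "/content/uncased_L-24_H-1024_A-16/bert_model.ckpt",
    "output_dir": "/content/squad_large/",
}

# Same flags pointing at a hypothetical bucket instead.
fixed = {
    "init_checkpoint": "gs://my-bucket/uncased_L-24_H-1024_A-16/bert_model.ckpt",
    "output_dir": "gs://my-bucket/squad_large/",
}

print(non_gcs_flags(failing))
print(non_gcs_flags(fixed))
```

An empty list for the second call means every checked flag already points at Google Cloud Storage.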
> Local file system is not supported on TPU. You need to use Google Storage.
Is there any way to run on a TPU while using Colab local storage for BERT?
I am getting the error below while trying to build/compile a model on TPU:
InvalidArgumentError Traceback (most recent call last)
10 name="segment_ids")
11 # BERT layer from pretrained model
---> 12 bert_layer = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/2",trainable=True)
13 # Dense Layers
14 pooled_output, sequence_output = bert_layer([input_word_ids, input_mask, segment_ids])
18 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /tmp/tfhub_modules/3e9209b9f2a53dfa4e6d93250dfceb5e64d73b66/variables/variables: Unimplemented: File system scheme '[local]' not implemented (file: '/tmp/tfhub_modules/3e9209b9f2a53dfa4e6d93250dfceb5e64d73b66/variables/variables')
Interestingly, I am not reading or saving any data yet, apart from loading pre-trained BERT from TF Hub.
Everything works fine with GPU. Does TPU allow reading pretrained models from TF Hub?
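The path in the error (`/tmp/tfhub_modules/...`) suggests the model was downloaded to the local cache of the Colab VM, which the TPU workers cannot read. One approach described in the TF Hub caching documentation is to set `TFHUB_MODEL_LOAD_FORMAT=UNCOMPRESSED` so the library reads the model from remote storage instead of a local copy. The sketch below assumes a tensorflow_hub release that honors this variable; the function names are hypothetical, and this has not been confirmed in this thread.

```python
import os


def configure_tfhub_for_tpu():
    """Ask tensorflow_hub to read models from remote storage, not a local cache."""
    os.environ["TFHUB_MODEL_LOAD_FORMAT"] = "UNCOMPRESSED"


def build_bert_layer(handle):
    """Create the Keras layer; call this only after configure_tfhub_for_tpu()."""
    # Imported lazily so the configuration step runs even where
    # tensorflow_hub is not installed.
    import tensorflow_hub as hub
    return hub.KerasLayer(handle, trainable=True)


configure_tfhub_for_tpu()
# On a TPU runtime, inside the strategy scope:
# bert_layer = build_bert_layer(
#     "https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/2")
```

Another documented option is `tf.saved_model.LoadOptions(experimental_io_device="/job:localhost")`, passed to `hub.KerasLayer` through its `load_options` argument in versions that support it.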