BERT: run_classifier.py gets stuck while saving checkpoint

Created on 22 Dec 2018  ·  3 Comments  ·  Source: google-research/bert

INFO:tensorflow: name = bert/encoder/layer_23/output/LayerNorm/gamma:0, shape = (1024,), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (1024, 1024), INIT_FROM_CKPT
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (1024,), INIT_FROM_CKPT
INFO:tensorflow: name = output_weights:0, shape = (2, 1024)
INFO:tensorflow: name = output_bias:0, shape = (2,)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from output/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into output/model.ckpt.

It gets stuck after this. Am I doing something wrong?

python run_classifier.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=/tmp/mrpc_output/

All 3 comments

That's pretty normal in my experience. I left it running overnight, and when I checked the next day it had gotten past this point on its own.

Refer to #212.

I'm pretty confident the lag is not caused by saving the checkpoint itself, even though the last log line says "Saving checkpoints...". I checked my bucket and found that the checkpoint gets saved within a minute. I also tried running with only 20 steps, and the run completed just fine. Maybe the fix here belongs in the logging: instead of logging "Saving..." before the save starts, it should log "Saved..." once the save is done. That way we're not misled into debugging the save. Extra logging to signal that the run is still alive would also be very helpful!
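
To make the logging suggestion above concrete, here is a minimal plain-Python sketch of the idea: log when a step starts, log again only once it has actually finished, and emit a heartbeat in between so a long silence is not mistaken for a hang. It is not how TensorFlow or run_classifier.py logs today; the names `logged_step` and `heartbeat_secs` are made up for illustration. In TensorFlow 1.x the natural place to hook something like this in would, as far as I know, be a `tf.train.CheckpointSaverListener` (its `after_save` method), but that wiring is not shown here.

```python
import logging
import threading
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("train")
log.setLevel(logging.INFO)


@contextmanager
def logged_step(name, heartbeat_secs=60.0):
    """Hypothetical helper: log start and completion of a step, with a heartbeat.

    While the wrapped block runs, a background thread logs a
    "still running" line every `heartbeat_secs` seconds.
    """
    done = threading.Event()
    start = time.monotonic()

    def heartbeat():
        # Event.wait returns False on timeout and True once `done` is set.
        while not done.wait(heartbeat_secs):
            log.info("%s: still running after %.0fs", name, time.monotonic() - start)

    worker = threading.Thread(target=heartbeat, daemon=True)
    log.info("%s: started", name)
    worker.start()
    failed = False
    try:
        yield
    except BaseException:
        failed = True
        raise
    finally:
        # Stop the heartbeat first so the completion line is always the last one.
        done.set()
        worker.join()
        log.info(
            "%s: %s after %.1fs",
            name,
            "failed" if failed else "done",
            time.monotonic() - start,
        )
```

Usage would look like `with logged_step("save checkpoint", heartbeat_secs=30): save_checkpoint()`, where `save_checkpoint` stands in for whatever slow call you want to watch. With this pattern, the last log line before a stall names the step that is actually still running, rather than one that already finished.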
