Tensor2tensor: NotFoundError: Key not found in checkpoint when using problem translate_enzh_wmt8k

Created on 14 Nov 2017 · 5 Comments · Source: tensorflow/tensor2tensor

NotFoundError (see above for traceback): Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint

I get this error when I use problem translate_enzh_wmt8k with hparams_set=transformer_base_single_gpu.
When I set hparams_set=transformer_l4 instead, the error goes away. I do not know why.

t2t-trainer \
    --data_dir=$DATA_DIR \
    --problems=translate_enzh_wmt8k \
    --model=transformer \
    --hparams_set=transformer_base_single_gpu \
    --output_dir=$TRAIN_DIR \
    --worker_gpu=1
2017-11-14 16:27:12.820019: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key training/body/encoder/layer_5/self_attention/multihead_attention/qkv_transform_single/kernel/Adam not found in checkpoint
2017-11-14 16:27:12.825524: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key training/body/encoder/layer_5/self_attention/multihead_attention/qkv_transform_single/bias/Adam not found in checkpoint
2017-11-14 16:27:12.826267: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key training/body/encoder/layer_5/self_attention/multihead_attention/qkv_transform_single/kernel/Adam_1 not found in checkpoint
2017-11-14 16:27:12.826268: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key training/body/encoder/layer_5/self_attention/multihead_attention/qkv_transform_single/bias/Adam_1 not found in checkpoint
2017-11-14 16:27:12.832678: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.838115: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.843404: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.865632: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.866170: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.872088: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.877564: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.900870: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.907357: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.907390: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.919163: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.919263: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.920182: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.921700: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
2017-11-14 16:27:12.922544: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
Traceback (most recent call last):
  File "/home6/zy/miniconda2/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.2.7', 't2t-trainer')
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1524, in run_script
    exec(code, namespace, namespace)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 96, in <module>
    tf.app.run()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 92, in main
    schedule=FLAGS.schedule)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/tensor2tensor/utils/trainer_utils.py", line 378, in run
    hparams=hparams)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 218, in run
    return _execute_schedule(experiment, schedule)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
    return task()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 625, in train_and_evaluate
    self.train(delay_secs=0)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 367, in train
    hooks=self._train_monitors + extra_hooks)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 807, in _call_train
    hooks=hooks)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 302, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 780, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 368, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 673, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 493, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 851, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 856, in _create_session
    return self._sess_creator.create_session()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 554, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 428, in create_session
    init_fn=self._scaffold.init_fn)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
    saver.restore(sess, ckpt.model_checkpoint_path)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1666, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]

Caused by op u'save/RestoreV2_86', defined at:
  File "/home6/zy/miniconda2/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.2.7', 't2t-trainer')
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 750, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1524, in run_script
    exec(code, namespace, namespace)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 96, in <module>
    tf.app.run()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 92, in main
    schedule=FLAGS.schedule)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensor2tensor-1.2.7-py2.7.egg/tensor2tensor/utils/trainer_utils.py", line 378, in run
    hparams=hparams)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 218, in run
    return _execute_schedule(experiment, schedule)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 46, in _execute_schedule
    return task()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 625, in train_and_evaluate
    self.train(delay_secs=0)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 367, in train
    hooks=self._train_monitors + extra_hooks)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 807, in _call_train
    hooks=hooks)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 302, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 780, in _train_model
    log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 368, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 673, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 493, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 851, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 856, in _create_session
    return self._sess_creator.create_session()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 554, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 419, in create_session
    self._scaffold.finalize()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 212, in finalize
    self._saver.build()
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 745, in _build_internal
    restore_sequentially, reshape)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 470, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 267, in restore_op
    [spec.tensor.dtype])[0])
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home6/zy/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key body/decoder/layer_4/encdec_attention/layer_prepostprocess/layer_norm/layer_norm_bias not found in checkpoint
     [[Node: save/RestoreV2_86 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_86/tensor_names, save/RestoreV2_86/shape_and_slices)]]
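The keys reported as missing all belong to `layer_4` and `layer_5` (`training/body/encoder/layer_5/...` and `body/decoder/layer_4/...`), which suggests the checkpoint in `$TRAIN_DIR` was written by a shallower model. Assuming `transformer_l4` uses 4 hidden layers (`layer_0` to `layer_3`) while `transformer_base_single_gpu` uses 6, restoring would fail on exactly these keys. The hypothetical helpers below are a minimal sketch of that comparison on two lists of variable names; in practice the checkpoint side could be listed with `tf.train.list_variables(checkpoint_path)`, which returns `(name, shape)` pairs in recent TensorFlow releases. The name pattern is inferred only from the keys in this log.

```python
import re
from collections import defaultdict

# Matches the stack and layer index in names such as
# "body/decoder/layer_4/encdec_attention/..." or
# "training/body/encoder/layer_5/self_attention/...".
# Assumption: this pattern is based solely on the keys visible in the log above.
LAYER_RE = re.compile(r"body/(encoder|decoder)/layer_(\d+)/")


def layers_per_stack(variable_names):
    """Return a dict like {"encoder": 4, "decoder": 4} inferred from names."""
    indices = defaultdict(set)
    for name in variable_names:
        match = LAYER_RE.search(name)
        if match:
            indices[match.group(1)].add(int(match.group(2)))
    # Layers are numbered from 0, so the depth is the largest index plus one.
    return {stack: max(found) + 1 for stack, found in indices.items()}


def missing_keys(graph_names, checkpoint_names):
    """Names the current graph wants to restore but the checkpoint lacks."""
    return sorted(set(graph_names) - set(checkpoint_names))
```

If `layers_per_stack` reports different depths for the checkpoint and for the graph being built, the checkpoint cannot be restored and a fresh output directory is needed.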


question


All 5 comments

Hi :)

Does the problem still exist in tensor2tensor version 1.2.9? Which version of TensorFlow do you use for your experiments?

When you use different hparam sets, you typically have to use a new training directory and will not be able to restore from old checkpoints (because things like hidden sizes, number of layers, etc. have changed).
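Following that advice, one option is to derive `--output_dir` from the hparams set so that each configuration always gets its own directory. The sketch below only builds the argument list; the flag names are copied from the command in the issue (tensor2tensor 1.2.7) and may differ in other versions, and the `problem-model-hparams_set` directory naming is a hypothetical choice, not a tensor2tensor convention.

```python
import os


def trainer_command(data_dir, train_root, hparams_set,
                    problem="translate_enzh_wmt8k", model="transformer"):
    """Build a t2t-trainer command whose --output_dir is unique per hparams set.

    Hypothetical helper: because hparams_set is part of the directory name,
    switching from transformer_l4 to transformer_base_single_gpu starts in a
    clean directory instead of restoring an incompatible checkpoint.
    """
    output_dir = os.path.join(train_root,
                              "%s-%s-%s" % (problem, model, hparams_set))
    return [
        "t2t-trainer",
        "--data_dir=" + data_dir,
        "--problems=" + problem,
        "--model=" + model,
        "--hparams_set=" + hparams_set,
        "--output_dir=" + output_dir,
        "--worker_gpu=1",
    ]
```

The returned list can be passed to `subprocess.check_call(...)`. With this layout, runs using `transformer_l4` and `transformer_base_single_gpu` never share a checkpoint directory.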

@stefan-it @rsepassi
Thanks.
The problem has been solved. As rsepassi said, I changed the training directory and it worked. It was probably because the program was restoring the checkpoint from the old training directory.
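To check whether a run would resume from an old checkpoint, one can read the plain-text `checkpoint` index file that TensorFlow's Saver keeps in the output directory; its `model_checkpoint_path` entry names the checkpoint that is restored (the traceback above shows `saver.restore(sess, ckpt.model_checkpoint_path)`). The helper below is a minimal sketch that parses that file by hand so it runs without TensorFlow; `tf.train.latest_checkpoint(output_dir)` performs a similar lookup when TensorFlow is available. The function name is hypothetical.

```python
import os
import re


def existing_checkpoint(output_dir):
    """Return the checkpoint path a restarted run would try to restore, or None.

    Hypothetical helper: reads the text-format "checkpoint" index file written
    next to the model.ckpt-* files and returns its model_checkpoint_path entry.
    """
    index_file = os.path.join(output_dir, "checkpoint")
    if not os.path.isfile(index_file):
        return None
    with open(index_file) as f:
        for line in f:
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"', line)
            if match:
                path = match.group(1)
                # Relative entries are resolved against the directory itself.
                if os.path.isabs(path):
                    return path
                return os.path.join(output_dir, path)
    return None
```

Running it against `$TRAIN_DIR` before starting a job with a different `hparams_set` shows whether the trainer would pick up a stale checkpoint, as happened in this issue.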

I use:
tensor2tensor (1.2.7)
tensorflow-gpu (1.4.0)

@Jong-Won Have you changed the dataset in your own code?

@liesun1994 Not yet
