Models: Exception has occurred: tensorflow.python.framework.errors_impl.NotFoundError

Created on 24 Jan 2019 · 5 Comments · Source: tensorflow/models

System information

What is the top-level directory of the model you are using: C:\tf_od_api\mask_rcnn_restnet50
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 7 x64
TensorFlow installed from (source or binary): binary (pip install)
TensorFlow version (use command below): 1.12
Bazel version (if compiling from source):
CUDA/cuDNN version: CUDA 9.0, cuDNN 7.3.1
GPU model and memory: NVIDIA GTX 1060 6GB
Exact command to reproduce:
python object_detection/legacy/train.py --train_dir=C:\tf_od_api\mask_rcnn_restnet50 --pipeline_config_path=C:\tf_od_api\mask_rcnn_restnet50\mask_rcnn_resnet50_atrous_coco.config
You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

I created the TFRecords from here: 1 class and 1010 PNG images. I used the Mask R-CNN with Resnet-50 (v1), Atrous version model from here and the config from here.
I modified the paths, the TFRecord name, and the image type in the config.
When I ran the command above, the following error showed up:

Exception has occurred: tensorflow.python.framework.errors_impl.NotFoundError
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Conv/biases/Momentum not found in checkpoint
[[node save/RestoreV2 (defined at C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py:377) = RestoreV2dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'save/RestoreV2', defined at:
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\ptvsd_launcher.py", line 45, in <module>
    main(ptvsdArgs)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 265, in main
    wait=args.wait)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\__main__.py", line 258, in handle_args
    debug_main(addr, name, kind, *extra, **kwargs)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_local.py", line 45, in debug_main
    run_file(address, name, *extra, **kwargs)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_local.py", line 79, in run_file
    run(argv, addr, **kwargs)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_local.py", line 140, in _run
    _pydevd.main()
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1925, in main
    debugger.connect(host, port)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1283, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\pydevd.py", line 1290, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "c:\Users\willy_sung\.vscode\extensions\ms-python.python-2018.12.1\pythonFiles\lib\python\ptvsd\_vendored\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "c:\models\research\object_detection\legacy\train.py", line 184, in <module>
    tf.app.run()
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
    _sys.exit(main(argv))
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
    return func(*args, **kwargs)
  File "c:\models\research\object_detection\legacy\train.py", line 180, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py", line 377, in train
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1102, in __init__
    self.build()
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\training\saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1550, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
    op_def=op_def)
  File "C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Conv/biases/Momentum not found in checkpoint
[[node save/RestoreV2 (defined at C:\Users\willy_sung\AppData\Local\Continuum\anaconda3\envs\venv\lib\site-packages\object_detection-0.1-py3.6.egg\object_detection\legacy\trainer.py:377) = RestoreV2dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
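
One way to confirm what the message says is to list the keys a checkpoint actually holds and compare them with the names the graph wants to restore. The following is only a minimal sketch: it assumes `tf.train.list_variables` is available in the installed TensorFlow, `missing_keys` and `list_checkpoint_names` are hypothetical helper names, and the toy name lists stand in for a real checkpoint.

```python
def missing_keys(expected_names, checkpoint_names):
    """Return the names the graph expects that the checkpoint does not contain."""
    available = set(checkpoint_names)
    return sorted(name for name in expected_names if name not in available)


def list_checkpoint_names(checkpoint_prefix):
    """List variable names stored in a checkpoint (for example a model.ckpt prefix).

    TensorFlow is imported lazily so missing_keys() stays usable without it.
    Assumption: tf.train.list_variables exists in the installed version.
    """
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(checkpoint_prefix)]


# Toy data (hypothetical): a checkpoint holding only model weights,
# and a training graph that also expects Momentum optimizer slots.
stored = ["Conv/biases", "Conv/weights", "global_step"]
expected = stored + ["Conv/biases/Momentum", "Conv/weights/Momentum"]

print(missing_keys(expected, stored))
```

If the names reported as missing all end in `/Momentum`, the checkpoint probably holds weights only, which would be normal for a pretrained download but not for a checkpoint the trainer wants to resume from.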

The paths of the data and model are shown in the attached image (tfodapiissue).
It seems the model does not match the config file,
but I get the same error when I try the maskrcnninceptionv2 model too.
Can anyone help me solve this problem?

lunasdejavu

All 5 comments

All the local variables at the moment the error occurred with maskrcnninceptionv2:
cluster:None
cluster_data:None
configs:{'eval_config': num_examples: 8000
m...evals: 10
, 'eval_input_config': label_map_path: "C:...PNG_MASKS
, 'eval_input_configs': [label_map_path: "C:...NG_MASKS
], 'model': faster_rcnn {
numb...ht: 4.0
}
, 'train_config': batch_size: 1
data_a...etection"
, 'train_input_config': label_map_path: "C:...PNG_MASKS
}
create_input_dict_fn:functools.partial(.get_next at 0x000000002FC548C8>, label_map_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_label_map.pbtxt"
load_instance_masks: true
tf_record_input_reader {
input_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_train.record"
}
mask_type: PNG_MASKS
)
env:{}
get_next:.get_next at 0x000000002FC548C8>
graph_rewriter_fn:None
input_config:label_map_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_label_map.pbtxt"
load_instance_masks: true
tf_record_input_reader {
input_path: "C:\tf_od_api\mask_rcnn_inceptionnetv2\rainbow_train.record"
}
is_chief:True
master:''
model_config:faster_rcnn {
number_of_stages: 3
num_classes: 1
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 1024
max_dimension: 1024
}
}
feature_extractor {
type: "faster_rcnn_inception_v2"
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
height_stride: 16
width_stride: 16
scales: 0.25
scales: 0.5
scales: 1.0
scales: 2.0
aspect_ratios: 0.5
aspect_ratios: 1.0
aspect_ratios: 2.0
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.009999999776482582
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.699999988079071
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
use_dropout: false
dropout_keep_probability: 1.0
conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.009999999776482582
}
}
}
predict_instance_masks: true
mask_prediction_conv_depth: 0
mask_height: 15
mask_width: 15
mask_prediction_num_conv_layers: 2
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
second_stage_mask_prediction_loss_weight: 4.0
}
model_fn:functools.partial(, model_config=faster_rcnn {
number_of_stages: 3
num_classes: 1
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 1024
max_dimension: 1024
}
}
feature_extractor {
type: "faster_rcnn_inception_v2"
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
height_stride: 16
width_stride: 16
scales: 0.25
scales: 0.5
scales: 1.0
scales: 2.0
aspect_ratios: 0.5
aspect_ratios: 1.0
aspect_ratios: 2.0
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.009999999776482582
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.699999988079071
first_stage_max_proposals: 300
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
use_dropout: false
dropout_keep_probability: 1.0
conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.009999999776482582
}
}
}
predict_instance_masks: true
mask_prediction_conv_depth: 0
mask_height: 15
mask_width: 15
mask_prediction_num_conv_layers: 2
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.6000000238418579
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
second_stage_mask_prediction_loss_weight: 4.0
}
, is_training=True)
ps_tasks:0
task:0
task_data:{'index': 0, 'type': 'master'}
task_info:
train_config:batch_size: 1
data_augmentation_options {
random_horizontal_flip {
}
}
optimizer {
momentum_optimizer {
learning_rate {
manual_step_learning_rate {
initial_learning_rate: 0.00019999999494757503
schedule {
step: 900000
learning_rate: 1.9999999494757503e-05
}
schedule {
step: 1200000
learning_rate: 1.9999999949504854e-06
}
}
}
momentum_optimizer_value: 0.8999999761581421
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
fine_tune_checkpoint: "C:\tf_od_api\mask_rcnn_inceptionnetv2\model.ckpt"
from_detection_checkpoint: true
num_steps: 200000
fine_tune_checkpoint_type: "detection"
worker_job_name:'lonely_worker'
worker_replicas:1
_:['c:\models\researc...train.py']
__exception__: (, NotFoundError(), )

lunasdejavu on 24 Jan 2019
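
The dump shows fine_tune_checkpoint pointing at C:\tf_od_api\mask_rcnn_inceptionnetv2\model.ckpt. The original command used the model folder itself as train_dir. If the same was done for this run (an assumption, since the dump does not show train_dir), the trainer may find the pretrained model's checkpoint index in that folder and try to resume from it rather than fine-tune. A small sketch of that check follows; `shares_directory` and the `training_output` folder are hypothetical names.

```python
from pathlib import PureWindowsPath


def shares_directory(train_dir, fine_tune_checkpoint):
    """True if the fine-tune checkpoint prefix lives directly inside train_dir.

    PureWindowsPath is used so the comparison follows Windows rules
    (case-insensitive, backslash separators) on any operating system.
    """
    return PureWindowsPath(fine_tune_checkpoint).parent == PureWindowsPath(train_dir)


checkpoint = r"C:\tf_od_api\mask_rcnn_inceptionnetv2\model.ckpt"

# Same folder for both: the situation that may trigger the restore error.
print(shares_directory(r"C:\tf_od_api\mask_rcnn_inceptionnetv2", checkpoint))

# A separate output folder (hypothetical name) avoids the overlap.
print(shares_directory(r"C:\tf_od_api\training_output", checkpoint))
```

Keeping the pretrained files and the training output in separate folders is one way to rule this cause out.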

All the files for the training are in the link. Can someone give me a hand?

lunasdejavu on 25 Jan 2019

Try deleting the checkpoint file that resides in the path you have specified at train_dir=... in your command

emasoumi on 22 Apr 2019

👍3
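
A likely reason this works: TensorFlow keeps a small index file named `checkpoint` in a directory to record the latest checkpoint there, and a trainer that finds one in train_dir will normally try to resume from it, expecting optimizer slots such as Conv/biases/Momentum to be present. A pretrained download may contain the weights but not those slots. The sketch below renames the index file instead of deleting it, so the step can be undone; `set_aside_checkpoint_index` is a hypothetical helper, and the demo runs in a temporary folder rather than a real train_dir.

```python
import os
import tempfile


def set_aside_checkpoint_index(train_dir, suffix=".bak"):
    """Rename train_dir/checkpoint to checkpoint.bak so it is no longer picked up.

    Returns the backup path, or None if there was no index file to move.
    Only the small text index is touched; model.ckpt.* files are left alone.
    """
    index_path = os.path.join(train_dir, "checkpoint")
    if not os.path.isfile(index_path):
        return None
    backup_path = index_path + suffix
    os.replace(index_path, backup_path)
    return backup_path


# Demo in a throwaway folder standing in for train_dir.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "checkpoint"), "w") as handle:
    handle.write('model_checkpoint_path: "model.ckpt"\n')

backup = set_aside_checkpoint_index(demo_dir)
print("moved to:", backup)
print("index still present:", os.path.exists(os.path.join(demo_dir, "checkpoint")))
```

After the index is out of the way, the trainer should start from fine_tune_checkpoint as configured instead of attempting a full resume.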

@emasoumi Legend!

fuzzyBatman on 19 May 2019

Hi There,
We are checking to see if you still need help on this, as this seems to be an old issue. Please update this issue with the latest information, code snippet to reproduce your issue and error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing this.