Models: Error Training COCO-pretrained Model on my own dataset.

Created on 30 Jun 2017 · 5 Comments · Source: tensorflow/models

Dear All,

I am following the instructions on this page (https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_pets.md) to train a COCO-pretrained model on my own dataset.
Here are the steps that I performed.

1. I generated TFRecord files for training and validation. They contain only my dataset, which has a single class.

2. I modified models/object_detection/samples/configs/faster_rcnn_resnet101_pets.config to change the number of classes (initially 90; I changed it to 91 because I am training this model for only one class, which does not exist in the actual MSCOCO dataset). I also changed each PATH_TO_BE_CONFIGURED to my data paths:
fine_tune_checkpoint: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt"

train_input_reader: {
  tf_record_input_reader {
    input_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_casescovers_train.record"
  }
  label_map_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_label_map2.pbtxt"
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_casescovers_val.record"
  }
  label_map_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_label_map2.pbtxt"
}
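
A quick pre-flight check can catch path mistakes in a config like the one above before train.py starts. The following is only a sketch under assumptions: the helper name `missing_paths` is hypothetical and not part of the Object Detection API, it treats the config as plain text rather than parsing the protobuf, it assumes Python 3, and it assumes each `input_path` names a single file rather than a pattern. It treats `fine_tune_checkpoint` as a prefix, since a checkpoint is normally stored as several files such as `model.ckpt.index` and `model.ckpt.data-*`.

```python
import glob
import os
import re

# Matches config lines such as: input_path: "/some/path"
_PATH_FIELD = re.compile(
    r'(fine_tune_checkpoint|input_path|label_map_path)\s*:\s*"([^"]+)"')


def missing_paths(config_text):
    """Return (field, path) pairs from the config text that match nothing on disk.

    Hypothetical helper for illustration only.
    """
    missing = []
    for field, path in _PATH_FIELD.findall(config_text):
        if field == "fine_tune_checkpoint":
            # A checkpoint path is a prefix: look for model.ckpt.index etc.
            found = bool(glob.glob(glob.escape(path) + ".*"))
        else:
            found = os.path.exists(path)
        if not found:
            missing.append((field, path))
    return missing
```

Calling `missing_paths(open("faster_rcnn_resnet101_pets.config").read())` would then list any configured file that cannot be found; an empty list means all three kinds of path resolved.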

3. Then I ran this command to train the model:
python train.py --logtostderr --pipeline_config_path=samples/config/faster_rcnn_resnet101_pets.config --train_dir=data/faster_rcnn_resnet101_coco_11_06_2017
It gave me the error below:

INFO:tensorflow:Error reported to Coordinator: , Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]

Caused by op u'save_1/RestoreV2_587', defined at:
File "train.py", line 200, in <module>
tf.app.run()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 196, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 275, in train
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1170, in build
restore_sequentially=self._restore_sequentially)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
dtypes=dtypes, name=name)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]

Traceback (most recent call last):
File "train.py", line 200, in <module>
tf.app.run()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 196, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 290, in train
saver=saver)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]

Caused by op u'save_1/RestoreV2_587', defined at:
File "train.py", line 200, in <module>
tf.app.run()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 196, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 275, in train
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1170, in build
restore_sequentially=self._restore_sequentially)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
dtypes=dtypes, name=name)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]

ERROR:tensorflow:==================================
Object was never used (type ):

If you want to mark it as used call its "mark_used()" method.
It was originally created here:

['File "train.py", line 200, in <module>\n tf.app.run()', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run\n _sys.exit(main(_sys.argv[:1] + flags_passthrough))', 'File "train.py", line 196, in main\n worker_job_name, is_chief, FLAGS.train_dir)', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 290, in train\n saver=saver)', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 655, in train\n ready_op = tf_variables.report_uninitialized_variables()', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped\n return _add_should_use_warning(fn(*args, **kwargs))', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 1 in _add_should_use_warning\n wrapped = TFShouldUseWarningWrapper(x)', 'File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__\n stack = [s.strip() for s in traceback.format_stack()]']
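
The traceback above passes through `session_manager._restore_checkpoint`, which restores from `ckpt.model_checkpoint_path`. That suggests the training directory held checkpoint state naming `model.ckpt-0` while no files with that prefix existed. One way to check for this situation is sketched below. It is an illustration under assumptions: the helper name `dangling_checkpoint` is hypothetical, it reads the plain-text `checkpoint` state file directly instead of using TensorFlow, and it assumes Python 3.

```python
import glob
import os
import re


def dangling_checkpoint(train_dir):
    """Return the checkpoint prefix named in train_dir/checkpoint if none of
    its files exist, else None. Hypothetical helper for illustration only.
    """
    state_file = os.path.join(train_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None  # no checkpoint state, so nothing to resume from
    with open(state_file) as f:
        # Only the line that starts with model_checkpoint_path is matched,
        # not the all_model_checkpoint_paths entries.
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"',
                          f.read(), re.MULTILINE)
    if match is None:
        return None
    prefix = match.group(1)
    if not os.path.isabs(prefix):
        prefix = os.path.join(train_dir, prefix)
    # Accept either a single file or prefix.index / prefix.data-* files.
    if os.path.exists(prefix) or glob.glob(glob.escape(prefix) + ".*"):
        return None
    return prefix
```

If this returns a path, the state file points at a checkpoint that is not on disk, which would be consistent with the "Failed to find any matching files" message.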

humayun

Most helpful comment

Hi,
I figured out where the error is. If I give the same path for the pretrained model and train_dir, training actually reads the same model from that location when it starts, and that gives this error. I just changed train_dir to a different location (a new folder), and the error went away. Now my model is running and training on my own dataset.
Thank you.
-Humayun
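
The fix described here, keeping train_dir separate from the folder that holds the pretrained checkpoint, can be guarded with a small check before launching training. This is a minimal sketch, not part of the Object Detection API: the function name `check_train_dir` is hypothetical, and it assumes Python 3 (for `exist_ok`).

```python
import os


def check_train_dir(train_dir, fine_tune_checkpoint):
    """Raise ValueError if train_dir is the directory that holds the
    fine-tune checkpoint; otherwise create train_dir and return its path.

    Hypothetical helper for illustration only.
    """
    ckpt_dir = os.path.dirname(os.path.realpath(fine_tune_checkpoint))
    resolved = os.path.realpath(train_dir)
    if resolved == ckpt_dir:
        raise ValueError(
            "train_dir %r is also the fine_tune_checkpoint directory; "
            "use a separate, new directory" % train_dir)
    os.makedirs(resolved, exist_ok=True)  # create the new folder if missing
    return resolved
```

With the paths from this thread, passing `data/faster_rcnn_resnet101_coco_11_06_2017` as train_dir together with the `model.ckpt` prefix inside that same folder would raise, while any other folder would be created and returned.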

humayun on 2 Jul 2017

👍2

All 5 comments

I don't really know to be honest, but I wonder if you are confusing something by setting the train_dir in your command line to be the same directory as the fine_tune_checkpoint that you are using? Maybe try setting the train_dir to be some other directory?

jch1 on 30 Jun 2017

👍2

Thanks, jch1, for the suggestion.
I tried it but still get the same error.
I also tried to train the COCO-pretrained model on the Pets dataset as explained in the running_pets example, but on my local machine instead of Google Cloud. It still gives me the same error. Any suggestions on how I can retrain any of the object detection models available here with the Pets dataset or my own dataset on my local machine?

Thank you.
-Humayun

humayun on 30 Jun 2017

humayun on 2 Jul 2017

👍2

Thanks. Your suggestion worked very well. I changed the train_dir and it is
working perfectly. Thank you.
best regards,
-humayun