Dear All,
I am following the instructions on this page (https://github.com/tensorflow/models/blob/master/object_detection/g3doc/running_pets.md) to train a COCO-pretrained model on my own dataset.
Here are the steps I performed.
**1.** I generated TF Record files for training and validation. My dataset contains only one class.
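For reference, here is a minimal sketch of writing one such record in the format the object_detection API expects. This is not the repo's conversion script; the image file, dimensions, box coordinates, and class name below are hypothetical placeholders:
```python
# A minimal sketch of writing one detection example in the format the
# object_detection API expects. NOT the repo's conversion script; the
# image file, dimensions, boxes and class name are placeholders.
import tensorflow as tf

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_list(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def _float_list(values):
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))

def _bytes_list(values):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

with open('example.jpg', 'rb') as f:      # hypothetical image
    encoded_jpg = f.read()

height, width = 480, 640                  # true pixel size of the image
# One ground-truth box, normalized to [0, 1] by image width/height.
xmins, xmaxs, ymins, ymaxs = [0.1], [0.9], [0.2], [0.8]

example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': _int64_list([height]),
    'image/width': _int64_list([width]),
    'image/filename': _bytes('example.jpg'.encode('utf8')),
    'image/source_id': _bytes('example.jpg'.encode('utf8')),
    'image/encoded': _bytes(encoded_jpg),
    'image/format': _bytes('jpeg'.encode('utf8')),
    'image/object/bbox/xmin': _float_list(xmins),
    'image/object/bbox/xmax': _float_list(xmaxs),
    'image/object/bbox/ymin': _float_list(ymins),
    'image/object/bbox/ymax': _float_list(ymaxs),
    'image/object/class/text': _bytes_list(['case_cover'.encode('utf8')]),
    'image/object/class/label': _int64_list([1]),  # must match the label map id
}))

writer = tf.python_io.TFRecordWriter('mscoco_casescovers_train.record')
writer.write(example.SerializeToString())
writer.close()
```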
**2.** I pointed the input readers in the pipeline config at my records and label map:
```
train_input_reader: {
tf_record_input_reader {
input_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_casescovers_train.record"
}
label_map_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_label_map2.pbtxt"
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_casescovers_val.record"
}
label_map_path: "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/data/faster_rcnn_resnet101_coco_11_06_2017/mscoco_label_map2.pbtxt"
}
```
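Since the dataset has a single class, the mscoco_label_map2.pbtxt referenced above presumably holds one item. A minimal single-class label map would look like this (the class name is a guess based on the record file names):
```
item {
  id: 1               # ids must start at 1; 0 is reserved for "background"
  name: 'case_cover'  # hypothetical class name
}
```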
**3.** Then I ran this command to train the model:
```
python train.py --logtostderr --pipeline_config_path=samples/config/faster_rcnn_resnet101_pets.config --train_dir=data/faster_rcnn_resnet101_coco_11_06_2017
```
It gave me the error below:
```
INFO:tensorflow:Error reported to Coordinator:
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]
Caused by op u'save_1/RestoreV2_587', defined at:
File "train.py", line 200, in
tf.app.run()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 196, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 275, in train
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
self.build()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1170, in build
restore_sequentially=self._restore_sequentially)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 691, in build
restore_sequentially, reshape)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
[spec.tensor.dtype])[0])
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
dtypes=dtypes, name=name)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]
Traceback (most recent call last):
File "train.py", line 200, in
tf.app.run()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 196, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/humayun/MD_Stuff/tensorflow_1.2/models/models/object_detection/trainer.py", line 290, in train
saver=saver)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 732, in train
master, start_standard_services=False, config=session_config) as sess:
File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 964, in managed_session
self.stop(close_summary_writer=close_summary_writer)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 792, in stop
stop_grace_period_secs=self._stop_grace_secs)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 953, in managed_session
start_standard_services=start_standard_services)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/supervisor.py", line 708, in prepare_or_wait_for_session
init_feed_dict=self._init_feed_dict, init_fn=self._init_fn)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/session_manager.py", line 205, in _restore_checkpoint
saver.restore(sess, ckpt.model_checkpoint_path)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/humayun/MD_Stuff/tensorflow_1.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for data/faster_rcnn_resnet101_coco_11_06_2017/model.ckpt-0
[[Node: save_1/RestoreV2_587 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save_1/Const_0_0, save_1/RestoreV2_587/tensor_names, save_1/RestoreV2_587/shape_and_slices)]]
ERROR:tensorflow:==================================
Object was never used (type
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
```
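One quick sanity check, given the NotFoundError above: list what actually exists at the path the restore op is looking in. The expected file names below are from the released COCO tarball (roughly; a sketch):
```
ls data/faster_rcnn_resnet101_coco_11_06_2017/
# The released tarball contains model.ckpt.index, model.ckpt.meta and
# model.ckpt.data-00000-of-00001 (among other files). Note the error is
# looking for model.ckpt-0, a numbered *training* checkpoint, not the
# pretrained model.ckpt, which suggests the trainer is treating this
# directory as a partially written train_dir.
```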
I don't really know to be honest, but I wonder if you are confusing something by setting the train_dir in your command line to be the same directory as the fine_tune_checkpoint that you are using? Maybe try setting the train_dir to be some other directory?
Thanks, jch1, for the suggestion.
I tried it, but I still get the same error.
I also tried to train the COCO-pretrained model on the Pets dataset as explained in the running_pets example, but on my local machine instead of Google Cloud. It still gives me the same error. Any suggestions on how I can retrain any of the object detection models available here with the Pets dataset or my own dataset on my local machine?
Thank you.
-Humayun
Hi,
I figured out where the error comes from. I had given the same path for the model and for train_dir, so the trainer tried to restore from the same directory it was writing its training checkpoints to, and that produced this error. I changed train_dir to a different location (a new folder), and the error is gone. Now my model is running and training on my own dataset.
Thank you.
-Humayun
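In command form, the fix amounts to keeping the pretrained checkpoint directory and the training output directory separate. The fine_tune_checkpoint in the pipeline config keeps pointing at the downloaded COCO checkpoint; only train_dir moves (the output folder name below is illustrative):
```
python train.py --logtostderr \
  --pipeline_config_path=samples/config/faster_rcnn_resnet101_pets.config \
  --train_dir=data/train_output_casescovers   # a new, empty directory
```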
Thanks. Your suggestion works very well. I changed the train_dir and it is working perfectly. Thank you.
best regards,
-humayun
Thanks, @jch1!