I am trying to train the Context R-CNN model. Below is my progress so far.
It seems that some steps only work with tf1 while others only work with tf2.
Any suggestions or experiences are highly appreciated.
https://github.com/tensorflow/models/tree/master/research/object_detection
When I start training the Context R-CNN model, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Name: , Feature list 'image/encoded' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
Command used:
python object_detection/model_main.py \
--alsologtostderr \
--pipeline_config_path object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config \
--model_dir ../../data/context_rcnn \
--num_train_steps 10 \
--sample_1_of_n_eval_examples 1
I'm following the Context R-CNN documentation
Setup
I'm using docker locally.
Generating TfRecords from a set of images and a COCO-CameraTraps style JSON
Note: I'm using the available test data from object_detection/test_images/snapshot_serengeti
python object_detection/dataset_tools/context_rcnn/create_cococameratraps_tfexample_main.py \
--alsologtostderr \
--output_tfrecord_prefix object_detection/test_images/snapshot_serengeti/train \
--image_directory object_detection/test_images/snapshot_serengeti/ \
--input_annotations_file object_detection/test_images/snapshot_serengeti/context_rcnn_demo_metadata.json
Generating weakly-supervised bounding box labels for image-labeled data: skipped.
Generating and saving contextual features for each image
I downloaded a detection model from the tf1 model zoo: faster_rcnn_resnet101_coco.
I'm aware that the COCO model does not fit the Snapshot Serengeti labels in context_rcnn_demo_metadata.json, but I'm just trying to get it to run.
Add output_final_box_features: true to 'pipeline.config'
Export model with detection features:
python object_detection/export_inference_graph.py \
--alsologtostderr \
--input_type tf_example \
--pipeline_config_path ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/pipeline.config \
--trained_checkpoint_prefix ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt \
--output_directory ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/export/ \
--additional_output_tensor_names detection_features
Generate the embeddings:
python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
--alsologtostderr \
--embedding_input_tfrecord object_detection/test_images/snapshot_serengeti/train* \
--embedding_output_tfrecord object_detection/test_images/snapshot_serengeti/embedding \
--embedding_model_dir ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/export/saved_model/
The command above fails under tf1 with the following error: TypeError: load() missing 2 required positional arguments: 'tags' and 'export_dir' [while running 'ExtractEmbedding']
However, running it in the tf2 docker image does the job.
Building up contextual memory banks and storing them for each context group
python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
--input_tfrecord object_detection/test_images/snapshot_serengeti/embedding* \
--output_tfrecord object_detection/test_images/snapshot_serengeti/bank-minute \
--time_horizon minute
Resulting in: object_detection/test_images/snapshot_serengeti/bank-minute-00000-of-00001
Training a Context R-CNN Model
Switch back to tf1.
Edit object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config:
train_input_reader {
  label_map_path: "object_detection/data/snapshot_serengeti_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "object_detection/test_images/snapshot_serengeti/bank-minute-00000-of-00001"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}
python object_detection/model_main.py \
--alsologtostderr \
--pipeline_config_path object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config \
--model_dir ../../data/context_rcnn \
--num_train_steps 10 \
--sample_1_of_n_eval_examples 1
This fails with:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Name: , Feature list 'image/encoded' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
Full log:
INFO:tensorflow:Done calling model_fn.
I0825 11:04:03.191481 140581246424896 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0825 11:04:03.192392 140581246424896 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0825 11:04:05.956825 140581246424896 monitored_session.py:240] Graph was finalized.
INFO:tensorflow:Restoring parameters from ../../data/snapshot_serengeti/context_rcnn/model.ckpt-0
I0825 11:04:06.073695 140581246424896 saver.py:1284] Restoring parameters from ../../data/snapshot_serengeti/context_rcnn/model.ckpt-0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W0825 11:04:08.022697 140581246424896 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I0825 11:04:08.861182 140581246424896 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0825 11:04:09.187242 140581246424896 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ../../data/snapshot_serengeti/context_rcnn/model.ckpt.
I0825 11:04:19.261078 140581246424896 basic_session_run_hooks.py:606] Saving checkpoints for 0 into ../../data/snapshot_serengeti/context_rcnn/model.ckpt.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node Dataset_map_TfSequenceExampleDecoder.decode_57}} Name: , Feature list 'image/encoded' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
    [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
    [[IteratorGetNext]]
    [[Loss/RPNLoss/BalancedPositiveNegativeSampler_7/ScatterNd/_8067]]
  (1) Invalid argument: {{function_node Dataset_map_TfSequenceExampleDecoder.decode_57}} Name: , Feature list 'image/encoded' is required but could not be found. Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
    [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
    [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
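One way to narrow this down is to check what the records in the generated file actually contain. The following is a minimal sketch, assuming TensorFlow is importable in the container; read_keys and locate_key are hypothetical helper names, not part of the Object Detection API. It parses the first record as a tf.train.SequenceExample and reports whether a key sits in the context features or in the feature lists. A plain tf.train.Example parsed this way shows all of its keys under the context, because both messages store a Features message in field 1.

```python
def read_keys(tfrecord_path):
    """Return (context_keys, feature_list_keys) of the first record in the file."""
    import tensorflow as tf  # imported lazily so locate_key works without TF

    # tf.compat.v1.io.tf_record_iterator yields the raw bytes of each record.
    for raw in tf.compat.v1.io.tf_record_iterator(tfrecord_path):
        seq = tf.train.SequenceExample.FromString(raw)
        return (sorted(seq.context.feature.keys()),
                sorted(seq.feature_lists.feature_list.keys()))
    return [], []  # the file contained no records


def locate_key(key, context_keys, feature_list_keys):
    """Say where `key` was found: 'feature_lists', 'context' or 'missing'."""
    if key in feature_list_keys:
        return "feature_lists"
    if key in context_keys:
        return "context"
    return "missing"


# Example use (path taken from the steps above):
#   ctx, fl = read_keys(
#       "object_detection/test_images/snapshot_serengeti/bank-minute-00000-of-00001")
#   print(locate_key("image/encoded", ctx, fl))
```

If the result is anything other than feature_lists, the record layout does not match what the sequence example decoder asks for in the error message.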
I'm encountering the exact same issue.
Any news on this?
Okay, after digging in a bit I could successfully start a training run.
I added the option --output_type tf_example to the call of add_context_to_examples.py so that it writes standard tf.train.Examples instead of tf.train.SequenceExamples to the output tfrecord. I then also removed input_type: TF_SEQUENCE_EXAMPLE from the pipeline configuration file.
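For reference, the changed call can be sketched as below. This is only an illustration: the paths and the other flags are copied from the original post, the extra --output_type tf_example flag is the one described above, and build_add_context_command and as_shell are hypothetical helper names.

```python
def build_add_context_command(input_glob, output_prefix, time_horizon="minute"):
    """Assemble the add_context_to_examples.py call with tf_example output."""
    return [
        "python",
        "object_detection/dataset_tools/context_rcnn/add_context_to_examples.py",
        "--input_tfrecord", input_glob,
        "--output_tfrecord", output_prefix,
        "--time_horizon", time_horizon,
        "--output_type", "tf_example",  # the only change to the original call
    ]


def as_shell(argv):
    """Join the arguments into one line meant to be pasted into a shell."""
    # Arguments are left unquoted so the shell expands the glob,
    # as it did in the original command.
    return " ".join(argv)


# Example use:
#   print(as_shell(build_add_context_command(
#       "object_detection/test_images/snapshot_serengeti/embedding*",
#       "object_detection/test_images/snapshot_serengeti/bank-minute")))
```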
Training then started but gave me CUDA out-of-memory errors. Presumably this was due to batch_size: 64 in the pipeline configuration file, so I changed that to batch_size: 1 and training worked.
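The two configuration changes can also be applied with a small script instead of editing by hand. This is a minimal sketch that works on the config as plain text; patch_pipeline_config is a hypothetical helper, and it assumes that every input_type: TF_SEQUENCE_EXAMPLE line and every batch_size entry in the file should be changed (train and eval alike).

```python
import re


def patch_pipeline_config(text, batch_size=1):
    """Drop the sequence-example input type and lower the batch size."""
    # Remove every `input_type: TF_SEQUENCE_EXAMPLE` line, keeping the rest.
    text = re.sub(r"^[ \t]*input_type:[ \t]*TF_SEQUENCE_EXAMPLE[ \t]*\n?", "",
                  text, flags=re.MULTILINE)
    # Rewrite every `batch_size: N` entry to the requested value.
    text = re.sub(r"(\bbatch_size:[ \t]*)\d+", r"\g<1>%d" % batch_size, text)
    return text


# Example use:
#   path = "object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config"
#   with open(path) as f:
#       patched = patch_pipeline_config(f.read())
#   with open(path, "w") as f:
#       f.write(patched)
```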
@jagob Hope that helps you too!
I followed your steps exactly, @reedts, but at the end of training I got this: "ValueError: Please make sure context_features and valid_context_size are in the features"