Models: 'image/encoded' is required but could not be found during Context R-CNN training

Created on 25 Aug 2020 · 3 comments · Source: tensorflow/models

I am trying to train the Context R-CNN model. Below is my progress so far.
It seems some steps only work with TF1, while others only work with TF2.

Any suggestions or experiences are highly appreciated.

Prerequisites

[ ] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

When I start training the Context R-CNN model (command below), I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Name: , Feature list 'image/encoded' is required but could not be found.  Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?

python object_detection/model_main.py \
--alsologtostderr \
--pipeline_config_path object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config \
--model_dir ../../data/context_rcnn \
--num_train_steps 10 \
--sample_1_of_n_eval_examples 1

3. Steps to reproduce

I'm following the Context R-CNN documentation.

Setup
I'm running everything locally in Docker.

Generating TFRecords from a set of images and a COCO-CameraTraps style JSON
Note: I'm using the test data available in object_detection/test_images/snapshot_serengeti.

python object_detection/dataset_tools/context_rcnn/create_cococameratraps_tfexample_main.py \
--alsologtostderr \
--output_tfrecord_prefix object_detection/test_images/snapshot_serengeti/train \
--image_directory object_detection/test_images/snapshot_serengeti/ \
--input_annotations_file object_detection/test_images/snapshot_serengeti/context_rcnn_demo_metadata.json

Skipped: Generating weakly-supervised bounding box labels for image-labeled data

Generating and saving contextual features for each image

Downloaded a detection model from the TF1 model zoo: faster_rcnn_resnet101_coco.
I'm aware the COCO model's classes do not match the Snapshot Serengeti labels in context_rcnn_demo_metadata.json, but I'm just trying to get the pipeline to run.
Added output_final_box_features: true to pipeline.config.
Exported the model with detection features:

python object_detection/export_inference_graph.py \
--alsologtostderr \
--input_type tf_example \
--pipeline_config_path ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/pipeline.config \
--trained_checkpoint_prefix ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt \
--output_directory ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/export/ \
--additional_output_tensor_names detection_features
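Before the export step, the output_final_box_features edit can be scripted rather than made by hand. This is a minimal sketch that assumes the field belongs inside the faster_rcnn block of the model config; add_box_features_flag is a hypothetical helper name, and it prints the patched config instead of editing the file in place:

```shell
# Hypothetical helper: print a copy of a pipeline.config with
# "output_final_box_features: true" added right after the line that opens
# the faster_rcnn block (assumed to be where the field belongs).
add_box_features_flag() {
  config="$1"
  # If the flag is already present, print the file unchanged.
  if grep -q 'output_final_box_features' "$config"; then
    cat "$config"
    return 0
  fi
  # Echo every line; after the line that opens faster_rcnn, emit the flag.
  awk '{ print } /^[ \t]*faster_rcnn:?[ \t]*[{]/ { print "    output_final_box_features: true" }' "$config"
}
```

For example: add_box_features_flag ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/pipeline.config > pipeline.config.patched, then review the result before replacing the original.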

python object_detection/dataset_tools/context_rcnn/generate_embedding_data.py \
--alsologtostderr \
--embedding_input_tfrecord object_detection/test_images/snapshot_serengeti/train* \
--embedding_output_tfrecord object_detection/test_images/snapshot_serengeti/embedding \
--embedding_model_dir ../../data/zoo/tf1/faster_rcnn_resnet101_coco_2018_01_28/export/saved_model/

The command above fails under TF1 with the following error: TypeError: load() missing 2 required positional arguments: 'tags' and 'export_dir' [while running 'ExtractEmbedding']
However, it runs fine in the TF2 Docker image.
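Separately, the train* pattern in the generate_embedding_data.py command (and embedding* in the add_context_to_examples.py command below) is unquoted, so the shell expands it before Python runs. With a single shard this is harmless, but if the prefix matches several shards, the flag receives only the first file name and the rest become extra positional arguments. Quoting the pattern passes it through unchanged, assuming the script hands it to something that expands file patterns itself, such as Beam's ReadFromTFRecord. A small demonstration using a hypothetical count_args helper:

```shell
# Hypothetical helper: print how many arguments a command would receive.
count_args() {
  echo "$#"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/train-00000-of-00002" "$demo_dir/train-00001-of-00002"

# Unquoted glob: the shell expands it to two file names, so this prints 3
# (the flag plus both files).
count_args --embedding_input_tfrecord "$demo_dir"/train*

# Quoted glob: the pattern is passed as one literal string, so this prints 2.
count_args --embedding_input_tfrecord "$demo_dir/train*"

rm -r "$demo_dir"
```

Applied to the original command, that would be --embedding_input_tfrecord 'object_detection/test_images/snapshot_serengeti/train*'.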

Building up contextual memory banks and storing them for each context group

python object_detection/dataset_tools/context_rcnn/add_context_to_examples.py \
--input_tfrecord object_detection/test_images/snapshot_serengeti/embedding* \
--output_tfrecord object_detection/test_images/snapshot_serengeti/bank-minute \
--time_horizon minute

This produces object_detection/test_images/snapshot_serengeti/bank-minute-00000-of-00001.

Training a Context R-CNN Model

Switched back to TF1 and edited object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config:

train_input_reader {
  label_map_path: "object_detection/data/snapshot_serengeti_label_map.pbtxt"
  tf_record_input_reader {
    input_path: "object_detection/test_images/snapshot_serengeti/bank-minute-00000-of-00001"
  }
  load_context_features: true
  input_type: TF_SEQUENCE_EXAMPLE
}

python object_detection/model_main.py \
--alsologtostderr \
--pipeline_config_path object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config \
--model_dir ../../data/context_rcnn \
--num_train_steps 10 \
--sample_1_of_n_eval_examples 1

Output:

INFO:tensorflow:Done calling model_fn.
I0825 11:04:03.191481 140581246424896 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0825 11:04:03.192392 140581246424896 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0825 11:04:05.956825 140581246424896 monitored_session.py:240] Graph was finalized.
INFO:tensorflow:Restoring parameters from ../../data/snapshot_serengeti/context_rcnn/model.ckpt-0
I0825 11:04:06.073695 140581246424896 saver.py:1284] Restoring parameters from ../../data/snapshot_serengeti/context_rcnn/model.ckpt-0
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
W0825 11:04:08.022697 140581246424896 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/saver.py:1069: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
I0825 11:04:08.861182 140581246424896 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0825 11:04:09.187242 140581246424896 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ../../data/snapshot_serengeti/context_rcnn/model.ckpt.
I0825 11:04:19.261078 140581246424896 basic_session_run_hooks.py:606] Saving checkpoints for 0 into ../../data/snapshot_serengeti/context_rcnn/model.ckpt.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node Dataset_map_TfSequenceExampleDecoder.decode_57}} Name: , Feature list 'image/encoded' is required but could not be found.  Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
         [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
         [[IteratorGetNext]]
         [[Loss/RPNLoss/BalancedPositiveNegativeSampler_7/ScatterNd/_8067]]
  (1) Invalid argument: {{function_node Dataset_map_TfSequenceExampleDecoder.decode_57}} Name: , Feature list 'image/encoded' is required but could not be found.  Did you mean to include it in feature_list_dense_missing_assumed_empty or feature_list_dense_defaults?
         [[{{node ParseSingleSequenceExample/ParseSingleSequenceExample}}]]
         [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.

6. System information

OS Platform and Distribution: Linux Ubuntu 16.04
GPU model and memory: NVIDIA RTX 2080 Ti, 11 GB

Labels: research, bug

Author: jagob

All 3 comments

I'm encountering the exact same issue.

Any news on this?

reedts on 2 Sep 2020

Okay, after digging in a bit, I managed to start a training run.

I added the option --output_type tf_example to the call of add_context_to_examples.py so that it writes standard tf.train.Examples instead of tf.train.SequenceExamples to the output TFRecord. I then also removed input_type: TF_SEQUENCE_EXAMPLE from the pipeline configuration file.
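The configuration half of that change can be scripted. A minimal sketch with a hypothetical drop_sequence_input_type helper that prints the config without any input_type: TF_SEQUENCE_EXAMPLE line; note that it removes the line from every input reader, which is only appropriate if the evaluation records were also regenerated as plain tf.train.Examples:

```shell
# Hypothetical helper: print a pipeline config with every
# "input_type: TF_SEQUENCE_EXAMPLE" line removed, so the input readers
# use the default input type instead.
drop_sequence_input_type() {
  sed '/^[[:space:]]*input_type:[[:space:]]*TF_SEQUENCE_EXAMPLE/d' "$1"
}
```

The matching data step is the earlier add_context_to_examples.py command rerun with --output_type tf_example appended.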

Training then started but ran into CUDA out-of-memory errors. Presumably this was due to batch_size: 64 in the pipeline configuration file, so I changed it to batch_size: 1, and training worked.
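The batch-size change can be made the same way. A sketch with a hypothetical set_batch_size helper; it rewrites every batch_size: line it finds, so if the config sets a batch size in more than one place, all of them change:

```shell
# Hypothetical helper: print a pipeline config with every "batch_size: N"
# value replaced by the given number, keeping the original indentation.
set_batch_size() {
  config="$1"
  new_size="$2"
  sed "s/^\([[:space:]]*batch_size:[[:space:]]*\)[0-9][0-9]*/\1${new_size}/" "$config"
}
```

For example: set_batch_size object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config 1 > patched.config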

@jagob Hope that helps you too!

reedts on 3 Sep 2020

I followed your steps exactly, @reedts, but at the end of training I got this: "ValueError: Please make sure context_features and valid_context_size are in the features"
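A possible diagnostic, assuming (not confirmed in the thread) that this error comes from the evaluation pass that model_main.py runs after training: check that every input reader, not just train_input_reader, sets load_context_features: true. A sketch with a hypothetical check_context_flags helper that reports this per reader; it tracks braces line by line, so it assumes each reader block opens on its own line:

```shell
# Hypothetical helper: for each top-level *_input_reader block in a pipeline
# config, report whether it sets "load_context_features: true".
check_context_flags() {
  awk '
    # A reader block starts at nesting depth 0, e.g. "eval_input_reader: {".
    depth == 0 && /^[ \t]*[a-z_]*input_reader:?[ \t]*[{]/ {
      reader = $1
      sub(/[:{].*$/, "", reader)
      found = 0
    }
    reader != "" {
      if ($0 ~ /load_context_features:[ \t]*true/) found = 1
      # Track brace depth; gsub returns the number of matches.
      depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
      if (depth == 0) {
        print reader ": load_context_features " (found ? "set" : "MISSING")
        reader = ""
      }
    }
  ' "$1"
}
```

For example: check_context_flags object_detection/samples/configs/context_rcnn_resnet101_snapshot_serengeti.config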