Models: Export Object detection model (V2) fails on assertion "assert_existing_objects_matched"

Created on 24 Jul 2020 · 10 Comments · Source: tensorflow/models

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [+] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
  • [+] I am reporting the issue to the correct repository. (Model Garden official or research directory)
  • [+] I checked to make sure that this issue has not been filed already.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection

2. Describe the bug

I'm running into out-of-memory (OOM) errors with the big models, so I tried to train a small dummy model:

faster_rcnn {
    num_classes: 9
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 80
        max_dimension: 100
        pad_to_max_dimension: true
      }
    }
...
train_config: {
  batch_size: 1
  num_steps: 200
....

For some reason, the resulting checkpoint files are very small:
ckpt-1.index = 247 bytes
ckpt-1.data-00000-of-00001 = 864 bytes
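
A quick way to check for this symptom is to list the checkpoint files and flag data shards that look too small to hold model weights. The sketch below is only an illustration: the helper names and the 1 MB threshold are assumptions, not part of the Object Detection API.

```python
from pathlib import Path


def checkpoint_file_sizes(model_dir):
    """Return {file name: size in bytes} for files named ckpt-* in model_dir."""
    sizes = {}
    directory = Path(model_dir)
    if not directory.is_dir():
        return sizes
    for path in sorted(directory.glob("ckpt-*")):
        if path.is_file():
            sizes[path.name] = path.stat().st_size
    return sizes


def suspiciously_small(sizes, min_data_bytes=1_000_000):
    """List .data shards below min_data_bytes (an arbitrary, assumed threshold)."""
    return [
        name
        for name, size in sizes.items()
        if ".data-" in name and size < min_data_bytes
    ]
```

Calling `suspiciously_small(checkpoint_file_sizes(model_dir))` on the training output directory would flag the 864-byte data file reported above.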

Export of this dummy model fails with an assertion:

raise AssertionError(
("Some Python objects were not bound to checkpointed values, likely due to changes in the Python program: %s") %
(list(unused_python_objects),))

3. Steps to reproduce

Train a dummy faster_rcnn model without a fine-tune checkpoint.

4. Expected behavior

The final checkpoint file should be ~100 MB, as in v1, and the export should complete without errors.

5. Additional context

6. System information

  • Linux Ubuntu 18
  • TensorFlow installed from binary
  • TensorFlow version v2.2.0-rc4-8-g2b96f3662b 2.2.0
  • Python version 3.8
  • CUDA/cuDNN version: Driver Version: 440.100 CUDA Version: 10.2
  • GPU model and memory: GeForce GTX 960M / 2004MiB
research bug

Most helpful comment

I get the same error with any model. The attached file (TF2 Error.txt) is the terminal output with
ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03 (I also tried a bunch of other models, including efficientdet_d0_coco17_tpu-32).
Here's what I run:

python model_main_tf2.py \
  --pipeline_config_path=training/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03.config \
  --model_dir=training/ \
  --alsologtostderr

I get the AssertionError, followed by tons of warnings related to weight loading.

AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:

The platform I use:

  • tensorflow_gpu==2.2
  • Ubuntu 18.04
  • CUDA Version: 10.2
  • Python 3.8 (also tried 3.7, same issue)
  • GeForce GTX 1060 with driver version 440.33.01 (Also tried on AWS-EC2 with TESLA, same error)

I've tried on a brand new machine with a fresh installation as well; the issue is persistent.

All 10 comments

@veonua

Could you share a complete code snippet or the steps to reproduce the issue in our environment? It helps us localize the issue faster. Thanks!

@ravikyram https://gist.github.com/veonua/e4186c92df80b49ad3d813f1219d0727

I'm using the latest master version of the Object Detection API.

object_detection/model_main_tf2.py --model_dir=./output --pipeline_config_path=checkpoint/pipeline.config --num_train_steps=1000

object_detection/exporter_main_v2.py --input_type=image_tensor --trained_checkpoint_dir="./output" --output_directory="./model" --pipeline_config_path=checkpoint/pipeline.config

Please let me know if you need any more information.

Same Error:

AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:

Same error. Bump. It happens to me when loading CenterNet.

As a temporary workaround, I've removed

status.assert_existing_objects_matched()

The model seems to be working.
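
The comment does not say which file contains the call. One way to locate it in a local checkout is a plain text search over the Python sources; the sketch below assumes nothing about the repository layout, and `find_assert_calls` is a hypothetical helper name. Note that removing the call only skips the check that Python objects were matched to checkpointed values, so a genuine mismatch would go unreported.

```python
from pathlib import Path

TARGET = "assert_existing_objects_matched"


def find_assert_calls(root):
    """Yield (path, line number, stripped line) for each line mentioning TARGET."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            # Skip files that cannot be read rather than aborting the search.
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if TARGET in line:
                yield str(path), number, line.strip()
```

For example, `for hit in find_assert_calls("object_detection"): print(hit)` would print every location to review before editing anything.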

Same error. It happens to me when loading CenterNet_ResNet50_v1.

Any updates on this? I'm getting the same error with every TF2 model I've tried.

Likely a bug in TF 2.3.0

In pipeline.config, change

fine_tune_checkpoint_type: "classification"

to

fine_tune_checkpoint_type: "detection"
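
If the same change has to be applied to several config files, it can be scripted as a text substitution. This is a minimal sketch that treats pipeline.config as plain text and assumes the value is a double-quoted string on one line, as shown above; `set_fine_tune_checkpoint_type` is a hypothetical helper, not part of the Object Detection API.

```python
import re

# Matches the key, optional whitespace around the colon, and a quoted value.
_PATTERN = re.compile(r'(fine_tune_checkpoint_type\s*:\s*)"[^"]*"')


def set_fine_tune_checkpoint_type(config_text, new_type="detection"):
    """Return (updated text, number of lines changed)."""
    return _PATTERN.subn(
        lambda match: f'{match.group(1)}"{new_type}"', config_text
    )
```

Read the config file into a string, pass it to the function, and write the result back only if the returned count is greater than zero; a count of zero means the field was not present in the expected form.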
