https://github.com/tensorflow/models/tree/master/research/object_detection/model_main.py
While training a custom object detector with the TensorFlow Object Detection API on Colab, I got this error. I was using tensorflow-gpu==1.15.0 and, for fine-tuning, ssd_mobilenet_v2_coco. The verbose output, including the error, follows:
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0528 21:13:21.113062 140292083513216 model_lib.py:717] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting train_steps: 200000
I0528 21:13:21.113316 140292083513216 config_util.py:523] Maybe overwriting train_steps: 200000
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0528 21:13:21.113430 140292083513216 config_util.py:523] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: 1
I0528 21:13:21.113519 140292083513216 config_util.py:523] Maybe overwriting sample_1_of_n_eval_examples: 1
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0528 21:13:21.113614 140292083513216 config_util.py:523] Maybe overwriting eval_num_epochs: 1
INFO:tensorflow:Maybe overwriting load_pretrained: True
I0528 21:13:21.113696 140292083513216 config_util.py:523] Maybe overwriting load_pretrained: True
INFO:tensorflow:Ignoring config override key: load_pretrained
I0528 21:13:21.113776 140292083513216 config_util.py:533] Ignoring config override key: load_pretrained
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0528 21:13:21.114626 140292083513216 model_lib.py:733] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
INFO:tensorflow:create_estimator_and_inputs: use_tpu False, export_to_tpu False
I0528 21:13:21.114744 140292083513216 model_lib.py:768] create_estimator_and_inputs: use_tpu False, export_to_tpu False
INFO:tensorflow:Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f97ed4dd128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
I0528 21:13:21.115245 140292083513216 estimator.py:212] Using config: {'_model_dir': 'training/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
rewrite_options {
meta_optimizer_iterations: ONE
}
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f97ed4dd128>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f97d328dbf8>) includes params argument, but params are not passed to Estimator.
W0528 21:13:21.115487 140292083513216 model_fn.py:630] Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x7f97d328dbf8>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Not using Distribute Coordinator.
I0528 21:13:21.116259 140292083513216 estimator_training.py:186] Not using Distribute Coordinator.
INFO:tensorflow:Running training and evaluation locally (non-distributed).
I0528 21:13:21.116456 140292083513216 training.py:612] Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
I0528 21:13:21.116694 140292083513216 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 600.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0528 21:13:21.124795 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0528 21:13:21.162153 140292083513216 dataset_builder.py:84] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:101: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W0528 21:13:21.167545 140292083513216 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:101: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0528 21:13:21.167754 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
2020-05-28 21:13:22.910301: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-28 21:13:22.953259: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:22.953875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-05-28 21:13:22.960996: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:13:22.967688: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-28 21:13:22.977811: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-28 21:13:22.985131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-28 21:13:22.995549: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-28 21:13:23.004617: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-28 21:13:23.025234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:13:23.025382: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:23.026101: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:13:23.026693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0528 21:13:33.109247 140292083513216 deprecation.py:323] From /content/models/research/object_detection/inputs.py:77: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0528 21:13:33.221111 140292083513216 deprecation.py:323] From /content/models/research/object_detection/utils/ops.py:493: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0528 21:13:39.145547 140292083513216 api.py:332] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/operators/control_flow.py:1004: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From /content/models/research/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0528 21:13:42.865469 140292083513216 deprecation.py:323] From /content/models/research/object_detection/inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
WARNING:tensorflow:From /content/models/research/object_detection/builders/dataset_builder.py:174: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
W0528 21:13:46.217640 140292083513216 deprecation.py:323] From /content/models/research/object_detection/builders/dataset_builder.py:174: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.batch(..., drop_remainder=True)`.
INFO:tensorflow:Calling model_fn.
I0528 21:13:46.233859 140292083513216 estimator.py:1148] Calling model_fn.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
W0528 21:13:46.430602 140292083513216 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tf_slim/layers/layers.py:1089: Layer.apply (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.__call__` method instead.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.101978 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.133970 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.165436 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.343221 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.377842 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0528 21:13:49.414346 140292083513216 convolutional_box_predictor.py:156] depth of additional conv before box predictor: 0
W0528 21:13:49.456603 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_2_3x3_s2_512/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 256, 512]], model variable shape: [[3, 3, 256, 512]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.456816 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_3_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.456997 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_4_3x3_s2_256/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 128, 256]], model variable shape: [[3, 3, 128, 256]]. This variable will not be initialized from the checkpoint.
W0528 21:13:49.457174 140292083513216 variables_helper.py:161] Variable [FeatureExtractor/MobilenetV2/layer_19_2_Conv2d_5_3x3_s2_128/weights] is available in checkpoint, but has an incompatible shape with model variable. Checkpoint shape: [[1, 1, 64, 128]], model variable shape: [[3, 3, 64, 128]]. This variable will not be initialized from the checkpoint.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0528 21:13:54.449208 140292083513216 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/rmsprop.py:119: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
INFO:tensorflow:Done calling model_fn.
I0528 21:14:00.871218 140292083513216 estimator.py:1150] Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
I0528 21:14:00.872715 140292083513216 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
I0528 21:14:04.557027 140292083513216 monitored_session.py:240] Graph was finalized.
2020-05-28 21:14:04.557485: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-05-28 21:14:04.562729: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000165000 Hz
2020-05-28 21:14:04.563012: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1771800 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-28 21:14:04.563048: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-05-28 21:14:04.666903: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.667672: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1770d80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-28 21:14:04.667705: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-05-28 21:14:04.668018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.668594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2020-05-28 21:14:04.668682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:14:04.668724: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-05-28 21:14:04.668747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2020-05-28 21:14:04.668769: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2020-05-28 21:14:04.668796: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2020-05-28 21:14:04.668819: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2020-05-28 21:14:04.668842: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:14:04.668951: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.669555: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.670109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-05-28 21:14:04.670229: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2020-05-28 21:14:04.671546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-28 21:14:04.671575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-05-28 21:14:04.671585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-05-28 21:14:04.671747: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.672416: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-28 21:14:04.672994: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2020-05-28 21:14:04.673037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14221 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
INFO:tensorflow:Running local_init_op.
I0528 21:14:09.605103 140292083513216 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0528 21:14:09.941666 140292083513216 session_manager.py:502] Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into training/model.ckpt.
I0528 21:14:18.960145 140292083513216 basic_session_run_hooks.py:606] Saving checkpoints for 0 into training/model.ckpt.
2020-05-28 21:14:36.916392: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 1074 of 2048
2020-05-28 21:14:46.905139: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:145] Filling up shuffle buffer (this may take a while): 2026 of 2048
2020-05-28 21:14:46.910085: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:195] Shuffle buffer filled.
2020-05-28 21:14:47.284742: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-28 21:14:53.420068: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
INFO:tensorflow:loss = 12.133639, step = 0
I0528 21:14:56.692664 140292083513216 basic_session_run_hooks.py:262] loss = 12.133639, step = 0
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: {{function_node __inference_Dataset_map_transform_and_pad_input_data_fn_3047}} assertion failed: [[0.748][0.758]] [[0.67][0.67]]
[[{{node Assert/AssertGuard/else/_123/Assert}}]]
[[IteratorGetNext]]
(1) Invalid argument: {{function_node __inference_Dataset_map_transform_and_pad_input_data_fn_3047}} assertion failed: [[0.748][0.758]] [[0.67][0.67]]
[[{{node Assert/AssertGuard/else/_123/Assert}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_8451]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/models/research/object_detection/model_main.py", line 114, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/content/models/research/object_detection/model_main.py", line 110, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 473, in train_and_evaluate
return executor.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 613, in run
return self.run_local()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/training.py", line 714, in run_local
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1195, in _train_model_default
saving_listeners)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1494, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1259, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
raise six.reraise(*original_exc_info)
File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
run_metadata=run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [[0.748][0.758]] [[0.67][0.67]]
[[{{node Assert/AssertGuard/else/_123/Assert}}]]
[[IteratorGetNext]]
(1) Invalid argument: assertion failed: [[0.748][0.758]] [[0.67][0.67]]
[[{{node Assert/AssertGuard/else/_123/Assert}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_8451]]
0 successful operations.
0 derived errors ignored.
Training with model_main.py while using tensorflow-gpu==1.15.0 along with ssd_mobilenet_v2_coco produces this error.
EDIT: I have tried both tensorflow-gpu==1.15.0 (installed with pip) and version 1.15.2 (when I specified %tensorflow_version 1.x, Colab automatically installed version 1.15.2), and I got this error with both. I also hit a module-not-found error for tf-slim, which I fixed by running !pip install git+https://github.com/google-research/tf-slim. Finally, before starting training I ran model_builder_test.py to make sure everything was okay, and it did not report any problem. I am still getting this error.
I also asked the question on Stack Overflow, where I got comments like: "... I uninstalled tensorflow-gpu 1.15, and installed 1.14, and it started the training. Sometimes after steps 200, sometimes after steps 1900, I still get the same error". Here is the link.
I had the exact same error when building my own tfrecords to retrain my model. The issue was that the height of one of the labeled boxes was negative. I'd recommend checking the sanity of your data.
For maintainers, please throw an intelligible error message.
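A minimal sketch of the kind of data sanity check suggested above, assuming the annotations are available as normalized (ymin, xmin, ymax, xmax) tuples before they are written to the TFRecord. The function name `find_invalid_boxes` and the example values are hypothetical and not part of the Object Detection API. The out-of-range check is an extra assumption beyond the negative-height case reported here. Note that a box can have a negative height or width even when none of its coordinates is negative, namely when a max value is smaller than the matching min value.

```python
from typing import Iterable, List, Tuple

# Hypothetical box layout: (ymin, xmin, ymax, xmax), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def find_invalid_boxes(boxes: Iterable[Box]) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for boxes that fail basic sanity checks."""
    problems = []
    for index, (ymin, xmin, ymax, xmax) in enumerate(boxes):
        # A max coordinate below its min means a negative height or width,
        # even though every individual coordinate may be positive.
        if ymax < ymin:
            problems.append((index, "negative height"))
        if xmax < xmin:
            problems.append((index, "negative width"))
        # Assumption: coordinates are normalized, so they should lie in [0, 1].
        if not all(0.0 <= v <= 1.0 for v in (ymin, xmin, ymax, xmax)):
            problems.append((index, "coordinate outside [0, 1]"))
    return problems


# Made-up example annotations: the second box has ymax < ymin.
example_boxes = [
    (0.1, 0.1, 0.5, 0.5),
    (0.8, 0.2, 0.6, 0.6),
    (0.2, 0.3, 0.4, 1.2),
]
for index, reason in find_invalid_boxes(example_boxes):
    print(f"box {index}: {reason}")
```

Running a check like this over every annotation before generating the TFRecords may help locate the offending image, since the assertion message itself does not name it.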
Yes that worked! Thanks :)
Closing this as the issue seems to be resolved!
I'm facing the same issue, but none of my bounding box values are negative (I'm using the Indian Driving dataset). Any help would be appreciated.
2020-09-29 00:06:26.393196: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-09-29 00:06:26.416859: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-09-29 00:07:06.289666: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-09-29 00:07:07.107673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 2.00GiB deviceMemoryBandwidth: 14.92GiB/s
2020-09-29 00:07:07.134320: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-09-29 00:07:07.154031: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cublas64_10.dll'; dlerror: cublas64_10.dll not found
2020-09-29 00:07:07.172621: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2020-09-29 00:07:07.191094: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2020-09-29 00:07:07.210097: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2020-09-29 00:07:07.228896: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cusparse64_10.dll'; dlerror: cusparse64_10.dll not found
2020-09-29 00:07:07.248102: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2020-09-29 00:07:07.263071: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-09-29 00:07:07.504103: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-09-29 00:07:08.033543: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1fb829db420 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-29 00:07:08.047140: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-29 00:07:08.106845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-09-29 00:07:08.117479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
W0929 00:07:08.332362 13388 cross_device_ops.py:1202] There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
I0929 00:07:08.335360 13388 mirrored_strategy.py:341] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0929 00:07:08.451359 13388 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0929 00:07:08.454360 13388 config_util.py:552] Maybe overwriting use_bfloat16: False
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0929 00:07:09.092362 13388 dataset_builder.py:83] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\builders\dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0929 00:07:09.190360 13388 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\builders\dataset_builder.py:100: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\builders\dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()`
W0929 00:07:10.169360 13388 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\builders\dataset_builder.py:175: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.map()
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0929 00:07:25.049363 13388 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\dispatch.py:201: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
W0929 00:07:32.807359 13388 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\dispatch.py:201: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version.
Instructions for updating:
`seed2` arg is deprecated.Use sample_distorted_bounding_box_v2 instead.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0929 00:07:38.073361 13388 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\inputs.py:259: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
2020-09-29 00:07:57.104241: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 1552 of 2048
2020-09-29 00:08:04.758805: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle buffer filled.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\model_lib_v2.py:355: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
W0929 00:08:14.518361 9788 deprecation.py:323] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\model_lib_v2.py:355: set_learning_phase (from tensorflow.python.keras.backend) is deprecated and will be removed after 2020-10-11.
Instructions for updating:
Simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.779362 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.782360 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.790361 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.799363 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.804361 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
INFO:tensorflow:depth of additional conv before box predictor: 0
I0929 00:08:26.812362 9788 convolutional_keras_box_predictor.py:154] depth of additional conv before box predictor: 0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._groundtruth_lists
W0929 00:08:46.685360 13388 util.py:150] Unresolved object in checkpoint: (root).model._groundtruth_lists
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor
W0929 00:08:46.688360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._batched_prediction_tensor_names
W0929 00:08:46.695360 13388 util.py:150] Unresolved object in checkpoint: (root).model._batched_prediction_tensor_names
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads
W0929 00:08:46.699361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._sorted_head_names
W0929 00:08:46.706362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._sorted_head_names
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets
W0929 00:08:46.716362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings
W0929 00:08:46.718361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background
W0929 00:08:46.729364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.0
W0929 00:08:46.738360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.1
W0929 00:08:46.742367 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.1
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.2
W0929 00:08:46.752363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.2
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.3
W0929 00:08:46.759361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.3
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.4
W0929 00:08:46.763360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.4
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.5
W0929 00:08:46.770361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._shared_nets.5
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0
W0929 00:08:46.774361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1
W0929 00:08:46.785360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2
W0929 00:08:46.792360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3
W0929 00:08:46.797361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4
W0929 00:08:46.804361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5
W0929 00:08:46.815363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0
W0929 00:08:46.818361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1
W0929 00:08:46.826363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2
W0929 00:08:46.836364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3
W0929 00:08:46.838360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4
W0929 00:08:46.849363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5
W0929 00:08:46.858363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers
W0929 00:08:46.863362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers
W0929 00:08:46.870361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers
W0929 00:08:46.875362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers
W0929 00:08:46.883362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers
W0929 00:08:46.887367 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers
W0929 00:08:46.898360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers
W0929 00:08:46.905360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers
W0929 00:08:46.909362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers
W0929 00:08:46.917364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers
W0929 00:08:46.922361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers
W0929 00:08:46.932361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers
W0929 00:08:46.939363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0
W0929 00:08:46.949364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0
W0929 00:08:46.951361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0
W0929 00:08:46.955363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0
W0929 00:08:46.966364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0
W0929 00:08:46.974360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0
W0929 00:08:46.979362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0
W0929 00:08:46.986361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0
W0929 00:08:46.990360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0
W0929 00:08:47.001360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0
W0929 00:08:47.009361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0
W0929 00:08:47.013361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0
W0929 00:08:47.020360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.kernel
W0929 00:08:47.025361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.bias
W0929 00:08:47.035360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.0._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0.kernel
W0929 00:08:47.043370 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0.bias
W0929 00:08:47.047362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.1._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0.kernel
W0929 00:08:47.054362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0.bias
W0929 00:08:47.064360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.2._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0.kernel
W0929 00:08:47.067361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0.bias
W0929 00:08:47.078364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.3._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0.kernel
W0929 00:08:47.087364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0.bias
W0929 00:08:47.091364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.4._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0.kernel
W0929 00:08:47.102362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0.bias
W0929 00:08:47.109365 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.box_encodings.5._box_encoder_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.kernel
W0929 00:08:47.119363 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.bias
W0929 00:08:47.123360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.0._class_predictor_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0.kernel
W0929 00:08:47.133362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0.bias
W0929 00:08:47.141364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.1._class_predictor_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0.kernel
W0929 00:08:47.145364 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0.bias
W0929 00:08:47.153362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.2._class_predictor_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0.kernel
W0929 00:08:47.157360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0.bias
W0929 00:08:47.167361 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.3._class_predictor_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0.kernel
W0929 00:08:47.175360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0.bias
W0929 00:08:47.186360 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.4._class_predictor_layers.0.bias
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0.kernel
W0929 00:08:47.190362 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0.kernel
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0.bias
W0929 00:08:47.200373 13388 util.py:150] Unresolved object in checkpoint: (root).model._box_predictor._prediction_heads.class_predictions_with_background.5._class_predictor_layers.0.bias
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0929 00:08:47.208362 13388 util.py:158] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
WARNING:tensorflow:From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\deprecation.py:574: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0929 00:08:57.474360 9004 deprecation.py:506] From C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\util\deprecation.py:574: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
2020-09-29 00:09:20.459178: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 1496 of 2048
2020-09-29 00:09:24.076207: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle buffer filled.
Traceback (most recent call last):
File "model_main_tf2.py", line 113, in <module>
tf.compat.v1.app.run()
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 110, in main
record_summaries=FLAGS.record_summaries)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\object_detection\model_lib_v2.py", line 639, in train_loop
loss = _dist_train_step(train_input_iter)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
result = self._call(args, *kwds)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
return self._stateless_fn(args, *kwds)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eagerfunction.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eagerfunction.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eagerfunction.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eagerfunction.py", line 550, in call
ctx=ctx)
File "C:\Users\HP\AppData\Local\Programs\Python\Python37\lib\site-packages\tensorflow\python\eager\execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [[0.440740734][0.470370382][0.470370382]...] [[0.356481493][0.357407421][0.390740752]...]
[[{{node Assert/AssertGuard/else/_123/Assert/AssertGuard/Assert}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
[[IteratorGetNext]] [Op:__inference__dist_train_step_28368]
Function call stack:
_dist_train_step`
@jarnoux @hafiz031 I am facing the same issue you did. @jarnoux, I even checked the CSV export of my data for negative values, but I didn't find any. Any other pointers regarding this issue?
Hi @SameedAtif @AliButtar. I removed not only the negative bounding box coordinates but every suspicious example that could cause problems, and finally this worked! Apart from negatives, I also checked whether any box coordinate is larger than the width or height of the image itself, i.e. whether the bounding box goes outside the image. If so, I dropped that example while creating the CSV files and then built the tfrecords from the cleaned CSVs. In short, no corrupted entries should be allowed into the tfrecords. See my question: https://stackoverflow.com/q/62075321/6907424
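The checks described in this comment could be sketched roughly as below. This is only an illustration, not the script anyone in this thread actually ran: the function name `find_bad_rows` and the column names (`width`, `height`, `xmin`, ...) are assumptions, so adapt them to whatever your own CSV uses.

```python
def find_bad_rows(rows):
    """Return (index, reason) pairs for annotation rows that look unsafe.

    Each row is a dict; the keys used here are assumed column names.
    """
    problems = []
    for i, row in enumerate(rows):
        w, h = float(row["width"]), float(row["height"])
        xmin, ymin = float(row["xmin"]), float(row["ymin"])
        xmax, ymax = float(row["xmax"]), float(row["ymax"])
        if w <= 0 or h <= 0:
            problems.append((i, "non-positive image size"))
        elif min(xmin, ymin, xmax, ymax) < 0:
            problems.append((i, "negative coordinate"))
        elif xmax > w or ymax > h:
            problems.append((i, "box extends past image"))
        elif xmin >= xmax or ymin >= ymax:
            problems.append((i, "empty or inverted box"))
    return problems


# Made-up rows for illustration only.
rows = [
    {"width": 100, "height": 50, "xmin": 10, "ymin": 5, "xmax": 60, "ymax": 40},
    {"width": 100, "height": 50, "xmin": -3, "ymin": 5, "xmax": 60, "ymax": 40},
    {"width": 100, "height": 50, "xmin": 10, "ymin": 5, "xmax": 120, "ymax": 40},
    {"width": 100, "height": 50, "xmin": 70, "ymin": 5, "xmax": 60, "ymax": 40},
]
for index, reason in find_bad_rows(rows):
    print(index, reason)
```

Rows read with `csv.DictReader` should work the same way, since every value is passed through `float()` first.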
Thanks, @hafiz031. Did you create your data in three steps that are
@AliButtar yes, exactly these steps. No, I didn't notice any such duplication issue. Make sure your CSV file doesn't contain duplicate names, and that your data set doesn't contain the same image under different names.
@hafiz031
I saw your Stack Overflow question a week ago, and I added this check to filter my dataset:
Basically, for every XML file I perform the checks that you mention; if the file is corrupted, I flag it as corrupt and don't use it for training.
`
# Fragment: depth, width, height, flag, annotation, tree, image_path, path,
# filename and copyImage come from the surrounding script.
if (int(depth) != 3):
    print("Incorrect image depth.")
    flag = 1
for member in annotation.findall("object"):
    xmin = member.find("bndbox").find("xmin")
    ymin = member.find("bndbox").find("ymin")
    xmax = member.find("bndbox").find("xmax")
    # The posted version read "xmax" here, so ymax was never actually checked.
    ymax = member.find("bndbox").find("ymax")
    if int(xmin.text) < 0 or int(xmin.text) > width:
        flag = 1
    if int(ymin.text) < 0 or int(ymin.text) > height:
        flag = 1
    if int(xmax.text) < 0 or int(xmax.text) > width:
        flag = 1
    if int(ymax.text) < 0 or int(ymax.text) > height:
        flag = 1
if (flag == 0):
    img_path = copyImage(image_path, path, filename + ".jpg")
    string_path = str(img_path)
    index = string_path.rfind('\\')
    tree.write(str(string_path[:index+1])+filename+'.xml')
else:
    print("Corrupted file!")
However I'm still getting
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [[0.440740734][0.470370382]]
[[{{node Assert/AssertGuard/else/_123/Assert/AssertGuard/Assert}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]]
`
Note that the number of values in the assertion message dropped to two, thanks to your recommendation. Can you tell me whether there is some way I can adjust this check?
@SameedAtif if I am not wrong, you haven't scaled the boxes' coordinates to [0, 1] so far. Did you do it later, before making the tfrecord? As far as I know, the TensorFlow Object Detection API expects the coordinates to be scaled like that. For example, before passing these values on to make the tfrecord, xmin.text should be scaled as xmin_scaled = int(xmin.text) / width, and then you can use xmin_scaled to build the tf.train.Example. The same goes for all of the coordinates: x coordinates are scaled by width and y coordinates by height. Also, personally I didn't remove those examples; instead, after scaling to [0, 1], I set the coordinate to 0.0 if it was < 0.0 and to 1.0 if it was > 1.0. I had already removed the examples whose bounding boxes fell outside the corresponding images, so logically the remaining boxes should still be inside the images after scaling. But the error was still occurring, probably due to some floating point precision issue in the division. So I added this additional check and the error was gone.
I used the following script to convert to tfrecord: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#convert-xml-to-record and it does convert the coordinates to [0, 1].
Sanity check, what tensorflow and object detection version are you using?
@SameedAtif tensorflow==1.15.2.
That might well be the case, since I'm using tf 2.2.0 and object detection 2. I'll try tf 1.15 with object detection 1.
@SameedAtif check my updated comment https://github.com/tensorflow/models/issues/8595#issuecomment-711034844.
Can you clarify the additional check with a code snippet? Did you convert the integer bounding box values to floating point values to fix the issue?
Can you please describe the floating point precision check that you added, and where exactly you added it? I think that might be the cause, since I also tried setting the coordinate to 0.0 if it was < 0.0 and to 1.0 if it was > 1.0.
@SameedAtif alright. First of all, I removed all examples where the bounding box coordinates go outside the image and/or the image is not valid, as these examples must be eliminated. Also don't forget to check that width and height are not 0 or negative themselves. Remove every example with these issues from the CSV file.
Then, in the second step (while creating the tfrecord), I performed these checks to handle the floating point precision problem while creating each tf_example. With these checks added, the function can fix the problem if it occurs during scaling. Following is the code for this second step:
def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    for index, row in group.object.iterrows():
        ########### ADDITIONAL CHECKS START HERE ###################
        xmn = row['xmin'] / width
        if xmn < 0.0:
            xmn = 0.0
        elif xmn > 1.0:
            xmn = 1.0
        xmins.append(xmn)
        xmx = row['xmax'] / width
        if xmx < 0.0:
            xmx = 0.0
        elif xmx > 1.0:
            xmx = 1.0
        xmaxs.append(xmx)
        ymn = row['ymin'] / height
        if ymn < 0.0:
            ymn = 0.0
        elif ymn > 1.0:
            ymn = 1.0
        ymins.append(ymn)
        ymx = row['ymax'] / height
        if ymx < 0.0:
            ymx = 0.0
        elif ymx > 1.0:
            ymx = 1.0
        ymaxs.append(ymx)
        ############ ADDITIONAL CHECKS END HERE ####################
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
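The four clamp blocks in the function above all follow one pattern, so they could probably be collapsed into a single helper. A minimal sketch, where `scale_and_clamp` is a made-up name and not part of the Object Detection API:

```python
def scale_and_clamp(pixel_value, image_extent):
    """Scale a pixel coordinate by the image width/height and clip to [0, 1]."""
    if image_extent <= 0:
        # A zero or negative size means the image itself is unusable.
        raise ValueError("image extent must be positive")
    return min(1.0, max(0.0, pixel_value / image_extent))


print(scale_and_clamp(5, 5))     # coordinate on the right edge
print(scale_and_clamp(-2, 10))   # negative coordinate is clipped up
print(scale_and_clamp(12, 10))   # coordinate past the edge is clipped down
```

Inside the loop this would be used as, for example, `xmins.append(scale_and_clamp(row['xmin'], width))`. One thing to keep in mind: clamping hides badly labelled boxes rather than reporting them, so it may be worth logging whenever a value actually gets clipped.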
@SameedAtif if you are wondering why it occurs, picture this situation: width = 5, xmax = 5, so after scaling xmax should be 1.0, but this doesn't generally happen. Instead it can be 1.00000000001 or 0.999999999 etc. This is a limitation: in many steps of a calculation, values get truncated, approximated, rounded, etc. Also, many calculations are not performed directly with closed-form formulae but with numerical methods, to reduce computational cost. Check out these write-ups: "Floating Point Arithmetic: Issues and Limitations" and "Problem in comparing Floating point numbers and how to compare them correctly?"
No, it didn't work for me; after the first or second iteration it would give me a _dist_train_step error. Thank you for your support. I guess I'll look for alternative ways to train a mobilenetssdv2 model.
Here is what I came up with after your suggestion, just in case I missed something.
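A quick way to see where rounding does and does not bite is to try it in plain Python. The sketch below only assumes standard Python floats (IEEE 754 doubles) and says nothing about what TensorFlow does internally. A single division of a coordinate by the image size is correctly rounded, so it should not overshoot 1.0 on its own, while a chain of inexact steps can drift:

```python
import math

# One division is a single, correctly rounded operation: a coordinate equal
# to the width scales to exactly 1.0.
print(5 / 5 == 1.0)

# For integer pixel values no larger than the width, the result stays <= 1.0.
never_above_one = all(x / w <= 1.0 for w in range(1, 200) for x in range(w + 1))
print(never_above_one)

# Drift appears once several inexact steps are chained together.
total = sum([0.1] * 10)
print(total == 1.0)              # exact comparison fails
print(math.isclose(total, 1.0))  # tolerance-based comparison succeeds
```

So if values outside [0, 1] still show up after scaling, it may be worth looking at how the pixel coordinates were produced (for instance after resizing or other intermediate arithmetic) rather than at the division alone. Clamping, as done above, covers both cases.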
`
""" Sample TensorFlow XML-to-TFRecord converter
usage: generate_tfrecord.py [-h] [-x XML_DIR] [-l LABELS_PATH] [-o OUTPUT_PATH] [-i IMAGE_DIR] [-c CSV_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -x XML_DIR, --xml_dir XML_DIR
                        Path to the folder where the input .xml files are stored.
  -l LABELS_PATH, --labels_path LABELS_PATH
                        Path to the labels (.pbtxt) file.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path of output TFRecord (.record) file.
  -i IMAGE_DIR, --image_dir IMAGE_DIR
                        Path to the folder where the input image files are stored. Defaults to the same directory as XML_DIR.
  -c CSV_PATH, --csv_path CSV_PATH
                        Path of output .csv file. If none provided, then no file will be written.
"""
import os
import glob
import pandas as pd
import io
import xml.etree.ElementTree as ET
import argparse
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # Suppress TensorFlow logging
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util, label_map_util
from collections import namedtuple
parser = argparse.ArgumentParser(
    description="Sample TensorFlow XML-to-TFRecord converter")
parser.add_argument("-x",
                    "--xml_dir",
                    help="Path to the folder where the input .xml files are stored.",
                    type=str)
parser.add_argument("-l",
                    "--labels_path",
                    help="Path to the labels (.pbtxt) file.", type=str)
parser.add_argument("-o",
                    "--output_path",
                    help="Path of output TFRecord (.record) file.", type=str)
parser.add_argument("-i",
                    "--image_dir",
                    help="Path to the folder where the input image files are stored. "
                         "Defaults to the same directory as XML_DIR.",
                    type=str, default=None)
parser.add_argument("-c",
                    "--csv_path",
                    help="Path of output .csv file. If none provided, then no file will be "
                         "written.",
                    type=str, default=None)
args = parser.parse_args()
if args.image_dir is None:
    args.image_dir = args.xml_dir
label_map = label_map_util.load_labelmap(args.labels_path)
label_map_dict = label_map_util.get_label_map_dict(label_map)
def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.

    Parameters:
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        width = int(root.find('size')[0].text)
        height = int(root.find('size')[1].text)
        for member in root.findall('object'):
            xmin, ymin = float(member[5][0].text), float(member[5][1].text)
            xmax, ymax = float(member[5][2].text), float(member[5][3].text)
            # Keep a box only if the image size is valid AND both corners lie
            # inside the image. The posted version joined these tests with `or`
            # (and used a strict `<`), so a box passed when just one corner was
            # inside, and an image passed when just one dimension was positive.
            if (width > 0 and height > 0
                    and xmin <= width and xmax <= width
                    and ymin <= height and ymax <= height):
                value = (root.find('filename').text,
                         width,
                         height,
                         member[0].text,
                         xmin,
                         ymin,
                         xmax,
                         ymax
                         )
                xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df
def class_text_to_int(row_label):
    return label_map_dict[row_label]

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    for index, row in group.object.iterrows():
        ########### ADDITIONAL CHECKS START HERE ###################
        xmn = row['xmin'] / width
        if xmn < 0.0:
            xmn = 0.0
        elif xmn > 1.0:
            xmn = 1.0
        xmins.append(xmn)
        xmx = row['xmax'] / width
        if xmx < 0.0:
            xmx = 0.0
        elif xmx > 1.0:
            xmx = 1.0
        xmaxs.append(xmx)
        ymn = row['ymin'] / height
        if ymn < 0.0:
            ymn = 0.0
        elif ymn > 1.0:
            ymn = 1.0
        ymins.append(ymn)
        ymx = row['ymax'] / height
        if ymx < 0.0:
            ymx = 0.0
        elif ymx > 1.0:
            ymx = 1.0
        ymaxs.append(ymx)
        ############ ADDITIONAL CHECKS END HERE ####################
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example
def main(_):
    writer = tf.python_io.TFRecordWriter(args.output_path)
    path = os.path.join(args.image_dir)
    examples = xml_to_csv(args.xml_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    print('Successfully created the TFRecord file: {}'.format(args.output_path))
    if args.csv_path is not None:
        examples.to_csv(args.csv_path, index=None)
        print('Successfully created the CSV file: {}'.format(args.csv_path))

if __name__ == '__main__':
    tf.app.run()
`
@SameedAtif oh! Feeling sorry for you :( ...but I think InvalidArgumentError doesn't signify any single error; it can occur for various reasons. When I was facing it I found some questions on Stack Overflow with this error in the title, but the underlying problem was not exactly the same. Hence I needed to ask about my specific case. Here is one more corner case which might be behind it. Did you draw the bounding boxes yourself? If so: when you drag the mouse from left-top to bottom-right, the annotator software treats the first point, i.e. the left-top point, as (xmin, ymin) and the second point, i.e. the bottom-right point, as (xmax, ymax). That is completely fine, as here xmax > xmin and ymax > ymin. But if you drag the mouse in the opposite direction, the annotator software might assign the right-bottom point to (xmin, ymin) and the left-top point to (xmax, ymax). In that case xmax becomes less than xmin, and the same goes for the y coordinates. So make sure your annotator software handles this case by swapping them. I didn't add this check because I was sure my data set had no such cases, and I am also not sure whether it can cause a problem, as I didn't face it. But it is better to check. Perhaps this is the reason!
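To find out whether a data set contains such reversed boxes at all, the annotation files can be scanned before any tfrecord is built. A rough sketch, assuming labelImg-style Pascal VOC XML with `object`, `name` and `bndbox` tags; the function name `inverted_boxes` and the sample annotation are made up for illustration:

```python
import xml.etree.ElementTree as ET


def inverted_boxes(xml_text):
    """Return the class names of objects whose corners are out of order."""
    root = ET.fromstring(xml_text.strip())
    bad = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # A box drawn "backwards" ends up with min >= max on some axis.
        if xmin >= xmax or ymin >= ymax:
            bad.append(obj.find("name").text)
    return bad


# Made-up annotation: the second box was dragged from bottom-right to top-left.
SAMPLE = """
<annotation>
  <object><name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>200</xmin><ymin>150</ymin><xmax>120</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>
"""
print(inverted_boxes(SAMPLE))
```

For files on disk, reading each file's text and passing it to the function should be enough; an empty list means no reversed boxes were found in that file.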
@hafiz031 Thanks for your analysis and answer. I also had the same issue. But only after reading your answer did I look into my XML parser and find out that I had parsed it wrongly. Your solution works perfectly for me. Cheers
@senthurRam33 glad to hear :D
This error is also raised when the min value of a bounding box is greater than the max value on the corresponding axis. If the cause of this ambiguity is that the annotation did not start from the top-left corner, you can easily solve it by taking the mins and maxes of the coordinates yourself when creating your tfrecord. Do not forget to sanity check.
```
# Fragment: xmin, ymin, xmax, ymax (already normalized to [0, 1]) and
# full_path come from the surrounding tfrecord-creation loop.
corrected_xmax = max(xmin, xmax)
corrected_xmin = min(xmin, xmax)
corrected_ymax = max(ymin, ymax)
corrected_ymin = min(ymin, ymax)

# After the min/max swap above this can no longer trigger; kept as a guard.
if (corrected_xmin > corrected_xmax or corrected_ymin > corrected_ymax):
    print("BOUNDING BOX COORDINATE MIN COORDINATE GREATER THAN MAX COORDINATE:", full_path)
    print(f"xmin: {corrected_xmin}, ymin: {corrected_ymin}, xmax: {corrected_xmax}, ymax: {corrected_ymax}")
    # A bare `raise` with no active exception fails with a RuntimeError,
    # so raise an explicit error instead.
    raise ValueError("min coordinate greater than max coordinate")

if (corrected_xmin < 0 or corrected_ymin < 0 or corrected_xmax < 0
        or corrected_ymax < 0 or corrected_xmin > 1 or corrected_ymin > 1
        or corrected_xmax > 1 or corrected_ymax > 1):
    print("NORMALIZED PIXEL OUT OF BOUNDS FOR: ", full_path)
    print(f"xmin: {corrected_xmin}, ymin: {corrected_ymin}, xmax: {corrected_xmax}, ymax: {corrected_ymax}")
    raise ValueError("normalized coordinate out of bounds")
```
Most helpful comment
I had the exact same error when building my own tfrecords to retrain my model. The issue was that the height of one of the labeled boxes was negative. I'd recommend checking the sanity of your data.