I am trying to train ssd_inception_v2, but training breaks with the following error:
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1917,1]
When training starts, I get the messages below:
Total memory: 11.90GiB
Free memory: 11.75GiB
Ignoring device specification /device:GPU:0 for node 'prefetch_queue_Dequeue' because the input edge from 'prefetch_queue' is a reference connection and already has a device field set to /device:CPU:0
I wonder what this message means. Using nvidia-smi, I can see that GPU-Util is not full and is sometimes 0%. If the batch size is greater than 12, training runs out of memory (OOM); if the batch size is less than 12, it is fine.
What should I fix to get rid of the error?
This is related to issue #1390 #2038
@jch1, could you take a look?
I have the same problem
What dataset are you using? Could you post your configs as well?
@derekjchow I am using my own dataset, with about 25000 images in train.record. Each image is no more than 900k, and train.record is 3.2G in total. Here is my config:
model {
ssd {
num_classes: 26
box_coder {
faster_rcnn_box_coder {
y_scale: 10.0
x_scale: 10.0
height_scale: 5.0
width_scale: 5.0
}
}
matcher {
argmax_matcher {
matched_threshold: 0.5
unmatched_threshold: 0.5
ignore_thresholds: false
negatives_lower_than_unmatched: true
force_match_for_each_row: true
}
}
similarity_calculator {
iou_similarity {
}
}
anchor_generator {
ssd_anchor_generator {
num_layers: 6
min_scale: 0.2
max_scale: 0.95
aspect_ratios: 1.0
aspect_ratios: 2.0
aspect_ratios: 0.5
aspect_ratios: 3.0
aspect_ratios: 0.3333
reduce_boxes_in_lowest_layer: true
}
}
image_resizer {
fixed_shape_resizer {
height: 500
width: 500
}
}
box_predictor {
convolutional_box_predictor {
min_depth: 0
max_depth: 0
num_layers_before_predictor: 0
use_dropout: false
dropout_keep_probability: 0.8
kernel_size: 3
box_code_size: 4
apply_sigmoid_to_scores: false
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
}
}
}
feature_extractor {
type: 'ssd_inception_v2'
min_depth: 16
depth_multiplier: 1.0
conv_hyperparams {
activation: RELU_6,
regularizer {
l2_regularizer {
weight: 0.00004
}
}
initializer {
truncated_normal_initializer {
stddev: 0.03
mean: 0.0
}
}
batch_norm {
train: true,
scale: true,
center: true,
decay: 0.9997,
epsilon: 0.001,
}
}
}
loss {
classification_loss {
weighted_sigmoid {
anchorwise_output: true
}
}
localization_loss {
weighted_smooth_l1 {
anchorwise_output: true
}
}
hard_example_miner {
num_hard_examples: 3000
iou_threshold: 0.99
loss_type: CLASSIFICATION
max_negatives_per_positive: 3
min_negatives_per_image: 0
}
classification_weight: 1.0
localization_weight: 1.0
}
normalize_loss_by_num_matches: true
post_processing {
batch_non_max_suppression {
score_threshold: 1e-8
iou_threshold: 0.6
max_detections_per_class: 100
max_total_detections: 100
}
score_converter: SIGMOID
}
}
}
train_config: {
batch_size: 8
optimizer {
rms_prop_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.004
schedule {
step: 0
learning_rate: 0.004
}
schedule {
step: 50000
learning_rate: 0.001
}
schedule {
step: 200000
learning_rate: 0.0004
}
schedule {
step: 300000
learning_rate: 0.00004
}
schedule {
step: 500000
learning_rate: 0.00001
}
schedule {
step: 800000
learning_rate: 0.000001
}
}
momentum_optimizer_value: 0.9
decay: 0.9
epsilon: 1.0
}
}
fine_tune_checkpoint: "/mnt/raid4/home/zhangyi/modelsmaster/modelsmaster/object_detection/premodel/ssd_inception_v2_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true
data_augmentation_options {
random_horizontal_flip {
}
}
data_augmentation_options {
ssd_random_crop {
}
}
}
train_input_reader: {
tf_record_input_reader {
input_path: "/data/public/image/data7.24/train.record"
}
label_map_path: "/mnt/raid4/home/zhangyi/modelsmaster/modelsmaster/object_detection/data/data26_label_map.pbtxt"
}
eval_config: {
num_examples: 1778
}
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/public/image/data7.24/test1.record"
}
label_map_path: "/mnt/raid4/home/zhangyi/modelsmaster/modelsmaster/object_detection/data/data26_label_map_eng.pbtxt"
shuffle: false
num_readers: 1
}
System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): VERSION="16.04.2 LTS (Xenial Xerus)"
TensorFlow version (use command below): 1.1.0
GPU memory: 11.7G
Is there anything wrong here that could cause this problem?
What size are the images in your dataset? We find that datasets with very large images (1920x1080) tend to hit OOM. Prescaling the images in the dataset to smaller resolutions can help.
@derekjchow Thanks a lot. I have resized the images, and the batch size can go up to 32 now.
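A minimal sketch of the prescaling step suggested above, assuming the goal is to shrink each image so that its longer side stays under some limit before it is written into the TFRecord. The helper name `fit_within` and the 500-pixel limit are illustrative choices, not part of the Object Detection API, and the actual pixel resampling would be done with whatever image library builds the records. If the records store box coordinates normalized to [0, 1], a uniform rescale like this should leave the boxes unchanged.

```python
def fit_within(width, height, max_side=500):
    """Return (new_width, new_height) with the longer side capped at max_side.

    The aspect ratio is preserved, and images that are already small enough
    are returned unchanged.
    """
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / float(longer)
    # Keep every side at least one pixel long.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(1920, 1080))  # (500, 281)
print(fit_within(300, 200))    # (300, 200)
```

The returned size can be passed to the resize call of the image library in use, once per image, before the records are regenerated.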
I am facing a similar issue. Any help, please?
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[18816,20000]
[[Node: logits/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Reshape, logits/weights/read)]]
[[Node: Adam/update/_962 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_5661_Adam/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I tried setting the allocator type option as explained here, but no luck:
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allocator_type = 'BFC'
@jollysean could you describe your dataset? As mentioned, we recommend pre-shrinking very large images.
I am training on BookCorpus using the skip-thought vectors model. For this I reduced the dimensionality to suit my needs.
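For a sense of scale, the size of the tensor named in the error above can be worked out from its shape alone. This is a small arithmetic sketch: `tensor_bytes` is a hypothetical helper, and the 4 bytes per element follows from `DT_FLOAT` (32-bit floats) in the log.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; 4 bytes per element for DT_FLOAT."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


size = tensor_bytes([18816, 20000])
print(size)                        # 1505280000
print(round(size / 2.0 ** 30, 2))  # 1.4
```

That is roughly 1.4 GiB for this single output of logits/MatMul, before counting anything else the training step keeps in memory, so shrinking either dimension reduces the allocation proportionally.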
Hi,
I face this issue when running evaluation (eval.py) on the GPU. Training runs fine on another thread; the issue only appears when running the eval script. The images in the dataset are on the order of 1300 x 1300.
Running on an Nvidia Titan X.
@shresthamalik it could be that the train.py script reserves all of the GPU memory, even though it doesn't use all of it. There was a solution around here, like this (in train.py):
import tensorflow as tf

def main(_):
    # Cap this process at 80% of GPU memory so that eval can use the rest.
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
I have GTX 1060 6GB
I resized all my images to 300x300
in config:
image_resizer {
fixed_shape_resizer {
height: 300
width: 300
}
}
and it still works only with a batch_size of 1.
Higher values (2, 8, 12, 24, 32) cause OOM errors.
Even with a batch_size of 1, retraining ssd mobilenet fails after some steps (~100):
INFO:tensorflow:global step 171: loss = 14.8309 (0.360 sec/step)
I1109 12:00:57.197321 7740 tf_logging.py:115] global step 171: loss = 14.8309 (0.360 sec/step)
INFO:tensorflow:global step 172: loss = 11.7885 (0.351 sec/step)
I1109 12:00:57.549896 7740 tf_logging.py:115] global step 172: loss = 11.7885 (0.351 sec/step)
INFO:tensorflow:global step 173: loss = 12.5532 (0.369 sec/step)
I1109 12:00:57.919557 7740 tf_logging.py:115] global step 173: loss = 12.5532 (0.369 sec/step)
INFO:tensorflow:global step 174: loss = 13.3306 (0.328 sec/step)
I1109 12:00:58.248665 7740 tf_logging.py:115] global step 174: loss = 13.3306 (0.328 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, Incompatible shapes: [2,1917] vs. [3,1]
[[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
[[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'Loss/Match/cond/mul_4', defined at:
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 290, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "C:\tensorflow1\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses
losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss
keypoints, weights)
File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets
groundtruth_weights_list)
File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets
anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights)
File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 182, in assign
valid_rows=tf.greater(groundtruth_weights, 0))
File "C:\tensorflow1\models\research\object_detection\core\matcher.py", line 241, in match
return Match(self._match(similarity_matrix, valid_rows),
File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match
_match_when_rows_are_non_empty, _match_when_rows_are_empty)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2086, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1930, in BuildCondBranch
original_result = fn()
File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty
tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 866, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 1131, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5358, in mul
"Mul", x=x, y=y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]
[[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
[[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Traceback (most recent call last):
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2,1917] vs. [3,1]
[[{{node Loss/Match/cond/mul_4}} = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
[[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 415, in train
saver=saver)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 770, in train
sess, train_op, global_step, train_step_kwargs)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 487, in train_step
run_metadata=run_metadata)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [2,1917] vs. [3,1]
[[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
[[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'Loss/Match/cond/mul_4', defined at:
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 306, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 290, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "C:\tensorflow1\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "C:\tensorflow1\models\research\object_detection\legacy\trainer.py", line 205, in _create_losses
losses_dict = detection_model.loss(prediction_dict, true_image_shapes)
File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 680, in loss
keypoints, weights)
File "C:\tensorflow1\models\research\object_detection\meta_architectures\ssd_meta_arch.py", line 853, in _assign_targets
groundtruth_weights_list)
File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 483, in batch_assign_targets
anchors, gt_boxes, gt_class_targets, unmatched_class_label, gt_weights)
File "C:\tensorflow1\models\research\object_detection\core\target_assigner.py", line 182, in assign
valid_rows=tf.greater(groundtruth_weights, 0))
File "C:\tensorflow1\models\research\object_detection\core\matcher.py", line 241, in match
return Match(self._match(similarity_matrix, valid_rows),
File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 194, in _match
_match_when_rows_are_non_empty, _match_when_rows_are_empty)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 2086, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\control_flow_ops.py", line 1930, in BuildCondBranch
original_result = fn()
File "C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py", line 175, in _match_when_rows_are_non_empty
tf.cast(tf.expand_dims(valid_rows, axis=-1), dtype=tf.float32))
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 866, in binary_op_wrapper
return func(x, y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\math_ops.py", line 1131, in _mul_dispatch
return gen_math_ops.mul(x, y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 5358, in mul
"Mul", x=x, y=y, name=name)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "C:\Users\Admin\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [2,1917] vs. [3,1]
[[node Loss/Match/cond/mul_4 (defined at C:\tensorflow1\models\research\object_detection\matchers\argmax_matcher.py:175) = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Loss/Match/cond/one_hot, Loss/Match/cond/Cast_2)]]
[[{{node gradients/FeatureExtractor/MobilenetV1/Conv2d_13_pointwise_2_Conv2d_5_3x3_s2_128/BatchNorm/FusedBatchNorm_grad/FusedBatchNormGrad/_1497}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2718_...chNormGrad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
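The failure above is an InvalidArgumentError rather than an OOM: the two operands of Loss/Match/cond/mul_4 have 2 and 3 rows. One possible cause, which is an assumption and not confirmed anywhere in this thread, is a training example whose per-object lists (boxes, labels) have different lengths. Below is a minimal sketch of a pre-flight check that uses a plain dict of lists as a stand-in for a decoded example. The feature keys follow the names commonly used in detection TFRecords and may differ in your converter, and `inconsistent_fields` is a hypothetical helper.

```python
# Per-object feature keys as commonly written into detection TFRecords
# (an assumption; adjust to the keys your own converter uses).
PER_OBJECT_KEYS = (
    "image/object/bbox/xmin",
    "image/object/bbox/xmax",
    "image/object/bbox/ymin",
    "image/object/bbox/ymax",
    "image/object/class/label",
)


def inconsistent_fields(example):
    """Return {key: length} for an example whose per-object lists disagree.

    `example` is a plain dict mapping feature keys to lists. An empty dict
    is returned when all per-object lists present have the same length.
    """
    lengths = {key: len(example[key]) for key in PER_OBJECT_KEYS if key in example}
    if len(set(lengths.values())) <= 1:
        return {}
    return lengths


good = {key: [0.1, 0.2] for key in PER_OBJECT_KEYS}
bad = dict(good)
bad["image/object/class/label"] = [1, 2, 3]  # three labels for two boxes

print(inconsistent_fields(good))      # {}
print(len(inconsistent_fields(bad)))  # 5
```

Running a check like this over the data before it is serialized would point at the offending image, if the mismatch really comes from the annotations.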
I have GTX 1060 6GB
I resized all my images to 300x300
in config: image_resizer { fixed_shape_resizer { height: 300 width: 300 } }
and it still works only with batch_size value 1
higher values than 1 (2, 8, 12, 24, 32) cause OOM errors
Hi. Did you solve your OOM errors?