Detectron: [enforce fail at conv_pool_op_base.h:253] training my own keypoint dataset

Created on 6 Apr 2018 · 8 Comments · Source: facebookresearch/Detectron

When I train on my own keypoint dataset, the following error appears after 39280 iterations:
RuntimeError: [enforce fail at conv_pool_op_base.h:253] input.size() > 0. Error from operator:
input: "gpu_1/_[pose]_roi_feat" input: "gpu_1/conv_fcn1_w" input: "gpu_1/conv_fcn1_b" output: "gpu_1/conv_fcn1" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 1 } engine: "CUDNN"

I would really appreciate it if someone could help me solve this problem.

All 8 comments

@geyuying Did you solve this problem? I hit the same error after 3100 iterations when training the keypoint model. In addition, resuming from a snapshot runs into the same error again after another iteration. I suspect the error comes from the data, but my dataset is COCO2014. Did you try training on the COCO dataset?

Thank you!

same error here

When I train on my own dataset with Mask R-CNN (with keypoints), I get a similar problem, but it seems no one has solved it yet?
E0605 17:11:07.014443 4777 net_dag.cc:188] Exception from operator chain starting at '' (type 'Concat'): caffe2::EnforceNotMet: [enforce fail at conv_pool_op_base.h:237] input.size() > 0. Error from operator:
input: "gpu_0/_[pose]_roi_feat" input: "gpu_0/conv_fcn1_w" input: "gpu_0/conv_fcn1_b" output: "gpu_0/conv_fcn1" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
I0605 17:11:07.014550 4776 context_gpu.cu:305] GPU 0: 5657 MB
I0605 17:11:07.014562 4776 context_gpu.cu:309] Total: 5657 MB
WARNING workspace.py: 185: Original python traceback for operator 283 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 190: File "tools/train_net.py", line 128, in
WARNING workspace.py: 190: File "tools/train_net.py", line 110, in main
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/utils/train.py", line 53, in train_model
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/utils/train.py", line 132, in create_model
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/model_builder.py", line 217, in _single_gpu_build_func
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/model_builder.py", line 302, in _add_roi_keypoint_head
WARNING workspace.py: 190: File "/home/scau2/Downloads/Detectron-master/detectron/modeling/keypoint_rcnn_heads.py", line 214, in add_roi_pose_head_v1convX
WARNING workspace.py: 190: File "/home/scau2/anaconda2/lib/python2.7/site-packages/caffe2/python/cnn.py", line 169, in Relu
WARNING workspace.py: 190: File "/home/scau2/anaconda2/lib/python2.7/site-packages/caffe2/python/brew.py", line 106, in scope_wrapper
WARNING workspace.py: 190: File "/home/scau2/anaconda2/lib/python2.7/site-packages/caffe2/python/helpers/nonlinearity.py", line 36, in relu
Traceback (most recent call last):
File "tools/train_net.py", line 128, in
main()
File "tools/train_net.py", line 110, in main
checkpoints = detectron.utils.train.train_model()
File "/home/scau2/Downloads/Detectron-master/detectron/utils/train.py", line 65, in train_model
workspace.RunNet(model.net.Proto().name)
File "/home/scau2/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 217, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/scau2/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at conv_pool_op_base.h:237] input.size() > 0. Error from operator:
input: "gpu_0/_[pose]_roi_feat" input: "gpu_0/conv_fcn1_w" input: "gpu_0/conv_fcn1_b" output: "gpu_0/conv_fcn1" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"

@rbgirshick can you give us some tips about this problem? Is it caused by a bug in Caffe2, or does the original Mask R-CNN not support other keypoint datasets? I am stuck here.

I ran into this same issue and found that the error comes from training images with 0 valid objects. Check your annotated images to see whether any of them has zero ground-truth boxes or keypoints.
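A check like this could be scripted. The snippet below is only a minimal sketch, assuming the annotations are in a COCO-style JSON file (an `images` list, an `annotations` list, `bbox` as `[x, y, width, height]`, and `keypoints` as a flat `[x, y, v, ...]` list). The function name `find_images_without_valid_keypoints` is hypothetical and not part of Detectron, and what counts as a "valid" object here (positive box size and at least one labeled keypoint) is an assumption that may not match Detectron's own filtering exactly.

```python
from collections import defaultdict


def find_images_without_valid_keypoints(coco):
    """Return ids of images with no annotation usable for keypoint training.

    `coco` is a dict loaded from a COCO-style JSON file (assumed layout).
    """
    usable = defaultdict(int)
    for ann in coco.get("annotations", []):
        _, _, w, h = ann.get("bbox", (0, 0, 0, 0))
        # Keypoints are stored flat as [x1, y1, v1, x2, y2, v2, ...];
        # a visibility flag v > 0 means the keypoint is labeled.
        visibility = ann.get("keypoints", [])[2::3]
        has_labeled_keypoint = any(v > 0 for v in visibility)
        if w > 0 and h > 0 and has_labeled_keypoint:
            usable[ann["image_id"]] += 1
    # Images with zero usable annotations are the suspects.
    return [img["id"] for img in coco.get("images", []) if usable[img["id"]] == 0]
```

To use it, load the annotation file with `json.load` and pass the resulting dict to the function. The returned image ids can then be inspected or removed from the training set.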

I am facing the same problem when training R-CNN for masks and keypoints.

As a temporary workaround, changing line 66 in train.py as follows works:

try:
    workspace.RunNet(model.net.Proto().name)
except RuntimeError:
    # The enforce failure surfaces as a RuntimeError; log it and skip this iteration.
    logger.warning("Error in iter {}".format(cur_iter))

@pppoe if you managed to solve the problem by checking your annotated images, did you automate it with a script? If so, could you please post it here?

Thanks in advance

@pppoe, thanks, I had the same issue. I also found that the bounding box coordinates must lie within the image bounds (e.g. no negative coordinate values), or an exception is thrown.
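The bounds condition can be checked in the same way. This is a rough sketch under the same COCO-style assumptions (each image entry carries `width` and `height`, each annotation carries `id`, `image_id`, and `bbox` as `[x, y, width, height]`). The name `find_out_of_bounds_boxes` is hypothetical, and the strict comparison may flag boxes that overshoot the image edge only by a rounding error.

```python
def find_out_of_bounds_boxes(coco):
    """Return ids of annotations whose box is degenerate or leaves the image."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    bad = []
    for ann in coco["annotations"]:
        img_w, img_h = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        outside = x < 0 or y < 0 or x + w > img_w or y + h > img_h
        degenerate = w <= 0 or h <= 0
        if outside or degenerate:
            bad.append(ann["id"])
    return bad
```

Whether flagged boxes should be clipped to the image or dropped depends on the dataset, so the sketch only reports them.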

I ran into the same issue too when training e2e_keypoint_rcnn with IMS_PER_BATCH = 1.
Thanks @zamponotiropita for posting the temporary workaround, but it can skip some training data when training on multiple GPUs.
A better way to handle this is to replace lines 61-81 in keypoint_rcnn.py with the following code.

if kp_fg_inds.shape[0] > 0:
    sampled_fg_rois = roidb['boxes'][kp_fg_inds]
    box_to_gt_ind_map = roidb['box_to_gt_ind_map'][kp_fg_inds]

    num_keypoints = gt_keypoints.shape[2]
    sampled_keypoints = -np.ones(
        (len(sampled_fg_rois), gt_keypoints.shape[1], num_keypoints),
        dtype=gt_keypoints.dtype
    )
    for ii in range(len(sampled_fg_rois)):
        ind = box_to_gt_ind_map[ii]
        if ind >= 0:
            sampled_keypoints[ii, :, :] = gt_keypoints[gt_inds[ind], :, :]
            assert np.sum(sampled_keypoints[ii, 2, :]) > 0

    heats, weights = keypoint_utils.keypoints_to_heatmap_labels(
        sampled_keypoints, sampled_fg_rois, M=cfg.KRCNN.HEATMAP_SIZE
    )

    shape = (sampled_fg_rois.shape[0] * cfg.KRCNN.NUM_KEYPOINTS, 1)
    heats = heats.reshape(shape)
    weights = weights.reshape(shape)

else:
    # If there are no fg keypoint rois (it does happen), the network cannot
    # handle empty blobs, so we must provide a heatmap. We simply take the
    # first bg roi, give it an all-zero heatmap, and set its weights to zero
    # (ignore label).
    roi_inds = np.where(roidb['gt_classes'] == 0)[0]
    # sampled_fg_rois is actually a single bg roi here, which is fine.
    sampled_fg_rois = roidb['boxes'][roi_inds[0]].reshape((1, -1))
    # Give it an all-zero heatmap blob.
    heats = blob_utils.zeros((1 * cfg.KRCNN.NUM_KEYPOINTS, 1))
    # Set the weights to 0 so the loss ignores this roi.
    weights = blob_utils.zeros((1 * cfg.KRCNN.NUM_KEYPOINTS, 1))
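
The point of the fallback branch is that the rois, heatmap targets, and weights are never empty. Here is a standalone NumPy sketch of just that shape logic, for illustration only: `keypoint_targets_or_dummy` is a hypothetical name, the foreground branch uses placeholder targets instead of real heatmap labels, and falling back to box 0 when there is no background roi is an extra assumption that is not in the code above.

```python
import numpy as np


def keypoint_targets_or_dummy(boxes, gt_classes, kp_fg_inds, num_keypoints):
    """Return (rois, heats, weights), guaranteed to be non-empty."""
    if kp_fg_inds.shape[0] > 0:
        rois = boxes[kp_fg_inds]
        n = rois.shape[0] * num_keypoints
        # Placeholder targets; the real code computes heatmap labels here.
        heats = np.zeros((n, 1), dtype=np.float32)
        weights = np.ones((n, 1), dtype=np.float32)
    else:
        bg_inds = np.where(gt_classes == 0)[0]
        # Assumption: use box 0 if there is no background roi either.
        fallback = bg_inds[0] if bg_inds.size > 0 else 0
        rois = boxes[fallback].reshape((1, -1))
        # Zero weights make the loss ignore this dummy roi.
        heats = np.zeros((num_keypoints, 1), dtype=np.float32)
        weights = np.zeros((num_keypoints, 1), dtype=np.float32)
    return rois, heats, weights
```

With an empty `kp_fg_inds`, this returns one roi and `num_keypoints` zero-weighted targets, so the downstream Conv operator always receives a non-empty input.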