Faster-rcnn.pytorch: No module named cython_bbox

Created on 28 Dec 2017 · 37 Comments · Source: jwyang/faster-rcnn.pytorch

Hi,
I'm trying your code but when I run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py --dataset pascal_voc --net vgg16 --cuda --bs $BATCH_SIZE

for training, I get this error:

from model.utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox

I've just installed Cython with:
sudo pip install cython

Thanks in advance

capuzz

All 37 comments

Did you cd into the lib directory and run the command sh make.sh?

pranerd on 28 Dec 2017

👍5

Yes of course but it doesn't work.

capuzz on 28 Dec 2017

Hi, @Capuz93 ,

after compiling, can you find "cython_bbox.so" in folder lib/model/utils?

jwyang on 28 Dec 2017
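
The check asked for above can also be scripted. This is a minimal sketch, assuming Python 3 and assuming the build writes a shared object whose file name starts with the module name (plain cython_bbox.so or a platform-tagged variant). find_compiled_extension is a hypothetical helper, not part of the repository.

```python
import glob
import os


def find_compiled_extension(directory, module_name):
    """Return sorted paths of shared objects for `module_name` in `directory`.

    Matches `name.so` as well as platform-tagged names such as
    `name.cpython-36m-x86_64-linux-gnu.so`.
    """
    pattern = os.path.join(directory, module_name + "*.so")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Folder named in the thread; run this from the repository root.
    hits = find_compiled_extension(os.path.join("lib", "model", "utils"), "cython_bbox")
    print(hits if hits else "cython_bbox was not built; check the make.sh output")
```

An empty result means the import will keep failing, so the make.sh output is the next place to look.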

Hi @jwyang .
No, there isn't a "cython_bbox.so". I noticed that when I run the command "sh make.sh" I get this error at the beginning:

Traceback (most recent call last):
  File "setup.py", line 59, in <module>
    CUDA = locate_cuda()
  File "setup.py", line 54, in locate_cuda
    raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
EnvironmentError: The CUDA lib64 path could not be located in /usr/lib64

After this error, the rest seems to compile correctly.
Maybe this is the reason. How can I fix it?

capuzz on 28 Dec 2017

Hi, @Capuz93 ,

It seems that setup.py did not find your CUDA installation. Where did you install CUDA?

jwyang on 28 Dec 2017
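
The error message suggests the build script picks a CUDA root and then looks for fixed sub-folders under it. The sketch below is not the repository's setup.py. It only illustrates one plausible search order (environment variable, then the location of nvcc on PATH, then a fallback). The name locate_cuda_sketch, the search order, and the /usr/local/cuda fallback are assumptions, and it needs Python 3 for shutil.which.

```python
import os
import shutil


def locate_cuda_sketch(env=None, which=shutil.which, fallback="/usr/local/cuda"):
    """Guess a CUDA root and report which expected sub-paths are missing.

    Hypothetical helper for illustration; the search order is an assumption.
    """
    env = os.environ if env is None else env
    home = env.get("CUDA_HOME") or env.get("CUDAHOME")
    if not home:
        nvcc = which("nvcc")
        # /usr/bin/nvcc -> /usr ; /usr/local/cuda/bin/nvcc -> /usr/local/cuda
        home = os.path.dirname(os.path.dirname(nvcc)) if nvcc else fallback
    paths = {
        "home": home,
        "nvcc": os.path.join(home, "bin", "nvcc"),
        "include": os.path.join(home, "include"),
        "lib64": os.path.join(home, "lib64"),
    }
    missing = sorted(k for k, v in paths.items() if not os.path.exists(v))
    return paths, missing


if __name__ == "__main__":
    paths, missing = locate_cuda_sketch()
    print("CUDA root guess:", paths["home"])
    print("missing:", missing if missing else "nothing")
```

With this search order, an nvcc found at /usr/bin/nvcc gives a root of /usr and therefore a missing /usr/lib64, which would be consistent with the message in the thread.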

@Capuz93

I have updated setup.py by commenting out the unused lines. Pull the update and try again. If you have CUDA installed on your machine, it should work.

jwyang on 28 Dec 2017

@jwyang Thanks very much.
Now it works.

capuzz on 28 Dec 2017

Does this code work on Titan X GPUs only? I have an NVIDIA GeForce, and when I try to train the model I get this error:

cuda runtime error (38) : no CUDA-capable device is detected

N.B.: If you prefer I could open a new Issue and close this one.

capuzz on 28 Dec 2017

It should not. I tried it on a Titan Xp, Titan X and 980 Ti as well. What is your exact command for training?

jwyang on 28 Dec 2017
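
One thing worth ruling out for error 38 is the environment variable in the launch command. If $GPU_ID is not set, the shell expands CUDA_VISIBLE_DEVICES=$GPU_ID to an empty value, and as far as I know an empty value hides every GPU from the process. The sketch below only interprets the variable. parse_visible_devices and describe are hypothetical helper names.

```python
import os


def parse_visible_devices(value):
    """Interpret a CUDA_VISIBLE_DEVICES value.

    None  -> variable unset, no restriction (returns None)
    ""    -> variable set but empty, no device visible (returns [])
    "0,2" -> only the listed devices are visible (returns ["0", "2"])
    """
    if value is None:
        return None
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def describe(value):
    """Turn the parsed value into a short human-readable status."""
    devices = parse_visible_devices(value)
    if devices is None:
        return "unset: all GPUs visible"
    if not devices:
        return "empty: no GPU visible"
    return "visible: " + ",".join(devices)


if __name__ == "__main__":
    print(describe(os.environ.get("CUDA_VISIBLE_DEVICES")))
```

Running it in the same shell as the training command shows what the process would actually see.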

I forgot to set GPU_ID.
Then I ran this command:

CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net vgg16 --cuda --bs 1

But now I have another error:

RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /pytorch/torch/lib/THC/THCTensorCopy.cu:204

capuzz on 28 Dec 2017

It seems to be a PyTorch error. Have you ever run CUDA training with other code successfully? Which line of code triggers this error?

jwyang on 28 Dec 2017

@jwyang Yes. I expected an error like "out of memory" because I don't have enough memory on my computer, but this error is strange.
The complete error is this:

/faster-rcnn.pytorch-master/lib/model/rpn/rpn.py:66: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCTensorCopy.cu line=204 error=48 : no kernel image is available for execution on the device
Traceback (most recent call last):
  File "trainval_net.py", line 316, in <module>
    _, cls_prob, bbox_pred, rpn_loss, rcnn_loss = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/faster_rcnn/faster_rcnn_cascade.py", line 51, in forward
    rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/rpn/rpn.py", line 76, in forward
    im_info, cfg_key))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/rpn/proposal_layer.py", line 148, in forward
    keep_idx_i = keep_idx_i.long().view(-1)
  File "/usr/local/lib/python2.7/dist-packages/torch/tensor.py", line 51, in long
    return self.type(type(self).__module__ + '.LongTensor')
  File "/usr/local/lib/python2.7/dist-packages/torch/cuda/__init__.py", line 370, in type
    return super(_CudaBase, self).type(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/_utils.py", line 38, in _type
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (48) : no kernel image is available for execution on the device at /pytorch/torch/lib/THC/THCTensorCopy.cu:204

capuzz on 28 Dec 2017

Hi, @Capuz93, could you set a breakpoint at line 148 to see whether there is something wrong with keep_idx_i? This error is weird.

jwyang on 29 Dec 2017

Hi @jwyang, I tried printing keep_idx_i and I get this output:

 0
 1
 2
⋮
11997
11998
11999
[torch.cuda.IntTensor of size 12000x1 (GPU 0)]

Do you have any idea about this error?

capuzz on 29 Dec 2017

Hi, @Capuz93, this looks good. Did you try debugging step by step?

jwyang on 30 Dec 2017

Sorry for the delay, I have been busy these days.

Which version of CUDA do you use? Because maybe the error is due to the cuda version.

capuzz on 7 Jan 2018

I am using CUDA 8.0 and PyTorch 0.2.0. I also tried PyTorch 0.3.0. I should have listed these requirements in the README.

jwyang on 7 Jan 2018

Ok because I'm using CUDA 9.0 and maybe the error is due to this.
I'll try with CUDA 8.0.

Thanks

capuzz on 7 Jan 2018

I also tried CUDA 8.0 but it doesn't work. I get the same error (no kernel image is available for execution on the device).
Do you have any idea about this issue?

capuzz on 8 Jan 2018

Could you reinstall your pytorch? And recompile all the libs.

jwyang on 8 Jan 2018

I reinstalled my pytorch and recompiled all the libs and I had this error:

invalid device function

After some searching I understood that the problem could be the value of -arch in make.sh, so I tried changing it from sm_52 to sm_20 (I'm not sure it is the correct value for my GPU), but now I have this new error:

/faster-rcnn.pytorch-master/lib/model/rpn/rpn.py:66: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape)
Traceback (most recent call last):
  File "trainval_net.py", line 316, in <module>
    _, cls_prob, bbox_pred, rpn_loss, rcnn_loss = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/faster_rcnn/faster_rcnn_cascade.py", line 51, in forward
    rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/rpn/rpn.py", line 85, in forward
    rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))
  File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
  File "/faster-rcnn.pytorch-master/lib/model/rpn/anchor_target_layer.py", line 149, in forward
    positive_weights = 1.0 / num_examples
ZeroDivisionError: float division by zero

capuzz on 8 Jan 2018

Hi, @Capuz93 ,

This error is also weird; I have never encountered it. It indicates that all region proposals have small overlaps with the ground-truth boxes. One possible reason is that something is wrong with the ground-truth bounding boxes. Did you check that the loaded data is correct?

jwyang on 8 Jan 2018

I had that error because I had changed the RPN_BATCHSIZE and BATCH_SIZE values from 256 to 1 in the file "faster-rcnn.pytorch-master/cfgs/vgg16.yml" because I have a memory problem.
If I set RPN_BATCHSIZE to 256 and BATCH_SIZE to 1 or 2 I get this error:

ValueError: result of slicing is an empty tensor

If I leave both RPN_BATCHSIZE and BATCH_SIZE at 256 I get an out-of-memory error.
How should I change those values to work around my memory problem?

capuzz on 8 Jan 2018

I see. I think you can change 256 to 64 for both batch sizes. I will reproduce this error by setting the batch sizes to your values and look for a solution.

jwyang on 8 Jan 2018

I get an out-of-memory error if I set a batch size bigger than 2.
I'll wait for your suggestions.
Thanks

capuzz on 9 Jan 2018

OK, I will work on that. However, if your GPU cannot hold a batch size bigger than 2, I think it will be hard for you to train a good detection model :). What kind of GPU are you using?

jwyang on 9 Jan 2018

I know. In fact, I'm trying your model to understand Faster R-CNN better, and I will then run it on a more powerful machine.
Right now I'm using an NVIDIA GeForce 920M, and I'm not sure about the -arch value in make.sh in the lib folder. How can I find out which value to set for my GPU? You have -arch=sm_52; could you explain why?

capuzz on 9 Jan 2018

this would be a good guide:

http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

I will add this information to the README.

jwyang on 9 Jan 2018
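
As background for the question about -arch: the value follows from the GPU's compute capability, so capability 5.2 corresponds to sm_52, 3.5 to sm_35, and so on. Below is a minimal sketch of that mapping. It assumes a PyTorch build that exposes torch.cuda.get_device_capability, which may not exist in the 0.2/0.3 releases discussed in this thread, and the helper names sm_arch, gencode_flag and detect_capability are hypothetical.

```python
def sm_arch(major, minor):
    """Map a compute capability such as (5, 2) to the nvcc name 'sm_52'."""
    return "sm_%d%d" % (major, minor)


def gencode_flag(major, minor):
    """Build an nvcc -gencode flag for one compute capability."""
    cc = "%d%d" % (major, minor)
    return "-gencode arch=compute_%s,code=sm_%s" % (cc, cc)


def detect_capability():
    """Return (major, minor) for GPU 0, or None if it cannot be queried."""
    try:
        import torch  # optional; only needed for auto-detection
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap:
        print(gencode_flag(*cap))
    else:
        print("no CUDA device detected; look the card up manually")
```

If detection is not possible, the capability can be looked up per card in a table such as the guide linked above and passed to sm_arch by hand.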

Hi, @Capuz93, I have updated make.sh in the lib folder. You can try recompiling the CUDA libraries, and it should work directly.

I will check what will happen if batch size is very small, e.g., 1 or 2

jwyang on 10 Jan 2018

Ok thanks.
I've downloaded the new make file but I have the same error:

ValueError: result of slicing is an empty tensor

I'll wait for your news.

capuzz on 10 Jan 2018

yeah, I have not yet solved this problem, please give me some time.

jwyang on 10 Jan 2018

Hi, @Capuz93, I have modified proposal_target_layer_cascade.py to support training with very tiny batches. Now you can set the batch size to 2. Give it a try.

jwyang on 10 Jan 2018

I see now that when I run the new make file with:
sh make.sh
I have this error:

nvcc fatal : Unsupported gpu architecture 'sm_60'

but after this error the program continues compiling.
The output of "sh make.sh" is in the attached file.
output.txt

I also tried the old make file with the -arch value set to sm_52, following the link you sent me yesterday, but without success; I got this error:

cuda runtime error: invalid device function

capuzz on 10 Jan 2018

You might need to modify make.sh: remove sm_60.

jwyang on 10 Jan 2018

@Capuz93 , here, https://github.com/jwyang/faster-rcnn.pytorch/blob/6572adb56c90c38dc5b0199c5ee13e54a001a801/lib/make.sh#L8

jwyang on 10 Jan 2018

I changed the make file and I have a warning but I think it's ok:
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).

Then, when I run the program with BATCH_SIZE=2 and RPN_BATCHSIZE=2, I get an "out of memory" error; when I run it with BATCH_SIZE=1 and RPN_BATCHSIZE=2, I get this error:

  File "/home/Scrivania/TESI/FASTER_RCNN/2/faster-rcnn.pytorch-master/lib/model/rpn/proposal_target_layer_cascade.py", line 168, in _sample_rois_pytorch
    rand_num = torch.from_numpy(rand_num).type_as(gt_boxes).long()
RuntimeError: the given numpy array has zero-sized dimensions. Zero-sized dimensions are not supported in PyTorch

capuzz on 11 Jan 2018

The minimum batch size should be 2. If you want to make it work, you can reduce the image size and use ResNet instead of VGG16.
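
A bit of arithmetic may show why very small batch sizes are fragile. This is a rough sketch, assuming the sampler splits each batch into a foreground quota of round(fraction * batch size) and a background remainder, which is a common convention in Faster R-CNN implementations. The function name split_batch and the 0.25 fraction are illustrative assumptions, not values read from this repository.

```python
def split_batch(batch_size, fg_fraction):
    """Split a sampling batch into (foreground, background) quotas.

    Assumed convention: foreground = round(fg_fraction * batch_size),
    background = the remainder. Note that Python 3 rounds ties to even.
    """
    fg = int(round(fg_fraction * batch_size))
    return fg, batch_size - fg


if __name__ == "__main__":
    # 0.25 is an illustrative foreground fraction, not a confirmed config value.
    for bs in (1, 64, 256):
        print(bs, split_batch(bs, 0.25))
```

Under these assumptions a batch size of 1 leaves a foreground quota of zero, which is the kind of empty selection that could surface as a zero-sized array further down the pipeline.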
