Yolov3: ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).

Created on 16 May 2019 · 7 Comments · Source: ultralytics/yolov3

I started a training process on my Red Hat Enterprise Linux Server 7.4. After a few hundred iterations, this error always occurs: "Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)." What should I do to solve this problem?

bug

All 7 comments

Hello, thank you for your interest in our work! This is an automated response. Please note that most technical problems are due to:

  • Your changes to the default repository. If your issue is not reproducible in a fresh git clone of this repository, we cannot debug it. Before going further, run this code and ensure your issue persists:
sudo rm -rf yolov3  # remove existing repo
git clone https://github.com/ultralytics/yolov3 && cd yolov3 # git clone latest
python3 detect.py  # verify detection
python3 train.py  # verify training (a few batches only)
# CODE TO REPRODUCE YOUR ISSUE HERE
  • Your custom data. If your issue is not reproducible with COCO data, we cannot debug it. Visit our Custom Training Tutorial for exact details on how to format your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
  • Your environment. If your issue is not reproducible in a GCP Quickstart Guide VM, we cannot debug it. Ensure you meet the requirements specified in the README: Unix, MacOS, or Windows with Python >= 3.7, PyTorch >= 1.0, etc.

If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!

Hello @glenn-jocher, I solved this error by setting "--num-workers 0", following the PyTorch documentation for torch.utils.data.DataLoader. Thanks!
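
For readers who want to confirm the cause before changing settings, here is a minimal sketch that reports how much shared memory is available and then shows the workaround from the comment above. With zero workers, PyTorch loads data in the main process, so no batches are passed between processes through /dev/shm. The helper name shm_total_kb is made up for illustration, and the training command is left commented out so the snippet can be run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: print the total size of /dev/shm in kilobytes.
# Prints nothing if /dev/shm is not mounted (for example on non-Linux systems).
shm_total_kb() {
    df -Pk /dev/shm 2>/dev/null | awk 'NR==2 {print $2}'
}

echo "/dev/shm total (KB): $(shm_total_kb)"

# Workaround from this thread: load data in the main process only.
# The flag is the one quoted in the comment above; uncomment to run training.
# python3 train.py --num-workers 0
```

A very small value (Docker containers default to 64 MB) suggests the workers are running out of shared memory.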

"--num-workers 0" will slow down training.
I fixed it by adding "--ipc=host" in my docker container configuration.
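
As a sketch of what this fix can look like in practice: "--ipc=host" is an option of "docker run" (it is set when the container is started, not inside the Dockerfile), and "--shm-size" is an alternative that enlarges /dev/shm without sharing the host's IPC namespace. The image name my-yolov3-image and the 8g size below are placeholders, not values from this thread. The commands are printed rather than executed so the script is safe to run on a machine without Docker.

```shell
#!/bin/sh
# Hypothetical image name; replace with your own.
IMAGE="my-yolov3-image"

# Option 1: share the host's IPC namespace, so the container uses the host's /dev/shm.
docker_cmd_ipc_host() {
    echo "docker run --ipc=host -it $IMAGE"
}

# Option 2: keep IPC isolated but enlarge /dev/shm (Docker's default is 64 MB).
# The 8g value is an arbitrary example.
docker_cmd_shm_size() {
    echo "docker run --shm-size=8g -it $IMAGE"
}

docker_cmd_ipc_host
docker_cmd_shm_size
```

Either option keeps multiple DataLoader workers enabled, which avoids the slowdown of "--num-workers 0".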

@mozpp yes, this is already the default usage in the dockerfile examples:

https://github.com/ultralytics/yolov3/blob/master/Dockerfile

I fixed it by following this answer:
https://stackoverflow.com/a/59029085
Hope it helps!

@mozpp

"--num-workers 0" will slow down training.
I fixed it by adding "--ipc=host" in my docker container configuration.

How do you add this to a Dockerfile?
