Mask_rcnn: OSError: can't allocate memory

Created on 13 Jul 2018  ·  13 Comments  ·  Source: matterport/Mask_RCNN

I hit a memory issue when training the shapes model from the samples. I expanded the memory and reduced max_queue_size, IMAGES_PER_GPU, the number of images, and everything else I could think of that might be the problem. It still doesn't work.

Here is the error message:

````
OSError Traceback (most recent call last)
in ()
6 learning_rate=config.LEARNING_RATE,
7 epochs=1,
----> 8 layers='heads')

~/RCNN/MRCNN/Mask_RCNN/mrcnn/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs, layers, augmentation)
2350 max_queue_size=100,
2351 workers=workers,
-> 2352 use_multiprocessing=True,
2353 )
2354 self.epoch = max(self.epoch, epochs)

~/.conda/envs/mrcnn/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
85 warnings.warn('Update your ' + object_name +
86 ' call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87 return func(*args, **kwargs)
88 wrapper._original_function = func
89 return wrapper

~/.conda/envs/mrcnn/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
2062 max_queue_size=max_queue_size,
2063 workers=workers,
-> 2064 use_multiprocessing=use_multiprocessing)
2065 else:
2066 # No need for try/except because

~/.conda/envs/mrcnn/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
85 warnings.warn('Update your ' + object_name +
86 ' call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87 return func(*args, **kwargs)
88 wrapper._original_function = func
89 return wrapper

~/.conda/envs/mrcnn/lib/python3.6/site-packages/keras/engine/training.py in evaluate_generator(self, generator, steps, max_queue_size, workers, use_multiprocessing)
2154 use_multiprocessing=use_multiprocessing,
2155 wait_time=wait_time)
-> 2156 enqueuer.start(workers=workers, max_queue_size=max_queue_size)
2157 output_generator = enqueuer.get()
2158

~/.conda/envs/mrcnn/lib/python3.6/site-packages/keras/utils/data_utils.py in start(self, workers, max_queue_size)
594 thread = threading.Thread(target=data_generator_task)
595 self._threads.append(thread)
--> 596 thread.start()
597 except:
598 self.stop()

~/.conda/envs/mrcnn/lib/python3.6/multiprocessing/process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect

~/.conda/envs/mrcnn/lib/python3.6/multiprocessing/context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

~/.conda/envs/mrcnn/lib/python3.6/multiprocessing/context.py in _Popen(process_obj)
275 def _Popen(process_obj):
276 from .popen_fork import Popen
--> 277 return Popen(process_obj)
278
279 class SpawnProcess(process.BaseProcess):

~/.conda/envs/mrcnn/lib/python3.6/multiprocessing/popen_fork.py in __init__(self, process_obj)
17 util._flush_std_streams()
18 self.returncode = None
---> 19 self._launch(process_obj)
20
21 def duplicate_for_child(self, fd):

~/.conda/envs/mrcnn/lib/python3.6/multiprocessing/popen_fork.py in _launch(self, process_obj)
64 code = 1
65 parent_r, child_w = os.pipe()
---> 66 self.pid = os.fork()
67 if self.pid == 0:
68 try:

OSError: [Errno 12] Cannot allocate memory

````

And the installed packages strictly follow the requirements (my first attempt, with the latest versions, failed):
numpy
scipy
Pillow
cython
matplotlib
scikit-image
tensorflow>=1.3.0
keras>=2.0.8
opencv-python
h5py
imgaug

Please, anybody, tell me what is going wrong!
Thanks in advance!
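For context on the error itself: `[Errno 12] Cannot allocate memory` comes from `os.fork()` at the bottom of the traceback, i.e. the kernel refused to commit memory for a copy of the parent process when Keras spawned its worker processes. A small helper like the following (illustrative only, not part of Mask_RCNN, Linux-specific) makes the remaining budget visible before training starts:

```python
def available_memory_gb(meminfo_path="/proc/meminfo"):
    """Return the kernel's MemAvailable estimate in GB, or None if the
    field is missing. Useful to check before launching fork()ed workers,
    since each worker needs headroom for a copy-on-write clone of the
    parent. Hypothetical helper name; Linux /proc only."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / 1e6  # value is in kB
    return None
```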

All 13 comments

There is something wrong with the multiprocessing. I fixed this error by setting use_multiprocessing to False. (It is weird, because I tried that before and it only worked this time.)
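For anyone looking for the exact spot: the change is one keyword in the `fit_generator` call shown at the top of the traceback, around line 2352 of `mrcnn/model.py`. A sketch of the edit (keyword arguments reconstructed from the traceback; check them against your local copy of the file):

```python
# mrcnn/model.py, inside MaskRCNN.train().
# With use_multiprocessing=False, Keras feeds the batch queue from
# threads instead of fork()ed worker processes, so no extra copies of
# the parent's address space are needed and os.fork() is never called.
self.keras_model.fit_generator(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=False,  # was True
)
```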

Similar to #722

Try the steps in that issue and see if they help. For me, the OOM errors were due to too large a max_queue_size. Disabling multiprocessing will undoubtedly help, but also try changing the number of workers and max_queue_size in model.py.
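To see why max_queue_size matters, a back-of-the-envelope estimate of the host RAM the generator queue can pin (illustrative helper, not part of Mask_RCNN; it counts only the images, ignoring RPN targets and masks, which add more):

```python
def queue_memory_bytes(max_queue_size, images_per_gpu,
                       height, width, channels=3, dtype_bytes=4):
    """Rough upper bound on host RAM held by a full generator queue:
    each queued batch carries images_per_gpu float32 images of shape
    (height, width, channels). Hypothetical helper for estimation only."""
    return (max_queue_size * images_per_gpu
            * height * width * channels * dtype_bytes)

# model.py's hard-coded max_queue_size=100 with 1024x1024 inputs and
# IMAGES_PER_GPU=2 already pins roughly 2.5 GB for the images alone:
approx_gb = queue_memory_bytes(100, 2, 1024, 1024) / 1e9
```

With multiprocessing enabled, each worker process can additionally copy parts of the parent's memory on write, which is why lowering workers and max_queue_size together often resolves the fork failure.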

@JoeLogan1981 I have tried them all, but none of them work. I guess something is wrong with the multi-threading in Python or Keras.

Are you sure your model is using the gpu?

@artisvirat I am sure.

@BearVic I have the same problem and set use_multiprocessing to False. Now the code no longer raises "OSError: can't allocate memory", but it stops at the last step of an epoch. When I watch the machine's memory with "free -h", I see the free memory shrink as the code runs.
How can I fix this so the code runs for more epochs?
ps: ubuntu16.04, keras 2.0.9
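A slow leak like the one described above is easier to diagnose if memory is logged per epoch instead of eyeballed with `free -h`. A minimal sketch using only the standard library (hypothetical class name; to hook it into training, subclass `keras.callbacks.Callback` and pass it in the `callbacks` list):

```python
import resource

class MemoryLogger:
    """Records the process's peak resident set size after each epoch,
    so a leak shows up as a steadily growing series of numbers.
    ru_maxrss is reported in kilobytes on Linux (bytes on macOS)."""

    def __init__(self):
        self.peaks_kb = []

    def on_epoch_end(self, epoch, logs=None):
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        self.peaks_kb.append(peak)
        print("epoch %d: peak RSS %.1f MB" % (epoch, peak / 1024.0))
```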

@shangjianan2 Have you solved the problem? I have hit the same one. It worked well on the balloon data, but the problem shows up when I use my own data.

@Zwenjay
Sorry, I am not clear on what you mean. Here is my understanding:
when you first ran the balloon code, the "it stops at the last step of an epoch" problem appeared. Then you fixed it and the code could run on the balloon data, but on your own data the problem appears again. Is my understanding right?
Here are my questions:
1. How did you fix this problem?
2. What caused this problem?
3. Have you tried "train_shapes.ipynb"? Does it run correctly on your PC?
Thank you for your reply.

@shangjianan2
This problem never appeared when I trained on the balloon data with balloon.py, but when I changed the data the problem appeared, even though I did not change the code. I wonder if the image format matters.

That's what I mean.

@Zwenjay did you solve the problem? I am getting a "sequence argument must have length equal to input rank" error even though I am using png images, as the nucleus sample does.

````
Traceback (most recent call last):
  File "C:\Users\ADMIN\Mask\Mask_RCNN-new\mrcnn\model.py", line 1717, in data_generator
    use_mini_mask=config.USE_MINI_MASK)
  File "C:\Users\ADMIN\Mask\Mask_RCNN-new\mrcnn\model.py", line 1227, in load_image_gt
    mask = utils.resize_mask(mask, scale, padding, crop)
  File "C:\Users\ADMIN\Mask\Mask_RCNN-new\mrcnn\utils.py", line 517, in resize_mask
    mask = scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
  File "C:\ProgramData\Anaconda3\envs\MaskRCNN\lib\site-packages\scipy\ndimage\interpolation.py", line 606, in zoom
    zoom = _ni_support._normalize_sequence(zoom, input.ndim)
  File "C:\ProgramData\Anaconda3\envs\MaskRCNN\lib\site-packages\scipy\ndimage\_ni_support.py", line 65, in _normalize_sequence
    raise RuntimeError(err)
RuntimeError: sequence argument must have length equal to input rank
````
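This particular RuntimeError is a rank mismatch, not a memory problem: `resize_mask` passes `zoom=[scale, scale, 1]`, one factor per expected axis, so it assumes a 3-D mask of shape (H, W, num_instances). A dataset whose `load_mask` returns a 2-D array triggers exactly this error. A minimal guard (illustrative sketch, not the repo's own `resize_mask`, which also handles padding and cropping):

```python
import numpy as np
import scipy.ndimage

def resize_mask_safe(mask, scale):
    """scipy.ndimage.zoom requires len(zoom) == mask.ndim, so a 2-D
    (H, W) mask trips its rank check when given three zoom factors.
    Guaranteeing an (H, W, num_instances) shape first avoids the error.
    Hypothetical wrapper for illustration."""
    if mask.ndim == 2:
        mask = mask[..., np.newaxis]  # add the missing instance axis
    # order=0 keeps the mask binary (nearest-neighbour interpolation)
    return scipy.ndimage.zoom(mask, zoom=[scale, scale, 1], order=0)
```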

I also ran into this problem. Which versions of keras and tf are you all using? Could it be a version issue?

@JoeLogan1981 I have tried them all but none of them works. I guess it is something wrong with the multi-thread operation of python or keras.
Have you solved this problem? I had the same problem running my own program on a server: the program could run, but the memory consumed kept growing during the run until eventually "OSError: can't allocate memory". I don't know how to fix it.

I also ran into this problem. Which versions of keras and tf are you all using? Could it be a version issue?

Did you manage to solve this problem?

