UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
Hi, I got this error when I trained Mask RCNN on the cluster.
Traceback (most recent call last):
File "carla.py", line 622, in
train(model)
File "carla.py", line 529, in train
layers='heads', carla_rate= 0.5)
File "/cluster/home/guzhou/Mask_RCNN5/mrcnn/model.py", line 2444, in train
use_multiprocessing=True,
File "/cluster/apps/python/3.6.4/lib64/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/cluster/apps/python/3.6.4/lib64/python3.6/site-packages/keras/engine/training.py", line 2192, in fit_generator
generator_output = next(output_generator)
File "/cluster/apps/python/3.6.4/lib64/python3.6/site-packages/keras/utils/data_utils.py", line 774, in get
if not self.queue.empty():
File "
File "/cluster/apps/python/3.6.4/lib64/python3.6/multiprocessing/managers.py", line 757, in _callmethod
kind, result = conn.recv()
File "/cluster/apps/python/3.6.4/lib64/python3.6/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/cluster/apps/python/3.6.4/lib64/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/cluster/apps/python/3.6.4/lib64/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Does this error have anything to do with the issue?
Thanks!
@zgxsin Have you solved this problem? I'm hitting the same one; can you help me? Thank you so much!
Here is the log:
/home/*****/miniconda3/lib/python3.6/site-packages/keras/engine/training.py:2023: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
UserWarning('Using a generator with `use_multiprocessing=True`'
Epoch 1/1
99/100 [============================>.] - ETA: 0s - loss: 1.3819 - rpn_class_loss: 0.0063 - rpn_bbox_loss: 0.3933 - mrcnn_class_loss: 0.0268 - mrcnn_bbox_loss: 0.4655 - mrcnn_mask_loss: 0.4899
/home/zhangwenjie/miniconda3/lib/python3.6/site-packages/keras/engine/training.py:2251: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the `keras.utils.Sequence` class.
UserWarning('Using a generator with `use_multiprocessing=True`'
Got the same problem !
I encountered the same issue as @Zwenjay and @alessandropadrinofficial today. I'll try restarting with fewer GPUs and report if this happens again (I was running on 4 when I got the error). Nobody has an idea why this is happening, right?
EDIT: As expected, this is not an issue if I train on a single GPU. I have no idea how to fix this, or at least how to catch the exception and restart the training automatically; any suggestions? I hit this problem so often that at this point it makes no sense to use multiple GPUs.
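On the "catch the exception and restart the training automatically" question, one option is a plain retry loop around the training call. This is only a sketch under assumptions: `train_fn` here is a hypothetical zero-argument callable you would write yourself to rebuild the model, reload the newest checkpoint, and run your usual Mask R-CNN `train()` call; only `ConnectionResetError` is caught, since that is the exception in the traceback above.

```python
import time

def train_with_retries(train_fn, max_retries=5, wait_seconds=0):
    """Re-run train_fn until it finishes or max_retries is exhausted.

    train_fn: zero-arg callable that (re)builds the model, loads the
    newest checkpoint, and runs one full training pass.
    Returns the attempt number on which training completed.
    """
    for attempt in range(1, max_retries + 1):
        try:
            train_fn()
            return attempt  # training finished cleanly
        except ConnectionResetError as err:
            print("worker pool died (%r); retry %d/%d" % (err, attempt, max_retries))
            time.sleep(wait_seconds)  # give stale worker processes time to exit
    raise RuntimeError("training still failing after %d retries" % max_retries)
```

Resuming from the last checkpoint matters here: without it, every crash would restart the epoch from scratch instead of continuing where training died.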
@Zwenjay how mysterious is this country you come from? Are we talking Sri Lanka-level mysterious? Or...
I encountered this issue as well. The duplicated data created by multiple generators degrades the performance of the model. I either have to set use_multiprocessing to False (or use fewer workers than I have CPU cores) and tolerate the slow progress, or I have to increase steps_per_epoch so duplicates are less likely to matter.
Is there a way to rewrite the data_generator function as a class that implements keras.utils.Sequence to resolve the duplicated-data issue? Some relevant threads on this:
https://github.com/keras-team/keras/wiki/Understanding-parallelism-in-Keras
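The two stop-gap mitigations described above can be made explicit as the keyword arguments you would pass to `fit_generator`. This helper is a sketch, not part of Mask R-CNN; `safe_fit_kwargs` is a hypothetical name, and the `steps_per_epoch` value of 6000 is just the illustrative figure mentioned later in this thread.

```python
import multiprocessing

def safe_fit_kwargs(avoid_duplicates=True):
    """Return fit_generator keyword arguments for one of two trade-offs."""
    if avoid_duplicates:
        # Threads share a single generator instance, so no batch is
        # duplicated -- at the cost of slower data loading.
        return {"workers": 1, "use_multiprocessing": False}
    # Accept some duplication but dilute it with a longer epoch, and keep
    # the worker count below the CPU core count.
    return {
        "workers": max(1, multiprocessing.cpu_count() - 1),
        "use_multiprocessing": True,
        "steps_per_epoch": 6000,
    }
```

Usage would be something like `model.fit_generator(gen, **safe_fit_kwargs())`; the proper fix remains switching the generator to a `keras.utils.Sequence` subclass.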
Hi everybody. If you're reading this you probably have almost everything ready. The solution is to:
1) Make sure your generator class inherits from keras.utils.Sequence
2) Implement __len__(self) and __getitem__(self, idx) methods in your class definition. __len__ will calculate the number of steps, and __getitem__ will use idx in range(__len__()) to feed the batch generation. (Just make __getitem__ call the batch generator method you already have.)
3) When calling fit_generator pass the object, not the object.generator()
If you're confused read this blog entry that detailed the process for me:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
WARNING! The workers parameter seems to refer to the number of CPUs, not the number of threads. If your fit_generator hangs, this might be why.
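The three steps above can be sketched as a minimal wrapper class. Everything here is illustrative: `MaskRCNNSequence` and `load_image_batch` are hypothetical names standing in for your own data_generator logic, and the stub base class in the `except` branch only exists so the sketch runs even where Keras is not installed.

```python
import math

try:
    from keras.utils import Sequence  # the real base class when Keras is present
except ImportError:                   # minimal stand-in so the sketch runs anywhere
    class Sequence:
        pass

class MaskRCNNSequence(Sequence):
    def __init__(self, image_ids, batch_size):
        self.image_ids = image_ids
        self.batch_size = batch_size

    def __len__(self):
        # number of steps per epoch: enough batches to cover every image once
        return math.ceil(len(self.image_ids) / self.batch_size)

    def __getitem__(self, idx):
        # idx is guaranteed to be in range(len(self)); slice out one batch
        # and hand it to the batch-building code you already have
        batch_ids = self.image_ids[idx * self.batch_size:(idx + 1) * self.batch_size]
        return load_image_batch(batch_ids)

def load_image_batch(batch_ids):
    # placeholder: your real code would build the (inputs, targets) arrays here
    return batch_ids, [0] * len(batch_ids)
```

Then, per step 3, pass the object itself, e.g. `model.fit_generator(MaskRCNNSequence(ids, batch_size), ...)` rather than calling a generator function. Because each worker indexes batches by `idx` instead of pulling from a shared generator, no batch is produced twice.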
There's a PR https://github.com/matterport/Mask_RCNN/pull/740 that almost solves this issue, but it seems old.
@luxedo I tried a new one in #1611
@world4jason I had the same problem. I was using a server for computation, and after one epoch the server was giving me an error. I tried your solution #1611 and it worked like a charm!
Thank you very much for your help.
Hi guys,
I have the same error.
My question is: what are the effects of an error like this on my model training?
Does it just slow down the process, or does it affect the quality of the trained model?
Should I try to solve it, or can I train with this issue present without it hurting the results?
Hi wilmerhenao,
In spite of following the above 3 points, I am still getting the same warning. Please let me know your suggestions for how to avoid duplication of data.
For now, a temporary solution is to increase STEPS_PER_EPOCH to a higher value, for example 6000.
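In this repository that value lives on the config object, so the work-around above amounts to overriding one class attribute. A sketch, assuming the usual `mrcnn.config.Config` base class (the `except` branch is a stand-in so the snippet runs without the repo installed, and 6000 is just the example value from the comment above):

```python
try:
    from mrcnn.config import Config  # the repo's base config when available
except ImportError:                  # minimal stand-in so the sketch runs anywhere
    class Config:
        STEPS_PER_EPOCH = 1000

class LongEpochConfig(Config):
    # Longer epochs dilute the effect of any duplicated batches; this does
    # not fix the duplication itself, it only makes it matter less.
    STEPS_PER_EPOCH = 6000
```

You would then pass an instance of this config when constructing the model, exactly as with any other Mask R-CNN config subclass.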