Keras: Multiprocessing using fit_generator(pickle_safe=True) fails

Created on 18 Jan 2017 · 28 comments · Source: keras-team/keras

I'm trying to use fit_generator to separate the data loader from the trainer.

```
model.fit_generator(data_gen(), samples_per_epoch=10000, nb_epoch=1, pickle_safe=True, verbose=0)
```

Executing this code produces an error like the one below:

```
Traceback (most recent call last):
File "main_generator.py", line 138, in
model.fit_generator(data_gen(), samples_per_epoch=10000, nb_epoch=1, pickle_safe=True, verbose=0)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\engine\training.py", line 1470, in fit_generator
pickle_safe=pickle_safe)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\engine\training.py", line 436, in generator_queue
thread.start()
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'generator_queue.<locals>.data_generator_task'
```
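The root cause can be reproduced without Keras at all: on Windows, multiprocessing uses the "spawn" start method, which must pickle the process target, and Python cannot pickle a function defined inside another function. A minimal sketch (the nested function here just mimics the shape of Keras' `data_generator_task` inside `generator_queue`):

```python
import pickle

def generator_queue():
    # A worker defined inside another function is a "local object"
    # and cannot be pickled, which is exactly what spawn requires.
    def data_generator_task():
        pass
    return data_generator_task

task = generator_queue()
try:
    pickle.dumps(task)
    picklable = True
except (AttributeError, pickle.PicklingError):
    picklable = False

print(picklable)  # False: can't pickle local objects
```

On Linux/macOS the default "fork" start method never pickles the target, which is why the same code runs there.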

I also tried to run keras/tests/keras/test_multiprocessing.py, but it failed.

Here is output for test_multiprocessing.py:
test_multiprocessing.faillog.txt

Is this a bug in Keras itself?
Are any fixes available?

All 28 comments

Same problem. Any update?

Is this the same as #5510? If so, I believe this is a windows error right now. What OS are you using?

@eliafrigieri are you using windows?

I'm using: windows 10, python 2.7.12, keras 1.0.8 with theano.

@eliafrigieri @DofuUZ I am starting to believe this is a windows issue. I get the same issue with tf and python 3.5. Good to know it manifests with python 2.7.

I can confirm the same issue. Windows 10, python 3.5, keras with theano.

It's been 6 years since I've mucked with Python multiprocessing, and that was Python 3.4. I'm happy to contribute experience. Can anyone else offer assistance?

I've made my own solution, avoiding this issue: basically rewriting the function that creates, populates, and returns the queue and the stopping event (generator_queue in training.py).

@eliafrigieri Any chance you could post here? It would be much appreciated. I've been trying out a load of things over the past couple of days, to no avail...

It depends, what is your task?

I've got a large dataset and am trying to speed up training time on this task here: https://www.kaggle.com/c/data-science-bowl-2017. The windows issues with multiprocessing are proving quite painful.
Edit: if what you've got is sensitive then no worries, you don't need to post it. I was only asking on the off chance that it was something inconsequential.

I have a large dataset too. I've added some functions that load a batch-size group of images and put them into the queue. For example, if you have 1000 images, you can split them into 10 groups of 100 images each and launch 10 processes for parallel loading; then a single process gets from the queue and calls train_on_batch.
I'm not posting code because it is too badly written and is not the definitive version for my task (I will probably change the code every day over the next two weeks, just for parallel loading to increase the speed).
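The group-loading scheme described above can be sketched roughly like this; all names here (load_batch, parallel_load, the fake length-based "loader") are hypothetical, and a real version would decode images instead:

```python
from multiprocessing import Pool

def load_batch(paths):
    # Hypothetical loader: stands in for reading and decoding images.
    return [len(p) for p in paths]

def parallel_load(all_paths, n_groups=10):
    # Split the file list into n_groups chunks and load them in
    # parallel worker processes; Pool.map returns results in order.
    size = (len(all_paths) + n_groups - 1) // n_groups
    groups = [all_paths[i:i + size] for i in range(0, len(all_paths), size)]
    with Pool(len(groups)) as pool:
        return pool.map(load_batch, groups)

if __name__ == "__main__":
    # 1000 images split into 10 groups of 100 each.
    batches = parallel_load(["img%03d.png" % i for i in range(1000)], n_groups=10)
```

Because load_batch is defined at module level (not nested), it is picklable and can be dispatched to workers even under the Windows "spawn" start method; the `if __name__ == "__main__"` guard is required on Windows so child processes don't re-run the top-level code.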

@eliafrigieri Very cool! Are you able to share?

I was trying all sorts of options, and I think I did something similar to you, @eliafrigieri. I was trying to create an external data_generator class that would use multiprocessing to populate a queue. The class was an iterator, so the queue would be accessed via __next__(). It's part of a larger code base, but I extracted an example here (attached: data_generator_sample_main.txt, data_generator_sample.txt).

The problem I was having was that each multiprocessing pool imported keras, so I was getting all sorts of CNMEM warnings and everything looked like it was overflowing.

Does anyone have any insights on this?

I'm in the same situation. Every process I create imports keras (I think because it's the Keras process creating the child), but once all the processes are created and running, the loading speed increases a lot.

In case it helps, I was able to get the sample running without importing keras in child processes by moving the import keras commands inside the get_model() function in data_generator_sample, as this was the only place they were used. I'm not sure I'll be able to get it working like that in the full version of my project, but it may be an option for some people.
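The deferred-import trick described above looks like this in miniature; colorsys stands in for keras here, since the point is only where the import statement lives, not what is imported:

```python
def get_model():
    # Importing the heavy framework *inside* the function means worker
    # processes that never call get_model() never pay for the import
    # (and never trigger framework-level side effects like CNMEM setup).
    import colorsys  # stand-in for `import keras`
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

print(get_model())  # (0.0, 1.0, 1.0)
```

When a child process is spawned, it re-imports the parent module; a top-level `import keras` runs in every child, while a function-local import runs only where the function is actually called.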

I thought of the same solution, but it's not applicable in my case, so I didn't try it at all.
The question is: does this issue appear only on Windows? Is it a bug in Keras?

It appears to be a bug in Windows, but it could also be a poor assumption made in Keras with respect to multiprocessing that manifests on Windows.

My understanding is that when using multiprocessing on Windows, you can't reference local variables from the point at which you start the process. You need to pass all variables explicitly via the args input. I think it should be possible to adapt the Keras code by defining a multiprocessing version of data_generator_task() outside the scope of the generator and passing the generator / stop event / queue etc. into it. That way it could work as a standalone function and be spawned across multiple processes on any platform. It may be best for Windows users to try this option.
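A sketch of that adaptation, with hypothetical names: the worker lives at module level and everything it needs (queue, stop event) is passed explicitly via args, so it can be pickled and started under the Windows "spawn" method:

```python
import multiprocessing

def data_generator_task(q, stop_event, n_items):
    # Module-level function: picklable, so the Windows "spawn" start
    # method can send it to a child process. All state arrives via args.
    for i in range(n_items):
        if stop_event.is_set():
            break
        q.put(i)

def run_worker(n_items=5):
    q = multiprocessing.Queue()
    stop_event = multiprocessing.Event()
    p = multiprocessing.Process(target=data_generator_task,
                                args=(q, stop_event, n_items))
    p.start()
    items = [q.get() for _ in range(n_items)]
    p.join()
    return items

if __name__ == "__main__":
    print(run_worker())  # [0, 1, 2, 3, 4]
</```

Contrast this with the failing Keras code, where data_generator_task was defined inside generator_queue and closed over local variables, making it unpicklable.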

@ciararogerson This sounds reasonable and it looks like it is consistent with the example in #5510. Would you agree?

Today I tried the test_multiprocessing.py test on a Mac with the same configuration as mine (Python 2.7.13, Keras 1.0.8), and it works fine. So the problem is Windows-only; now we have evidence.

@eliafrigieri Great work!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

This is still an issue for python 3.5, windows 7.

Will anybody fix it eventually?

Looks like there is a PR created and merged for this, so can this be closed now?

I have the same issue in predict_generator; however, fit_generator is working fine with multiprocessing.
