Keras: Multiprocessing using fit_generator(pickle_safe=True) fails

Created on 18 Jan 2017 · 28 comments · Source: keras-team/keras

I'm trying to use fit_generator to separate the data loader from the trainer.

```
model.fit_generator(data_gen(), samples_per_epoch=10000, nb_epoch=1, pickle_safe=True, verbose=0)
```

Executing this code produces an error like the one below:

```
Traceback (most recent call last):
File "main_generator.py", line 138, in
model.fit_generator(data_gen(), samples_per_epoch=10000, nb_epoch=1, pickle_safe=True, verbose=0)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\models.py", line 935, in fit_generator
initial_epoch=initial_epoch)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\engine\training.py", line 1470, in fit_generator
pickle_safe=pickle_safe)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\keras\engine\training.py", line 436, in generator_queue
thread.start()
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\context.py", line 313, in _Popen
return Popen(process_obj)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
reduction.dump(process_obj, to_child)
File "C:\dev\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\multiprocessing\reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'generator_queue.<locals>.data_generator_task'
```
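The root cause can be reproduced without Keras at all: on Windows, multiprocessing uses the "spawn" start method, which must pickle the process target, and Python cannot pickle a function defined inside another function. A minimal sketch (the nested function here just mimics the shape of Keras' `data_generator_task` inside `generator_queue`):

```python
import pickle

def generator_queue():
    # A worker defined inside another function is a "local object"
    # and cannot be pickled, which is exactly what spawn requires.
    def data_generator_task():
        pass
    return data_generator_task

task = generator_queue()
try:
    pickle.dumps(task)
    picklable = True
except (AttributeError, pickle.PicklingError):
    picklable = False

print(picklable)  # False: can't pickle local objects
```

On Linux/macOS the default "fork" start method never pickles the target, which is why the same code runs there.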

I also tried to run keras/tests/keras/test_multiprocessing.py, but it failed.

Here is output for test_multiprocessing.py:
test_multiprocessing.faillog.txt

Is this a bug in Keras itself?
Are any fixes available?

All 28 comments

Same problem. Any update?

Is this the same as #5510? If so, I believe this is a windows error right now. What OS are you using?

@eliafrigieri are you using windows?

I'm using: windows 10, python 2.7.12, keras 1.0.8 with theano.

@eliafrigieri @DofuUZ I am starting to believe this is a windows issue. I get the same issue with tf and python 3.5. Good to know it manifests with python 2.7.

I can confirm the same issue. Windows 10, python 3.5, keras with theano.

It's been 6 years since I've mucked with Python multiprocessing, and that was Python 3.4. I'm happy to contribute experience. Can anyone else offer assistance?

I've made my own solution, avoiding this issue: basically rewriting the function that creates, populates, and returns the queue and the stopping event (generator_queue in training.py).

@eliafrigieri Any chance you could post here? It would be much appreciated. I've been trying out a load of things over the past couple of days, to no avail...

It depends, what is your task?

I've got a large dataset and am trying to speed up training time on this task here: https://www.kaggle.com/c/data-science-bowl-2017. The windows issues with multiprocessing are proving quite painful.
Edit: if what you've got is sensitive then no worries, you don't need to post it. I was only asking on the off chance that it was something inconsequential.

I have a large dataset too. I've added some functions that load a batch-size group of images and put them into the queue. For example, if you have 1000 images, you can split them into 10 groups of 100 images each and launch 10 processes for parallel loading; then a single process gets from the queue and calls train_on_batch.
I'm not posting code because it is too badly written and is not the definitive version for my task (I will probably change the code every day over the next two weeks, just for parallel loading to increase the speed).
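The group-loading scheme described above can be sketched roughly like this; all names here (load_batch, parallel_load, the fake length-based "loader") are hypothetical, and a real version would decode images instead:

```python
from multiprocessing import Pool

def load_batch(paths):
    # Hypothetical loader: stands in for reading and decoding images.
    return [len(p) for p in paths]

def parallel_load(all_paths, n_groups=10):
    # Split the file list into n_groups chunks and load them in
    # parallel worker processes; Pool.map returns results in order.
    size = (len(all_paths) + n_groups - 1) // n_groups
    groups = [all_paths[i:i + size] for i in range(0, len(all_paths), size)]
    with Pool(len(groups)) as pool:
        return pool.map(load_batch, groups)

if __name__ == "__main__":
    # 1000 images split into 10 groups of 100 each.
    batches = parallel_load(["img%03d.png" % i for i in range(1000)], n_groups=10)
```

Because load_batch is defined at module level (not nested), it is picklable and can be dispatched to workers even under the Windows "spawn" start method; the `if __name__ == "__main__"` guard is required on Windows so child processes don't re-run the top-level code.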

@eliafrigieri Very cool! Are you able to share?

I was trying all sorts of options, and I think I did something similar to you, @eliafrigieri. I was trying to create an external data_generator class that would use multiprocessing to populate a queue. The class was an iterator, so the queue would be accessed via __next__(). It's part of a larger code base, but I extracted an example here (attached: data_generator_sample_main.txt, data_generator_sample.txt).

The problem I was having was that each multiprocessing pool imported keras, so I was getting all sorts of CNMEM warnings and everything looked like it was overflowing.

Does anyone have any insights on this?

I'm in the same situation. Every process I create imports keras (I think because it's the Keras process creating the child), but once all the processes are created and running, the loading speed increases a lot.

In case it helps, I was able to get the sample running without importing keras in child processes by moving the import keras commands inside the get_model() function in data_generator_sample, as this was the only place they were used. I'm not sure I'll be able to get it working like that in the full version of my project, but it may be an option for some people.
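The deferred-import trick described above looks like this in miniature; colorsys stands in for keras here, since the point is only where the import statement lives, not what is imported:

```python
def get_model():
    # Importing the heavy framework *inside* the function means worker
    # processes that never call get_model() never pay for the import
    # (and never trigger framework-level side effects like CNMEM setup).
    import colorsys  # stand-in for `import keras`
    return colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

print(get_model())  # (0.0, 1.0, 1.0)
```

When a child process is spawned, it re-imports the parent module; a top-level `import keras` runs in every child, while a function-local import runs only where the function is actually called.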

I thought of the same solution, but it's not applicable in my case, so I didn't try it at all.
The question is: does this issue appear only on Windows? Is it a bug in Keras?

It appears to be a bug in Windows, but it could also be a poor assumption made in Keras with respect to multiprocessing that manifests on Windows.

My understanding is that when using multiprocessing on Windows, you can't reference local variables from the point at which you start the process. You need to pass all variables explicitly via the args input. I think it should be possible to adapt the Keras code by defining a multiprocessing version of data_generator_task() outside the scope of the generator and passing the generator / stop event / queue etc. into it. That way it could work as a standalone function and be spawned across multiple processes on any platform. It may be best for Windows users to try this option.
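A sketch of that adaptation, with hypothetical names: the worker lives at module level and everything it needs (queue, stop event) is passed explicitly via args, so it can be pickled and started under the Windows "spawn" method:

```python
import multiprocessing

def data_generator_task(q, stop_event, n_items):
    # Module-level function: picklable, so the Windows "spawn" start
    # method can send it to a child process. All state arrives via args.
    for i in range(n_items):
        if stop_event.is_set():
            break
        q.put(i)

def run_worker(n_items=5):
    q = multiprocessing.Queue()
    stop_event = multiprocessing.Event()
    p = multiprocessing.Process(target=data_generator_task,
                                args=(q, stop_event, n_items))
    p.start()
    items = [q.get() for _ in range(n_items)]
    p.join()
    return items

if __name__ == "__main__":
    print(run_worker())  # [0, 1, 2, 3, 4]
</```

Contrast this with the failing Keras code, where data_generator_task was defined inside generator_queue and closed over local variables, making it unpicklable.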

@ciararogerson This sounds reasonable and it looks like it is consistent with the example in #5510. Would you agree?

Today I tried the test_multiprocessing.py test on a Mac with the same configuration as mine (Python 2.7.13, Keras 1.0.8), and it works fine. So the problem is Windows-only; now we have evidence.

@eliafrigieri Great work!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

This is still an issue for python 3.5, windows 7.

Will anybody fix it eventually?

Looks like there is a PR created and merged for this, so can this be closed now?

I have the same issue in predict_generator; however, fit_generator is working fine with multiprocessing.
