I'm experiencing hard locks when trying to predict labels in parallel using joblib. I tried using multiprocessing directly instead of joblib, and the same thing happens. The function that runs in parallel and calls the Keras model (trained using the TensorFlow backend) just locks up: no prediction is made and the process hangs forever. This happens on both Mac and Linux.
The example in the gist I'm referencing below can't be run without the trained model, but it illustrates the kind of problem I'm talking about. Following the example should be enough to reproduce this issue.
https://gist.github.com/paulomalvar/4457018d4833dd9fd452f46788ef55a1
I tried retraining the models using Theano as the backend and this solved the issue for the code I shared above. However I'm using more models in my project. Retraining using Theano for those models didn't solve the issue.
I tried retraining with a lower output dimensionality for the first layer of neurons and, weirdly enough, that solved the issue. But this is not a solution, just a patch.
How is it possible that a model with more dimensions hangs my code when trying to parallelize it?
I did another test on a machine that has 256GB of RAM and the same issue happens. And the machine is only using around 1% of all the available memory so this is not a memory issue.
How can this be solved? Thanks.
Same problem here. I would like to load a model from json, load the weights in the parent process. Then run some predictions of this model in different child processes. I do not know if it is possible. I tried with multiprocessing module and had the same troubles.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
Same issue here
Same issue, and I found some related info here:
https://stackoverflow.com/questions/42504669/keras-tensorflow-and-multiprocessing-in-python
@JordanPeltier Did you find any solution for that?
I have a whole pipeline set up; one of the steps is to predict something. I load everything in the parent and then spawn a bunch of child processes, but it hangs at the predict step.
I can't seem to find a solution for that.
Not possible. You need to load the model in the child processes (in the run method).
Switching to Theano as the backend solved the issue
Won't it then replicate the memory used?
I mean, my idea is to load the model once and parallelize the predict method, with several child processes that just call predict. Otherwise it becomes memory-consuming, which it already is :(
@paulomalvar Hi, have you solved your problem? I encountered the same one. I loaded the trained model in the main process and tried to use model.predict in child processes, but it hangs forever...
solution from https://github.com/fchollet/keras/issues/6124
model = Model(inputs=[l_input], outputs=[out_actions, out_value])
model._make_predict_function() # have to initialize before threading
@yhcharles thanks, but it doesn't work. I don't know why model.predict works when I use Python threading but hangs when using multiprocessing.
I'm having the same problem. First, I tried passing the loaded model object into the child process. The call to model.predict() hangs. Then, I tried passing the model's path in and loading the model in the child before using it, but the call to keras.models.load_model() hangs in the child process too!
Has anyone gotten this to work?
I solved this by using process-based parallelism instead of thread-based parallelism. I think the trained model cannot be loaded by several threads at the same time.
I am having this exact same issue when trying to load the model in child processes.
The hanging seems to occur when the weights are getting set on the model.
The weights themselves are stored in an array, which the child processes have access to.
Attempting to load the weights from file inside each of the child processes also causes the hang.
I have tried multiple different ways of resolving this but at the moment it seems there's no way for Keras and Tensorflow to run prediction in child processes - it simply doesn't seem to be written in a manner that supports this.
This may be a Keras problem but I suspect it could be due to Tensorflow requiring two threads to work:
https://github.com/tensorflow/tensorflow/issues/11066
If anyone makes any progress on this I'd be interested to hear it.
@kebwi Loading the model in the run() of the child process works, but it's very slow.
And remember to call K.clear_session() at the end of run(), which manually releases resources.
This issue is still a thing in 2019. However, I circumvented the halting by loading the model in the subprocess and let the processes communicate via queues. Here is a minimal example using the webcam and VGG16 for feature extraction: https://stackoverflow.com/a/54881298/2084944
Load the trained model once, then apply it in multiple processes to make predictions. I tried a couple of different ways, but no success. Does anyone have good examples of this?
I am facing the same issue. I saved the model in a process and when trying to load the model using model = tf.keras.models.load_model('model.h5') in the child process, it hangs forever.
Any solution to this?
Haha... facing this issue in 2020. I guess model.predict runs multithreaded internally, so a lock is needed. The child process clones the lock's state, which may cause a deadlock.
@y18zhou I think you're on the right track. Setting tf.config.threading.set_intra_op_parallelism_threads(1) seems to work.
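For reference, a configuration fragment (untested sketch, assuming TF 2.x; the model path is hypothetical) showing where those calls have to go: before any op or model exists, otherwise TensorFlow raises a RuntimeError.

```python
import tensorflow as tf

# Must be called before any op, session, or model is created.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)

# Only now load the model; children forked after this point should
# no longer inherit a lock held by a TF background thread.
model = tf.keras.models.load_model("model.h5")  # hypothetical path
```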
It looks like loading a model into memory (i.e. via load_model) runs on multiple threads, and if you create a child process before this has finished, the child process will inherit the lock and hang. Setting mp.set_start_method('spawn') seems to make no difference.