After loading a Keras model, you might expect to be able to pass it to multiple threads for inference. When I tried this with the Python Flask web server, I ran into trouble.
If I load the model on each thread, everything runs smoothly, except that loading the model takes about a second, roughly two-thirds of my runtime. I'd like to move the model loading out of the hot path into the startup code and then share the model among threads. I've attached a gist (a little incomplete) which illustrates the problem.
https://gist.github.com/sshack/f086aa4bd6932346895e280b8060ea6a
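For readers who don't want to open the gist, here is a minimal sketch of the pattern being attempted (the model path, input shape, and route are illustrative, not taken from the gist):

    from flask import Flask
    import numpy as np
    from keras.models import load_model

    app = Flask(__name__)

    # Load once at startup instead of once per request.
    model = load_model('emotion_model.h5')  # illustrative path

    @app.route('/predict')
    def predict():
        # Flask serves each request on a worker thread, so this call runs on
        # a different thread than the one that loaded the model, which is
        # where the trouble starts.
        x = np.zeros((1, 48, 48, 1))  # placeholder input; shape is illustrative
        return str(model.predict(x))

    if __name__ == '__main__':
        app.run(threaded=True)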
Below is an example of the output I get. It seems Keras is calling the TensorFlow backend to create a session in a non-thread-safe way?
Using TensorFlow backend.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Starting emotion detection service.
Hi sshack, I have the exact same problem, with one caveat: I'm using two models. Have you made any progress on a solution?
This workaround might help you: #5896
Hi Jeff,
Eyesonlyhack beat me to the punch. That's exactly the approach I eventually took to solve my problem.
Load the model, save a copy of the graph, then use that saved graph context for any model inference within a thread.
Cheers
Steven
Excellent, I will try that approach. Thanks for the replies, gents.
@sshack @footh any chance you have any working code you could add here? The workaround mentioned above by @eyesonlyhack doesn't seem to match the description above of "load the model, save a copy of the graph"; it just adds a call to graph.as_default() inside the model creation function.
I posted a StackOverflow question a while back about multi-threaded training, but I'm curious about inference as well.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
This issue seems to be a major problem with flask apps as well. If the flask app loads a keras model, then you cannot use the --debug option of flask to pick up code changes interactively: the debug reloader forks the existing process and re-executes the initialization steps with the updated code, and somewhere in that fork the loaded keras model breaks.
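If you only need the interactive debugger, one workaround (my suggestion, assuming the standard Flask API, not something from this thread) is to keep debug mode but disable the forking reloader, so the keras model is only ever loaded by a single process:

    from flask import Flask

    app = Flask(__name__)

    if __name__ == '__main__':
        # use_reloader=False keeps the debugger but skips the fork/re-exec
        # step that breaks the already-loaded keras model.
        app.run(debug=True, use_reloader=False)

The trade-off is that code changes then require a manual restart.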
I am using Flask and ran into the same problem.
The solution in https://github.com/fchollet/keras/issues/2397#issuecomment-306687500 works for me.
Details:

global thread:

    import tensorflow as tf
    from keras.models import load_model

    self.model = load_model(model_path)
    self.model._make_predict_function()  # build the predict function eagerly, on this thread
    self.graph = tf.get_default_graph()  # keep a handle on the graph the model was loaded into

another thread:

    with self.graph.as_default():
        labels = self.model.predict(data)
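Putting those two pieces together in a Flask view might look like the sketch below; the route, model path, and input parsing are illustrative assumptions, not details from this thread:

    import numpy as np
    import tensorflow as tf
    from flask import Flask, jsonify, request
    from keras.models import load_model

    app = Flask(__name__)

    # Main thread, at startup.
    model = load_model('model.h5')  # illustrative path
    model._make_predict_function()  # build the predict function eagerly
    graph = tf.get_default_graph()  # remember the graph the model lives in

    @app.route('/predict', methods=['POST'])
    def predict():
        data = np.array(request.get_json()['instances'])  # illustrative input format
        # Re-enter the saved graph on this worker thread before predicting.
        with graph.as_default():
            labels = model.predict(data)
        return jsonify(labels.tolist())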
How is getting the graph this way different than tf.get_default_graph()?
    from keras import backend
    #...
    self.graph = backend.get_session().graph
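The practical difference shows up on worker threads: a with graph.as_default() block is thread-local, so entering it on one thread does nothing for another. A small demonstration of that, written against TF 1.x (the variable names are mine):

    import threading
    import tensorflow as tf

    g = tf.Graph()

    with g.as_default():
        assert tf.get_default_graph() is g  # holds on this thread

        def worker():
            # The as_default() context above does not carry over here: the
            # default-graph stack is thread-local, so this thread falls back
            # to a different (global) default graph.
            assert tf.get_default_graph() is not g

        t = threading.Thread(target=worker)
        t.start()
        t.join()

As I understand it, backend.get_session() reads a plain module-level global in the Keras backend, so its .graph is the same object no matter which thread asks; that is the difference the snippet above relies on.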
In my case I did it a bit differently, in case it helps anyone:
    import tensorflow as tf
    import keras as k

    # on thread 1
    session = tf.Session(graph=tf.Graph())
    with session.graph.as_default():
        k.backend.set_session(session)
        model = k.models.load_model(filepath)

    # on thread 2
    with session.graph.as_default():
        k.backend.set_session(session)
        model.predict(x, **kwargs)
The novelty here is allowing for multiple models to be loaded (once) and used in multiple threads.
By default, the "default" Session and the "default" Graph are used while loading a model, but here you create new ones. Also note that the Graph is stored in the Session object, which is a bit more convenient.
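To make the multi-model point concrete, here is a sketch with two models, each isolated in its own Session/Graph pair; the helper name and file paths are mine, not from the comment above:

    import tensorflow as tf
    import keras as k

    def load_isolated(filepath):
        # One private Session+Graph pair per model.
        session = tf.Session(graph=tf.Graph())
        with session.graph.as_default():
            k.backend.set_session(session)
            model = k.models.load_model(filepath)
        return session, model

    # On the main thread, at startup:
    session_a, model_a = load_isolated('model_a.h5')  # illustrative paths
    session_b, model_b = load_isolated('model_b.h5')

    # On any worker thread:
    def predict_with(session, model, x):
        with session.graph.as_default():
            k.backend.set_session(session)
            return model.predict(x)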