Keras: Multiprocessing: Failed to get device properties

Created on 10 May 2018 · 15 comments · Source: keras-team/keras

I would like to use a keras model in a multiprocessing setup.
The model is used in a generator, which produces data to train another model.

As long as I don't use multiprocessing, everything works fine.
But with multiprocessing, I get the following error:

E tensorflow/core/grappler/clusters/utils.cc:81] Failed to get device properties, error code: 3

I searched how to use Keras in a multithreaded context and found this:
https://github.com/keras-team/keras/issues/5640

Apparently, I need to call _make_predict_function and get the tensorflow graph.
I added this to my code:

before any training:

    q_approximator = create_model()
    q_approximator_fixed = create_model()

    q_approximator._make_predict_function()
    q_approximator_fixed._make_predict_function()

    # only this one will be trained
    q_approximator.compile(RMSprop(LEARNING_RATE, rho=RHO, epsilon=EPSILON), loss=huber_loss)

    graph = tf.get_default_graph()
    # graph = K.get_session().graph    # this way also doesn't work

inside the generator:
```
with graph.as_default():
    q_values = q_approximator_fixed.predict([state.reshape(1, *INPUT_SHAPE),
                                             np.ones((1, NUM_ACTIONS))])
```

and finally, the training setup:

```
q_approximator.fit_generator(interaction_generator(q_approximator_fixed,
                                                   replay_memory,
                                                   exploration,
                                                   interaction_counter,
                                                   interaction_lock),
                             epochs=10, steps_per_epoch=BATCH_SIZE * 1000,
                             use_multiprocessing=True,
                             workers=1)
```

With just one worker and no multiprocessing, everything works fine. Multiple workers without multiprocessing are also fine.
But a single worker with multiprocessing makes the program crash with the error message above.

How can I use a keras model in a multiprocessing context?

All 15 comments

Since the error message complains about missing device information, I thought it was probably a problem of initializing tensorflow correctly in the other processes.

So I decided to try something else first:
Instead of sharing the model, which apparently doesn't carry the necessary device information across processes, I would recreate the model in every process.

Now I pass the model's weights as a parameter to the generator and have this code within the generator:

    q_approximator_fixed = create_model()
    q_approximator_fixed.set_weights(weights)

It hangs on loading the weights.

Ok, so I googled tensorflow variable initialization in a multiprocess setup. This issue, https://github.com/tensorflow/tensorflow/issues/5448, seems to describe exactly my problem:
they create a new session in each subprocess and hand the existing graph to that session.
I added this code inside my generator:

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.log_device_placement = True

    # create a fresh session in this process, reusing the existing graph
    sess = tf.Session(config=config, graph=graph)
    K.set_session(sess)

    q_approximator_fixed = create_model()
    q_approximator_fixed.set_weights(weights)

Now it logs where the variables are placed:

I tensorflow/core/common_runtime/placer.cc:886] input_1: (Placeholder)/job:localhost/replica:0/task:0/device:CPU:0

But they get placed on the CPU, which is not what I want, and the code still hangs during or after allocation.

The issue also says that moving the imports into the subprocess fixed the problem. So I added the following at the top of my generator, but that doesn't change anything:

    import keras
    import tensorflow as tf
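For context, my reading of the suggestion in that tensorflow issue, as a minimal self-contained sketch: spawn the process yourself and do all tensorflow/keras imports inside the child, so the CUDA context is created fresh there. The tiny Dense model, the stand-in weights and the 'spawn' start method are illustrative choices, not code from my project:

```
import multiprocessing as mp
import numpy as np

def worker(weights, queue):
    # all tensorflow/keras imports happen inside the child process,
    # so the CUDA context is created here and not inherited from the parent
    import tensorflow as tf
    from keras import backend as K
    from keras.layers import Input, Dense
    from keras.models import Model

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))

    input_layer = Input((10,))
    model = Model(inputs=input_layer, outputs=Dense(10)(input_layer))
    model.set_weights(weights)

    queue.put(model.predict(np.random.rand(10, 10)))

if __name__ == '__main__':
    # 'spawn' starts the child with a fresh interpreter, even on Linux
    ctx = mp.get_context('spawn')
    queue = ctx.Queue()

    # stand-in weights for the Dense(10) layer: kernel (10, 10) and bias (10,)
    weights = [np.random.rand(10, 10), np.random.rand(10)]

    p = ctx.Process(target=worker, args=(weights, queue))
    p.start()
    print(queue.get())
    p.join()
```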

Here is a minimal working example to reproduce the error.

Please note: I have not applied any of the proposed solutions here.
They don't work for me, and I didn't want to clutter the code with them.

    import numpy as np

    from keras.layers import Input, Dense
    from keras.models import Model
    from keras.optimizers import Adam

    def create_model():
        input_layer = Input((10,))
        dense = Dense(10)(input_layer)

        return Model(inputs=input_layer, outputs=dense)

    model_outside = create_model()
    model_outside.compile(Adam(1e-3), "mse")

    def subprocess_routine(weights):
        model_inside = create_model()
        model_inside.set_weights(weights)

        while True:
            batch = np.random.rand(10, 10)
            prediction = model_inside.predict(batch)

            yield batch, prediction

    weights = model_outside.get_weights()

    model_outside.fit_generator(subprocess_routine(weights),
                                epochs=10,
                                steps_per_epoch=100,
                                use_multiprocessing=True,
                                workers=1)

I have the exact same problem.

I got the same problem when creating a Keras model with multiprocessing on Linux.
But it does not happen on Windows 10.

I got this problem too. Ubuntu 16.04 + CUDA 9.0 + tensorflow-gpu 1.8 + Keras 2.2.0

Same here, on an Ubuntu 16.04 nvidia-docker container with tensorflow-gpu 1.8 and Keras 2.2.0

I am having the same issue. Initially I was fixing a problem with resetting states from a Sequence object I use with fit_generator, where tensorflow would complain about tensors being in different graphs. Now, when I have use_multiprocessing=True and do a

    with session.as_default():
        with graph.as_default():
            model.reset_states()

in my Sequence object's __getitem__ function (which I have to, because I use stateful LSTMs), I get the mentioned error:

2018-08-01 23:52:33.053069: E tensorflow/core/grappler/clusters/utils.cc:82] Failed to get device properties, error code: 3

If I don't use multiprocessing, the training proceeds as usual. Not being able to use multiprocessing is horrible for my pipeline, so use_multiprocessing=False is not really an option. I am thus a bit stuck. Any ideas?

Linux Mint 18.2 (xenial-based), Cuda 9.0.176, keras 2.2.2, tensorflow-gpu 1.9.0
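
For reference, the pattern I am describing looks roughly like this as a complete Sequence (a sketch with illustrative names, capturing the graph and session at construction time):

```
import tensorflow as tf
from keras import backend as K
from keras.utils import Sequence

class StatefulSequence(Sequence):
    def __init__(self, model, x, y, batch_size):
        self.model = model
        self.x, self.y = x, y
        self.batch_size = batch_size
        # capture the graph and session the model was built in
        self.graph = tf.get_default_graph()
        self.session = K.get_session()

    def __len__(self):
        return len(self.x) // self.batch_size

    def __getitem__(self, idx):
        # re-enter the original graph/session before touching the model
        with self.session.as_default():
            with self.graph.as_default():
                self.model.reset_states()
        start = idx * self.batch_size
        return (self.x[start:start + self.batch_size],
                self.y[start:start + self.batch_size])
```

This runs fine with use_multiprocessing=False; with multiprocessing enabled, the Sequence is copied into the worker processes, and that is exactly where the error above shows up.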

I fixed this problem by using train_on_batch instead of fit.
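
A minimal sketch of that workaround, under the assumption that only the main process may touch keras/tensorflow: a producer process ships plain numpy batches over a queue, and train_on_batch runs in the main process (the model and batch shapes are illustrative):

```
import multiprocessing as mp
import numpy as np

def produce_batches(queue, n_batches):
    # the producer only touches numpy, so it never creates a CUDA context
    for _ in range(n_batches):
        queue.put((np.random.rand(32, 10), np.random.rand(32, 10)))
    queue.put(None)  # sentinel: no more batches

if __name__ == '__main__':
    queue = mp.Queue(maxsize=8)
    producer = mp.Process(target=produce_batches, args=(queue, 1000))
    producer.start()

    # build the model only after the producer has started,
    # so the child never inherits a CUDA context
    from keras.layers import Input, Dense
    from keras.models import Model

    input_layer = Input((10,))
    model = Model(inputs=input_layer, outputs=Dense(10)(input_layer))
    model.compile('adam', 'mse')

    while True:
        item = queue.get()
        if item is None:
            break
        x, y = item
        model.train_on_batch(x, y)  # all GPU work stays in the main process

    producer.join()
```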

Closing as this is resolved

@lhk Could you please post your complete solution? Thank you

I fixed this error by reinstalling my GPU driver.

I have the same issue. I tried using train_on_batch instead of fit and it still did not work. I'm not sure why the issue was closed.

Try reinstalling your graphics driver

Closing as this is resolved

It's not resolved; he just found a strange workaround
