Tensorflow: Session got stuck after fork

Created on 21 May 2016 · 3 Comments · Source: tensorflow/tensorflow

Version: nightly prebuilt for Python2 w/ GPU (just now)

I expect the following code to print "10.0" three times, but sess.run hangs in all forked processes.

import tensorflow as tf
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, gid):
        self.gid = gid
        super(Worker, self).__init__()

    def run(self):
        # Runs in the forked child: build a fresh graph and session.
        G = tf.Graph()
        with G.as_default():
            x = tf.placeholder(tf.float32, shape=[])
            y = x * 2
            sess = tf.Session()
            print(sess.run(y, feed_dict={x: 5}))  # gets stuck here

# Master process: this session is created and run before any fork.
G = tf.Graph()
with G.as_default():
    sess = tf.Session()
    with sess.as_default():
        x = tf.placeholder(tf.float32, shape=[])
        y = x * 2
        print(sess.run(y, feed_dict={x: 5}))

procs = [Worker(k) for k in range(2)]
for p in procs: p.start()
for p in procs: p.join()

Removing the graph and session from the master process solves the problem. So it seems that once a session exists in a process, we cannot use fork?
The problem exists with and without GPU.

NOTE: this code doesn't terminate normally. You'll probably need to manually kill the forked processes after the master exits.
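
Since removing the master's graph and session avoids the hang, one possible workaround is to move the master's own computation into a child process as well, so the parent never holds a session when it forks. The following is only a minimal sketch under that assumption, not a confirmed fix. The helper names `run_in_child`, `_child_main` and `double_with_tf` are hypothetical, and the sketch is written for Python 3, where the fork start method can be requested explicitly.

```python
import multiprocessing as mp


def _child_main(fn, args, queue):
    # Runs in the forked child: report either the result or the error text.
    try:
        queue.put((True, fn(*args)))
    except Exception as exc:
        queue.put((False, repr(exc)))


def run_in_child(fn, *args):
    """Run fn(*args) in a forked child so the parent never owns a session."""
    ctx = mp.get_context("fork")  # explicit: fork is the case discussed here
    queue = ctx.Queue()
    proc = ctx.Process(target=_child_main, args=(fn, args, queue))
    proc.start()
    ok, payload = queue.get()  # read before join() so the child is not blocked
    proc.join()
    if not ok:
        raise RuntimeError("child failed: " + payload)
    return payload


def double_with_tf(value):
    """Same computation as in the report, using the TF 1.x API from the post."""
    import tensorflow as tf  # imported lazily; only the child builds a session
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[])
        y = x * 2
        with tf.Session() as sess:
            return float(sess.run(y, feed_dict={x: value}))
```

In the reproduction, the master's block could then become a call such as `run_in_child(double_with_tf, 5)` made before the `Worker` processes are started. A child that crashes without raising a Python exception would leave `queue.get()` waiting, so a real script would probably want a timeout there.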

ppwwyyxx

👀1

All 3 comments

The in-process session (i.e. tf.Session() with no arguments) is not designed to be fork()-safe. If you want to share a set of devices between multiple processes, create a tf.train.Server in one process, and create sessions that connect to that server (with tf.Session("grpc://...")) in the other processes.

mrry on 21 May 2016

👍2 ❤1
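
A rough sketch of the setup described in the comment above, using the TF 1.x calls `tf.train.ClusterSpec`, `tf.train.Server` and `tf.Session(target)`. The job name "local", the port passed in by the caller, and the helper names `session_target`, `serve` and `double_via_server` are assumptions made for illustration. Each client process is assumed to create its own session after it starts, rather than inheriting one across fork().

```python
def session_target(host, port):
    """Build the target string passed as the first argument of tf.Session."""
    return "grpc://{}:{}".format(host, port)


def serve(port):
    """Run in exactly one process: it owns the devices and blocks forever."""
    import tensorflow as tf
    # Single-task cluster; the job name "local" is an arbitrary choice.
    cluster = tf.train.ClusterSpec({"local": ["localhost:{}".format(port)]})
    server = tf.train.Server(cluster, job_name="local", task_index=0)
    server.join()


def double_via_server(value, host, port):
    """Run in any other process: connects to the server over gRPC."""
    import tensorflow as tf
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[])
        y = x * 2
        # A non-empty target selects the remote server instead of the
        # in-process session.
        with tf.Session(session_target(host, port)) as sess:
            return float(sess.run(y, feed_dict={x: value}))
```

One process would call `serve(2222)` and the others `double_via_server(5, "localhost", 2222)`, where 2222 is just an example port.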

@mrry Does it mean there is a way to create fork safe tf.Session with tf.Session(args)?

mavenlin on 20 Oct 2016

👍3

@mavenlin
The prototype of tf.Session is

tf.Session.__init__(target='', graph=None, config=None)

Here target refers to the execution engine to connect to. That is, the actual work still has to run in another process, in distributed mode, and a tf.Session created with arguments is still not fork()-safe.

Sad news.