Keras: clear_session() with tensorflow 1.8 backend results in a segfault

Created on 11 Jun 2018 · 11 Comments · Source: keras-team/keras

Hi,

The following code results in a segfault after the model is fitted a second time:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.backend import clear_session

def test():
    data = np.random.random((1000, 100))
    labels = np.random.randint(2, size=(1000, 1))
    model = Sequential()
    model.add(Dense(32, activation='relu', input_dim=100))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
    model.fit(data, labels, epochs=3, batch_size=32)

test()
clear_session()
test()

The output:

Using TensorFlow backend.
Epoch 1/3
1000/1000 [==============================] - 3s 3ms/step - loss: 0.7174 - acc: 0.4910
Epoch 2/3
1000/1000 [==============================] - 0s 72us/step - loss: 0.7056 - acc: 0.5050
Epoch 3/3
1000/1000 [==============================] - 0s 70us/step - loss: 0.6989 - acc: 0.5150
Epoch 1/3
1000/1000 [==============================] - 0s 187us/step - loss: 0.6936 - acc: 0.5320
Epoch 2/3
1000/1000 [==============================] - 0s 67us/step - loss: 0.6854 - acc: 0.5600
Epoch 3/3
1000/1000 [==============================] - 0s 70us/step - loss: 0.6802 - acc: 0.5720
Segmentation fault (core dumped)

Not calling clear_session() prevents the segfault. This only happens with keras 2.2 and tensorflow 1.8 (both the CPU and GPU versions). All other combinations I tried (keras 2.1.x; tensorflow < 1.8, and also 1.9.0rc0) don't result in a segfault.
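Based only on the combination reported above, one defensive option is to skip clear_session() on the affected versions. The sketch below is hypothetical: major_minor and clear_session_is_safe are not part of Keras or TensorFlow, and the version strings are passed in as plain arguments so the check can run without either library installed.

```python
# Hypothetical helpers (not a Keras or TensorFlow API): decide whether to call
# clear_session() based on the version pair reported to segfault in this issue.


def major_minor(version):
    """Return (major, minor) as ints from strings like '2.2.0' or '1.9.0rc0'."""
    parts = version.split(".")
    major = int(parts[0])
    # Keep only the leading digits of the minor component, e.g. '9rc0' -> '9'.
    digits = ""
    for ch in parts[1]:
        if not ch.isdigit():
            break
        digits += ch
    return major, int(digits)


def clear_session_is_safe(keras_version, tf_version):
    """False for Keras 2.2.x combined with TensorFlow 1.8.x, True otherwise."""
    return not (
        major_minor(keras_version) == (2, 2)
        and major_minor(tf_version) == (1, 8)
    )
```

In a training script the strings would typically come from keras.__version__ and tensorflow.__version__, with clear_session() called only when clear_session_is_safe(...) returns True. Later comments in this thread add further data points (python3.6 only, Keras 2.2.2), so treat this version rule as an assumption to adjust rather than a definitive list.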

Is this a keras or tensorflow issue?

All 11 comments

I had the same issue as @budach. I had to downgrade to tensorflow-gpu 1.8 with CUDA 9.0 and cuDNN 7 in order to avoid this error with Keras 2.2.0.

tl;dr:

  • No issues when using keras 2.1.6
  • No issues when using tensorflow-cuda
  • No issues when using tensorflow and python2.7
  • Segfault appears after https://github.com/keras-team/keras/pull/10087
  • The only combination that gives segfaults for me is python3.6, tensorflow, keras 2.2.0

My unit tests on Travis were failing because of this. I'll explain my situation; perhaps it adds some information. My python2.7 build was finishing successfully, while python3.6 was failing with a segfault. Locally, using the GPU version of tensorflow, everything passes on python3.6 (regardless of whether a GPU is actually used). The code I was using is similar to that of the original author of this issue:

import keras
import numpy as np
import tensorflow as tf

def get_session():
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    return tf.Session(config=config)

def do_train():
    data = np.random.random((1000, 100))
    labels = np.random.randint(2, size=(1000, 1))
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(32, activation='relu', input_dim=100))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(data, labels, epochs=1, batch_size=1)

if __name__=='__main__':
    keras.backend.tensorflow_backend.set_session(get_session())
    do_train()

    keras.backend.tensorflow_backend.set_session(get_session())
    do_train()

But I think the underlying issue is the same. I've also run it with gdb, which gives me the following:

#0  0x00007fffd6ba3398 in tensorflow::TF_SessionReleaseCallable(TF_Session*, long, TF_Status*) () from /home/hgaiser/dev/keras-segfault/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#1  0x00007fffd6b525e2 in _wrap_TF_SessionReleaseCallable () from /home/hgaiser/dev/keras-segfault/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#2  0x00007ffff7420343 in _PyCFunction_FastCallDict () from /usr/lib/libpython3.6m.so.1.0
#3  0x00007ffff73e984e in ?? () from /usr/lib/libpython3.6m.so.1.0
#4  0x00007ffff73ad0fa in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.6m.so.1.0
#5  0x00007ffff73e8b2b in _PyFunction_FastCallDict () from /usr/lib/libpython3.6m.so.1.0
#6  0x00007ffff741b28f in _PyObject_FastCallDict () from /usr/lib/libpython3.6m.so.1.0
#7  0x00007ffff741bc12 in _PyObject_Call_Prepend () from /usr/lib/libpython3.6m.so.1.0
#8  0x00007ffff741afec in _PyObject_FastCallDict () from /usr/lib/libpython3.6m.so.1.0
#9  0x00007ffff73ef136 in ?? () from /usr/lib/libpython3.6m.so.1.0
#10 0x00007ffff7381b61 in ?? () from /usr/lib/libpython3.6m.so.1.0
#11 0x00007ffff7480b9d in ?? () from /usr/lib/libpython3.6m.so.1.0
#12 0x00007ffff7480c02 in PyGC_Collect () from /usr/lib/libpython3.6m.so.1.0
#13 0x00007ffff748496f in Py_FinalizeEx () from /usr/lib/libpython3.6m.so.1.0
#14 0x00007ffff7481313 in Py_Main () from /usr/lib/libpython3.6m.so.1.0
#15 0x0000555555554b5c in main ()
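The backtrace shows TF_SessionReleaseCallable being reached from PyGC_Collect inside Py_FinalizeEx, i.e. a Python object releasing a session callable while the interpreter shuts down. One plausible reading, and it is only an assumption here, is that the callable outlives the session it was created from, so its release touches state that was already freed. The pure-Python toy model below (ModelSession and ModelCallable are hypothetical, and it does not call TensorFlow) illustrates that ordering hazard and the kind of guard that avoids it; it raises where native code would crash, and uses an explicit release() instead of a destructor so the order is deterministic.

```python
# Toy model of an ownership/ordering hazard; hypothetical classes, no TensorFlow.


class ModelSession:
    """Stand-in for a native session that owns callable handles."""

    def __init__(self):
        self._next_handle = 0
        self._live_handles = set()
        self.deleted = False

    def make_callable(self):
        # Register a new handle owned by this session.
        self._next_handle += 1
        self._live_handles.add(self._next_handle)
        return ModelCallable(self, self._next_handle)

    def delete(self):
        # Tearing down the session frees every handle it owned.
        self._live_handles.clear()
        self.deleted = True

    def release_callable(self, handle):
        if self.deleted:
            # Native code would dereference freed memory here (a segfault);
            # the model raises instead so the bad ordering is observable.
            raise RuntimeError("callable released after its session was deleted")
        self._live_handles.discard(handle)

    @property
    def live_handles(self):
        return len(self._live_handles)


class ModelCallable:
    """Stand-in for a callable that holds a handle into its session."""

    def __init__(self, session, handle):
        self._session = session
        self._handle = handle

    def release(self, guarded=True):
        # guarded=True adds the defensive check: do nothing if the owning
        # session is already gone, since its handles were freed with it.
        if guarded and self._session.deleted:
            return "skipped"
        self._session.release_callable(self._handle)
        return "released"
```

In the real code path the release appears to run from a destructor, which is why its timing depends on garbage collection and, as in this trace, on interpreter shutdown.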

I bisected the Keras commit history to find which change was to blame and found https://github.com/keras-team/keras/pull/10087 to be the culprit.
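The binary search over commits can be sketched as follows (git bisect automates the same idea over a real history). Both functions are hypothetical helpers: first_bad_commit assumes an ordered list whose first entry is known good and last entry known bad, and segfaults runs a Python command and reports whether it died from SIGSEGV, which on POSIX subprocess signals with a negative return code. In practice is_bad would check out and install a commit, then run the reproduction script through segfaults.

```python
import signal
import subprocess
import sys


def segfaults(args, python=sys.executable):
    """Run `python *args`; True if the process was killed by SIGSEGV (POSIX)."""
    result = subprocess.run([python, *args])
    # On POSIX, a child killed by signal N has returncode == -N.
    return result.returncode == -signal.SIGSEGV


def first_bad_commit(commits, is_bad):
    """Return the first bad commit using O(log n) calls to is_bad.

    Assumes commits are ordered oldest to newest, commits[0] is known good
    and commits[-1] is known bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Because each step halves the range, a history of a few hundred commits between a good and a bad release needs only around eight or nine test runs.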

@fchollet @TimZaman @ahundt can you shed some light on this issue?

Hey Hans. Any reason you set the session multiple times with your custom function? Could you try calling the backend's clear_session() in between setting the sessions?

Hey Tim :)

> Any reason you set the session multiple times with your custom function?

I don't particularly want to do this, but we are running unit tests on our training script. There are some parameters that we want to test (different models), so the training script is tested multiple times. Every time the training script is executed it sets a tf session. I could write it such that it only does that if you execute it directly (by moving it to if __name__=='__main__':), but I suppose there shouldn't be anything wrong with setting a session multiple times.

> Could you try calling the backend's clear_session() in between setting the sessions?

I could, but this will still segfault. As @budach pointed out, the issue also occurs when doing the sequence "training -> clear_session -> training". I noticed the same behaviour if I replace set_session with clear_session in my examples.

Ok. Did you try with the unit test decorator for keras?

> Ok. Did you try with the unit test decorator for keras?

I didn't, but the decorator only adds a clear_session after running the test, so I don't suspect it changes anything.

I added it now but the result is the same. I had tried a similar thing before, but using a pytest fixture to call clear_session instead of a decorator. This had the same effect as well.
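The decorator pattern described here (run the test, then clear the session) can be sketched generically. clears_after is a hypothetical name, not the decorator Keras ships, and the cleanup function is passed in; with Keras it would be keras.backend.clear_session. This version uses try/finally so the cleanup also runs when the test raises.

```python
import functools


def clears_after(cleanup):
    """Decorator factory: run the wrapped test, then always call `cleanup`."""

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            finally:
                # Runs whether the test passed or raised.
                cleanup()

        return wrapper

    return decorator
```

Note that, as reported above, this does not avoid the crash on Keras 2.2.0 with TensorFlow 1.8: running several decorated tests in one process still produces the training -> clear_session -> training sequence.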

I have the same issue with tensorflow 1.8.0 and keras 2.2.0.

For what it's worth, I have the same problem with tensorflow 1.8.0 and keras 2.2.0. I had placed clear_session at the end of each fold in a cross-validation loop and was getting Segmentation fault (core dumped). Removing clear_session resolved this issue.
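The cross-validation setup described here, with clear_session at the end of each fold, roughly has the shape below. This is a sketch with hypothetical names: train_and_score stands for building, compiling, fitting and evaluating one model, and reset_backend would be keras.backend.clear_session. Passing reset_backend=None corresponds to the workaround of removing clear_session.

```python
def cross_validate(n_folds, train_and_score, reset_backend=None):
    """Train and score one model per fold, optionally resetting afterwards."""
    scores = []
    for fold in range(n_folds):
        # Build, fit and evaluate a fresh model for this fold.
        scores.append(train_and_score(fold))
        # Resetting only after scoring means no model from this fold is
        # used once the backend has been reset.
        if reset_backend is not None:
            reset_backend()
    return scores
```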

Upgrading tensorflow-gpu with pip resolved the problem for me.

Thanks @hgaiser. I had Keras 2.2.2 installed, and downgrading to 2.1.6 solved the segmentation fault when calling clear_session.

Closing as this is resolved
