with theano backend (CPU or GPU without cuDNN), I could train a reproducible model by
fixed_seed_num = 1234
numpy.random.seed(fixed_seed_num)
random.seed(fixed_seed_num) # not sure if needed or not
While in pure tensorflow without the keras wrapper, it can also be made reproducible by
tensorflow.set_random_seed(fixed_seed_num)
I don't know why, but in Keras + tensorflow backend, none of the above gives a reproducible training model.
BTW: it would be great if keras could expose a unified API for reproducible training, something like:
keras.set_random_seed(fixed_seed_num)
BTW: it would be great if keras could expose a unified API for reproducible training
Right. I will look into it. Or does anybody else want to take a look at it?
I ran into this problem in edward; here is the fix we went with after a rather long discussion: https://github.com/blei-lab/edward/pull/184. Long story short - it is pretty hard to seed tensorflow if you have a single shared session. Would be very interested to hear if there is a better solution :)
Any update on this?
Correct me if I'm wrong, but looks like this issue is still open and there is no way currently in Keras with a TensorFlow backend to get reproducible results. Any update? Workaround?
Well, there is this hack: https://github.com/blei-lab/edward/pull/184. I can propose a PR with that to Keras if that makes sense, @fchollet?
The solution is to simply add a set_seed() function, but raise an error if someone calls it after a TF variable has been created. You cannot reseed after a Variable has been created, because the previous seed was already used to build its initializers.
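A minimal sketch of what such a guard could look like (a hypothetical helper, not actual Keras API; this assumes TF 1.x, where tf.global_variables() lists the variables created so far):
import random
import numpy as np
import tensorflow as tf

def set_seed(seed):
    # Hypothetical helper: refuse to reseed once variables exist, because
    # their initializers were already built with the previous seed.
    if tf.global_variables():
        raise RuntimeError("set_seed() must be called before any TF variable is created")
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)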
Any news on that issue? @bluelight773 I think when running it on the CPU it's reproducible - but that is not really an option most of the time
@fchollet @zhuoqiang Could you confirm this?
Maybe there is a workaround by using both Keras and TensorFlow together, following this post:
https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html
Use Keras pre-defined models to speed up building your model, but use TensorFlow for input, output and optimization.
Take a look at this code; it seems it can reproduce the result.
I use a CentOS 7 server with a Tesla K40. It always shows 0.6268 as the result.
>>> keras.__version__
'1.1.1'
>>> tf.__version__
'0.12.0-rc1'
You should seed it by
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
"""
Different behaviors during training and testing
Some Keras layers (e.g. Dropout, BatchNormalization) behave differently at training time and testing time.
You can tell whether a layer uses the "learning phase" (train/test) by printing layer.uses_learning_phase,
a boolean: True if the layer has a different behavior in training mode and test mode, False otherwise.
If your model includes such layers, then you need to specify the value of the learning phase as part of feed_dict,
so that your model knows whether to apply dropout/etc or not.
To make use of the learning phase, simply pass the value "1" (training mode) or "0" (test mode) to feed_dict:
"""
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
sess = tf.Session()
from keras.layers import Dropout, Dense, LSTM
from keras import backend as K
K.set_session(sess)
from keras.objectives import categorical_crossentropy
from keras.metrics import categorical_accuracy as accuracy
# load data
from tensorflow.examples.tutorials.mnist import input_data
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)
img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))
x = Dense(128, activation='relu')(img)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
preds = Dense(10, activation='softmax')(x)
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
# train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
train_step = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdagradOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdadeltaOptimizer(learning_rate=0.001).minimize(loss)
with sess.as_default():
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        batch = mnist_data.train.next_batch(50)
        train_step.run(feed_dict={img: batch[0],
                                  labels: batch[1],
                                  K.learning_phase(): 1})
acc_value = accuracy(labels, preds)
with sess.as_default():
    print(acc_value.eval(feed_dict={img: mnist_data.test.images,
                                    labels: mnist_data.test.labels,
                                    K.learning_phase(): 0}))
I heard that Keras is going to be merged into TensorFlow. Can I expect that the problem of reproducibility will be solved at the same time? If YES, it will be a great improvement for Kaggle usage!
@nejumi, ditto. This lack of support makes it really hard to run experiments with Keras & TF. I appreciate the convos and solutions here but really hoping this gets fixed soon.
In principle, this should do it:
import numpy as np
np.random.seed(...)
import tensorflow as tf
tf.set_random_seed(...)
However, there is still non-determinism in cuDNN.
With theano it is possible to ensure reproducibility of cuDNN by setting dnn.conv flags: https://github.com/fchollet/keras/issues/2479#issuecomment-213987747
With tensorflow, how do we set those flags?
For some time, I had at least reproducible results when running the training on the CPU. However even that seems not to work any more. Anyone experienced the same?
I'm looking for a way of reproducing keras code, but I suspect that it's not possible. Am I right?
Thanks @diogoff but my problem is that I have tensorflow as backend and I also use cuDNN. It's the same case that you are looking to solve.
I gave up on reproducibility because I found that when forcing deterministic behavior in cuDNN, training would be much slower (e.g. from 15 secs/epoch to 30 secs/epoch).
IMO this is a critical issue that merits a high priority. Running a complex model for several minutes is meaningless unless results can be reproduced.
Running the same cell multiple times has given results that differ by several orders of magnitude. I can confirm the latest suggestion does not work for Keras 2.0.2/ TensorFlow 1.0 backend/Anaconda 4.2/Windows 7
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
@pylang are you using cuDNN?
@diogoff I have not taken extra steps to install cuDNN. My assumption is no, though I am unaware how to verify this absolutely.
Try:
$ ls -las /usr/local/cuda/include/*dnn*
and
$ ls -las /usr/local/cuda/lib64/*dnn*
If you see libcudnn.so installed, you have it and probably tensorflow is using it.
If I remember, tensorflow will print some warning/info messages on startup, saying which libraries it has loaded. On my system, libcudnn.so was one of them.
I searched all files on my Windows machine and found none by that name, nor any system files with "cudnn" (only folders included in Anaconda's TensorFlow site package). I also don't see any warnings aside from the "TensorFlow backend" warning upon import. Seeing that I have not directly installed the driver, find no library files under this name, and see no unusual warnings at import, I conclude I do not have cuDNN installed.
On another note ... I suspect the main issue with non-reproducible results in keras may be related to how the weights are randomized on each call.
I did discover (late last night) that the kernel_initializer argument has a number of options for setting up a distribution from which (I assume) the weights are drawn. I have not run substantial tests to draw a conclusion, nor investigated these options further yet, but my initial tests seem to suggest that selecting different initializers influences the reproducibility of results. For instance, the default initializer is called "glorot_uniform". I played with some other distributions and managed to get more reproducible results, although with much higher error.
Since there are many variables, perhaps we should post a simple example here, e.g. a single Dense layer, 1-input linear regression (see the sketch below). The results should be consistent for all implementers. We can then confirm the results across different machines for different users.
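Here is a sketch of such a minimal test under the assumptions discussed in this thread (single-threaded CPU, all RNGs seeded; the synthetic data, seed values, and epoch count are arbitrary, and PYTHONHASHSEED would still need to be set before Python starts):
import os
os.environ['PYTHONHASHSEED'] = '0'  # note: only effective if set before Python starts

import numpy as np
import random as rn
import tensorflow as tf

np.random.seed(42)
rn.seed(42)
tf.set_random_seed(42)

from keras import backend as K
from keras.models import Sequential
from keras.layers import Dense

# Single thread to avoid non-deterministic floating-point reduction order.
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
K.set_session(tf.Session(graph=tf.get_default_graph(), config=session_conf))

# 1-input linear regression on synthetic data: y = 3x + 2 + noise.
x = np.random.rand(256, 1)
y = 3.0 * x + 2.0 + 0.01 * np.random.randn(256, 1)

model = Sequential([Dense(1, input_dim=1)])
model.compile(loss='mse', optimizer='sgd')
model.fit(x, y, epochs=5, batch_size=32, shuffle=False, verbose=0)
print(model.evaluate(x, y, verbose=0))  # should print the same loss on every run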
I picked up mnist_cnn.py from the examples and set up keras.json in this way:
{
"image_data_format": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
I ran python mnist_cnn.py a couple of times and the results did not seem to be reproducible.
Then I edited mnist_cnn.py and inserted the following code between from __future__ import print_function (line 8) and import keras (line 9):
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
The results now look _sufficiently_ reproducible to me. The small differences I assume are due to the use of cuDNN.
I tried running without cuDNN:
$ TF_USE_CUDNN=0 python mnist_cnn.py
but it seems it's not possible:
UnimplementedError (see above for traceback): Conv2D for GPU is not currently supported without cudnn
If I switch the backend to Theano:
{
"image_data_format": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
and insert the following code between lines 8-9 in mnist_cnn.py:
import numpy as np
np.random.seed(123)
and then run:
$ THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py
the results are _fully_ reproducible.
@diogoff for clarity, what do you consider fully reproducible? Do you know how close your loss results between runs? I'd like to compare notes.
With fully reproducible, I mean I always get exactly the same results in every run:
loss: 0.3336 - acc: 0.8981 - val_loss: 0.0788 - val_acc: 0.9759
loss: 0.1214 - acc: 0.9642 - val_loss: 0.0548 - val_acc: 0.9828
loss: 0.0893 - acc: 0.9733 - val_loss: 0.0443 - val_acc: 0.9847
loss: 0.0735 - acc: 0.9783 - val_loss: 0.0391 - val_acc: 0.9871
loss: 0.0666 - acc: 0.9804 - val_loss: 0.0363 - val_acc: 0.9872
loss: 0.0590 - acc: 0.9825 - val_loss: 0.0369 - val_acc: 0.9873
loss: 0.0542 - acc: 0.9836 - val_loss: 0.0338 - val_acc: 0.9889
loss: 0.0505 - acc: 0.9850 - val_loss: 0.0314 - val_acc: 0.9889
loss: 0.0467 - acc: 0.9861 - val_loss: 0.0299 - val_acc: 0.9896
loss: 0.0451 - acc: 0.9867 - val_loss: 0.0319 - val_acc: 0.9898
loss: 0.0421 - acc: 0.9874 - val_loss: 0.0297 - val_acc: 0.9894
loss: 0.0405 - acc: 0.9880 - val_loss: 0.0309 - val_acc: 0.9895
Test loss: 0.0309449151449 <-- exactly the same up to the last digit
Test accuracy: 0.9895
Keras 2.0.2, Theano 0.9.0 with libgpuarray, CUDA 8.0, cuDNN 5.1.10
I have the same issue when using tensorflow on CPU. I searched for a solution online for 2 days and found that this solved my issue: http://stackoverflow.com/questions/42412660/non-deterministic-gradient-computation
In short, the reason is that tensorflow uses multiple threads or cores to do the computation, which becomes a hidden issue when floating-point values are rounded and shared among multiple threads.
To fix this, if you don't care about speed, just limit the threads tf uses when creating a session:
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1))
Can anyone solve the reproducibility issue for training recurrent layers?
Is there any update on this issue? I am experiencing it with the simple CIFAR10 example network, running on TF in GPU mode. Properly seeding both numpy.random.seed() and tf.set_random_seed() does not fix the issue. It seems that weight initialization is the same, but the weights diverge as training progresses.
It's going to be difficult to publish results with Keras+TF if the results are not reproducible.
I will focus on this issue.
I got different values between the validation accuracy during training and the evaluation after loading the weights on the same dataset (the validation data), so I cannot use the trained model anywhere.
With the solution of @JacobIsrael123 my initialization (and thus the first loss using model.evaluate before fit) is the same. However, the accuracies and losses differ when fitting.
I am computing only on CPU. I set shuffle=False in the fit function. Using theano my code produces the same results.
After trying all the above suggestions, the results are NOT reproducible even on CPU for keras+tensorflow.
Is this still open for tensorflow? Any updates from the keras 2.0 changes? Even if there is some kind of workaround, this is important to me. Very hard to debug without determinism.
Theano should give reproducible results!!!! Are you getting the same accuracy at least?
This configuration at the top of the code seems to work for me:
import numpy as np
import tensorflow as tf
import random as rn

np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1)

from keras import backend as K
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
----ALSO set the shuffle=False in the fit call----
Note: This config forces single threaded operation. Allowing multithread seems to cause non-reproducible results (as pointed out above by @MorvanZhou). This is running in CPU mode.
Doing what @td2014 mentioned (except for setting shuffle=False) didn't work, but once I added os.environ['PYTHONHASHSEED'] = '0' in addition to what he suggested, it worked! Setting the PYTHONHASHSEED environment variable seems necessary for python3.
import os
import numpy as np
import random as rn
import tensorflow as tf

# Setting PYTHONHASHSEED for determinism was not listed anywhere for TensorFlow,
# but apparently it is necessary for the Theano backend
# (https://github.com/fchollet/keras/issues/850).
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(7)
rn.seed(7)

# Limit operation to 1 thread for deterministic results.
session_conf = tf.ConfigProto(
    intra_op_parallelism_threads=1,
    inter_op_parallelism_threads=1
)

from keras import backend as K
tf.set_random_seed(7)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
[...rest of code...]
Edit: @wanting0wang seems to be correct below. Got ahead of myself.
I tried the method suggested by @abali96 to reproduce my keras+TF model in Jupyter notebook. What I observed is that during a kernel's lifetime the training and evaluation can be reproduced.
That is to say, I can open a jupyter notebook, set seeds, fit the model, and the result (and training process) I get can be reproduced if I run the script again from the beginning. When I shutdown the kernel and reopen it, the results I get from the previous kernel cannot be reproduced. If I have several kernels running at the same time, results yielded from kernel A cannot be reproduced in kernel B even if they share exactly the same hyperparameters and seeds.
Anyone has the same problem or has any advice?
I tried @abali96's method but still do not have reproducible results across runs either - each run under a new kernel. My system is configured to use libcudnn.so
>>> keras.__version__
'2.0.4'
>>> tensorflow.__version__
'1.1.0'
I believe this is known, but if I switch my code to use Theano as the backend, I need to set "conv.algo_bwd_data=deterministic" and "conv.algo_bwd_filter=deterministic" to achieve perfect reproducibility. I think there needs to be an equivalent option in Tensorflow, but I am not sure if there is one. If this is where the problem is coming from, then it isn't necessarily a Keras issue.
I wonder if anyone solved this by trying tensorflow 1.3 with cudnn 6 ?
I am using keras with tensorflow backend. I have to use Tensorflow only and can't change to the Theano backend. I am creating a simple 1-layer LSTM model. I need my code to give the same val_loss every time I train on the same data. I am running my system on CPU only. I tried:
from numpy.random import seed
seed(1337)
from tensorflow import set_random_seed
set_random_seed(1337)
on the top of my code.
I also initialized all the kernel and recurrent weights as ones:
model.add(LSTM(150,input_shape=(None,124), W_regularizer=l2(0.001),kernel_initializer='ones', recurrent_initializer='ones', bias_initializer='ones'))
model.add(Dense(2,kernel_initializer='ones', bias_initializer='ones'))
model.add(Activation("softmax"))
I also set shuffle=False in model.fit(). I am using rms for optimizing.
I also set PYTHONHASHSEED to 0. But I am still getting different train loss as well as val loss in each epoch when run multiple times.
I am running keras on a server with 56 CPU cores and CentOS.
Please help soon, as I have tried everything everyone has suggested in other threads!!!!!
+1
I've tested this toy script on both Ubuntu (with a GeForce GTX 1080 ti GPU) and a Macbook Pro (cpu only). While the least significant digits of the loss values differ across platforms the accuracy is consistent. I've run the script 10 times in a row on both platforms and see the same results each time.
The Ubuntu machine is running:
$ conda list | grep -i cud
cudatoolkit 8.0 1
cudnn 6.0.21 cuda8.0_0
tensorflow-gpu 1.2.1 py36cuda8.0cudnn6.0_0
Both machines are running:
$ python -V
Python 3.6.2 :: Anaconda custom
Here is the output from my Ubuntu machine. Here is the output from macOS.
I'm happy to try out other tests/suggestions if it would be helpful.
Tensorflow is completely useless because of that issue. What's the reason to train models if you can't compare their performance because it's not reproducible?
Official answer to the question: "How can I obtain reproducible results using Keras during development?" - https://keras.io/getting-started/faq/ But it just doesn't work!
I also had a very similar if not the same issue with the TF backend. It was quite severe, perhaps partly because with my relatively small-sized model and dataset the performance at the end varied significantly. I tried fixing the tensorflow random seed, fixed init (later even with a pickled file for init weights), and providing the same data sample, but still ended up with different results. It seems like tensorflow's multithreading steps in somehow, since turning off TF's multiple threads 'somehow' worked, though not completely (say, in a repeated experiment, 2/4 happened to be identical).
Moved to Theano, passed the THEANO_FLAGS that @diogoff mentioned ($ THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py), and everything seems solved.
Interesting that it's not a huge problem for most of the people out there. Maybe people are using large enough datasets and the stochastic training process makes it less problematic.
How about initializing it on CPU with a fixed seed, saving the params, and then porting it to the GPU? Is that possible? I just have the thought of it, but cannot figure out how to code it (if it works).
Yes, that's pretty much what I did by storing the weights in a pickled file.
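For reference, a sketch of that approach using Keras's get_weights()/set_weights() API (this assumes a model has already been built and compiled; the file name is arbitrary):
import pickle

# First run (e.g. seeded on CPU): save the freshly initialized weights.
with open('init_weights.pkl', 'wb') as f:
    pickle.dump(model.get_weights(), f)

# Later runs (e.g. on GPU): restore the exact same starting point before fit().
with open('init_weights.pkl', 'rb') as f:
    model.set_weights(pickle.load(f))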
I had this problem today, and just using the above-mentioned code, I mean this one:
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
solved the random results issue. It seems the issue with tensorflow has been fixed.
I am using a GPU with CUDA 8.0.61 and CuDNN-6.0
and this link confirms it: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
I copied the code given in the link at the top of my program, but I'm still having this issue.
I'm using a Jupyter notebook on a MacBook; only when I restart the whole kernel do I get the same results, so the first run always corresponds to the first run, the second to the second, etc.
But every time I train it I get different results.
What am I doing wrong?
Do you mean this link? https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
If yes, does your data have some kind of normalized pattern? I mean consistent, not scattered, values.
Yes exactly this link.
I'm using a LSTM for sentence classification with a bag of words feature, so my input data is a vector consisting of integer values.
The only thing I do before passing it to the network is filling in zeros with sequence.pad_sequences().
(I'm not sure if this is relevant, but this was mentioned sometimes in the same context. I have a MacBook without extra GPU, so only onboard GPU without CUDA.)
@nyxjemk I had exactly the same problem and managed to solve it by closing and restarting the tensorflow session every time I run the model. In your case it should look like this:
I ran the following code and had reproducible results using GPU and tensorflow backend:
from datetime import datetime
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

print(datetime.now())
for i in range(10):
    np.random.seed(0)
    tf.set_random_seed(0)
    sess = tf.Session(graph=tf.get_default_graph())
    K.set_session(sess)
    n_classes = 3
    n_epochs = 20
    batch_size = 128
    task = Input(shape=x.shape[1:])
    h = Dense(100, activation='relu', name='shared')(task)
    h1 = Dense(100, activation='relu', name='single1')(h)
    output1 = Dense(n_classes, activation='softmax')(h1)
    model = Model(task, output1)
    model.compile(loss='categorical_crossentropy', optimizer='Adam')
    model.fit(x_train, y_train_onehot, batch_size=batch_size, epochs=n_epochs, verbose=0)
    print(model.evaluate(x=x_test, y=y_test_onehot, batch_size=batch_size, verbose=0))
    K.clear_session()
And obtained this output:
2017-10-23 11:27:14.494482
0.489712882132
0.489712893813
0.489712892765
0.489712854426
0.489712882132
0.489712864011
0.486303713004
0.489712903398
0.489712892765
0.489712903398
What I understood is that if you don't close your tf session (restarting the kernel does it for you), you keep sampling from the same "seeded" stream.
Still, there is an inconsistency after the third or fourth digit.
Yes, I'm using it for a ranking task. The small difference unfortunately makes a difference there.
Any further news on this topic?
Just do it on CPU; there is no other option. Speed does not make a huge difference in this case.
Mh.. I'm using CPU, still having this issue. Do I have to shut off multiprocessing to achieve the reproducibility? This seems not very handy, as it would take aeons to complete..
Have you followed this, including the code order?
https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
Yes, I only left out the line with the multiprocessing.
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
I will try, but I think this will take just too long - not very practical. Using theano as backend, results are reproducible without any limitations, so I thought there must also be a way to achieve this with tensorflow. But I will try it out.
No, you can try this with tensorflow too, not just theano. Test and let us know.
With theano results are reproducible for me, but for some other reasons I'd like to stick with tensorflow.
As expected it takes way longer: ~10x for the system I'm using, I will test it with 1 epoch to see if results are the same after running it twice (even this is very slow..).
I think there must be a better way. As said above, with Theano as backend I'm getting fully reproducible results, with full multiprocessing support.
I think there should also be a way to achieve reproducible results with TensorFlow with multiprocessing.. from my perspective this is a really huge handicap; -90% performance cannot be the way to go.
The bad news is that theano will not continue to be supported. Therefore we have no choice but to stick with tensorflow.
Yes.. therefore I was hoping there might be another way.
Adding the line:
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
does indeed solve the problem, but it is really, really slow compared to normal. Too slow, in my opinion, to use often.
Depending on the task, there may also be a more or less significant difference in the results across runs.
So I hope there might be a future solution or maybe some workaround.
Is this problem TensorFlow specific, or is it just the combination of Keras and TF which leads to this issue?
hey guys,
I can reproduce my results by adding np.random.seed(1) and rn.seed(1) with keras + theano. The latter probably is not even necessary.
Furthermore, I can reproduce my results with pure tensorflow by just adding np.random.seed(1) and tf.set_random_seed(1), and with keras + cntk by just adding np.random.seed(1) and _cntk_py.set_fixed_random_seed(1).
This was done with and without a dropout layer; perfectly reproducible.
I went with the keras FAQ suggestion and tried to reproduce my results with keras + tf, without success.
One interesting observation:
if I use only a small amount of data or a small number of units (LSTM with 10 units), the results are almost reproducible with keras + tf. Almost means the last 5 digits do vary. But as soon as I add more data or units (say 200), the results vary a lot more. I tried with a dropout layer and without; same conclusion.
Also, giving the seed directly to the kernel initializer and/or dropout layer did not make any difference regarding reproducibility (see the sketch below for what that looks like).
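For reference, this is roughly what per-layer seeding looks like in Keras 2 (a sketch; the shapes and seed values are arbitrary, and as noted above it did not restore reproducibility here):
from keras.initializers import glorot_uniform
from keras.layers import Input, Dense, Dropout
from keras.models import Model

inp = Input(shape=(16,))
# Both the weight initializer and the Dropout layer accept an explicit seed.
h = Dense(64, activation='relu',
          kernel_initializer=glorot_uniform(seed=0))(inp)
h = Dropout(0.5, seed=0)(h)
out = Dense(1)(h)
model = Model(inp, out)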
In summary:
I tried the keras FAQ suggestion without success for keras + tf. Pure tensorflow, keras + theano and keras + cntk are perfectly reproducible, even with dropout involved.
theano: 0.9.0.dev
tf: 1.3.0
keras: 2.1.1
cntk: 2.3
all computations were done on gpu
I tried the same as you with keras+tensorflow, using the resnet-v2 model from keras applications, with no success. Results are not reproducible.
any news?
Any progress with this, besides using a single thread?
This worked for me
# importing the libraries
import numpy as np
import tensorflow as tf
import random as rn
import os
os.environ['PYTHONHASHSEED'] = '0'
from keras import backend as k
from keras.models import Sequential

# Running the below code every time
np.random.seed(27)
rn.seed(27)
tf.set_random_seed(27)
sess = tf.Session(graph=tf.get_default_graph())
k.set_session(sess)

## Creating model
m = Sequential()
m.add(...

## Compiling and fitting
m.compile(...
m.fit(...
It is not clear when Tensorflow will fix the damn thing, which still exists.
This code works for me:
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1,
                        allow_soft_placement=True, device_count={'CPU': 1})
session = tf.Session(config=config)
from keras import backend as K
K.set_session(session)
but might it slow down the learning?
@VanitarNordic Hi, the code in the Keras documentation works for me, but I still get a small inconsistency after the third or fourth digit, as you said. Over the first several epochs the inconsistency becomes larger and larger, and two runs of the same code end up with totally different training values after about 10 epochs.
Do you have any idea how to solve this issue, or what raises this problem?
Thank you for sharing your experience.
@Sucran
Upgrade to Tensorflow 1.8 and see what happens. You can use this Anaconda package:
conda install -c hesi_m keras
which installs TF-1.8 and Keras-2.1.6 for you on the CPU. Test and let me know the results.
@VanitarNordic the GPU software environment of my lab is Ubuntu 14.04, CUDA 8 and cudnn 6.0; I installed tensorflow 1.4.0 and keras-2.1.6. I cannot update to tensorflow 1.8 because of software environment limitations, and I cannot update the environment myself since others are using it. I am sorry for that...
I don't think you will be able to get reproducible results using Tensorflow-GPU. I also have a GPU on the system, but Tensorflow uses the CPU because I have installed the CPU version. You can create another anaconda environment for this purpose and test the idea.
@VanitarNordic Yes, I agree with you. It is difficult to get reproducible results using TF-GPU. In my case, although two runs of the same code got loss values with small differences in each epoch, the loss curves were almost the same, just not identical (maybe we can already call that a reproducible result). However, I still hope someone can figure out what causes this issue. I think it may be identical on CPU, just like you said. Thank you for the advice.
@abali96 Do you have any idea why setting PYTHONHASHSEED would be necessary?
@MartinThoma, setting the PYTHONHASHSEED environment variable to 0 ensures that python's built-in hash() function outputs the same result across multiple runs of the program (without this, the hash() function is only stable within a single run of the program). This hash() function is used everywhere, for example when you create a set or a dict. Try running this:
>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}
If I stop the Python shell and I run the same commands, I get a different result:
>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}
However, if I start python like this:
PYTHONHASHSEED=0 python
Then I always get the same result, even across multiple restarts of the Python shell:
>>> set("abcdefghijklmnopqrstuvwxyz")
{'x', 'i', 'r', 'p', 'd', 'c', 'l', 'y', 'h', 'm', 'z', 'k', 'o', 'a', 'g', 'f', 'u', 'e', 'w', 'n', 'b', 'q', 'j', 't', 's', 'v'}
However, I noticed that setting this environment variable within the Python program did not have any effect; it only worked when set before the program starts. Looking at Python's source code, it seems that this environment variable is read upon startup, so it's no use setting it after startup. I'll submit a fix to Keras's documentation.
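One possible workaround (a sketch, not something from the Keras docs) is to have the script re-exec itself with the variable set before anything else runs:
import os
import sys

# Must run before any other imports: restart the interpreter with a fixed hash seed.
if os.environ.get('PYTHONHASHSEED') != '0':
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)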
Hope this helps,
Aurélien
I think this non-deterministic behaviour is pretty much not due to Keras, but Tensorflow itself (e.g., https://stackoverflow.com/questions/45865665/getting-reproducible-results-using-tensorflow-gpu?newreg=4a6ec43834884576a175961e7f2188db).
I have tried to run the pure Tensorflow code fully_connected_feed.py from the Tensorflow repo with the following settings (as recommended above by other responses):
import tensorflow as tf
import numpy as np
import random
import os
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(2019)
random.seed(2019)
tf.set_random_seed(2019)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
and set shuffle=False in line 78 of fully_connected_feed.py, but could not obtain reproducibility.
I could not obtain reproducibility even when running the code on CPU.
Note: For the Keras + Theano backend, I have obtained perfect reproducibility.
I've also inserted the explicit kernel (and bias) initialization:
x = layers.Dense(64, activation='relu', kernel_initializer=keras.initializers.glorot_uniform(seed=123))(x)
and this has worked.
I tried multiple versions of tensorflow, from 1.2.1 to 1.13.1; all of them have issues on CPU. However, when I set the backend to CNTK I am able to get perfectly matching results.
I used a Sequential model with only Dense layers and no dropouts.
I'm getting this issue. I'm using this example https://www.depends-on-the-definition.com/lstm-with-char-embeddings-for-ner/ with the NER dataset: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
Every time I run the training, it produces completely different metrics for f-score, precision, and recall. The lowest I had was 71% f-score and the highest was 83%. I think that is a huge variation.
I tried the same on two machines, one with a GPU and one using only the CPU, and the results are the same: unreproducible results.
I was using keras 2.2.2 + tensorflow 1.10 (unable to use the latest version due to an unresolved bug in keras >= 2.2.3).
Getting same issue +1
any update on this please?
Still can't get reproducible results even after using this method:
def _seed_everything(seed=2019):
    os.environ['PYTHONHASHSEED'] = str(seed)  # OS
    random.seed(seed)                         # Python random
    np.random.seed(seed)                      # NumPy random
    set_random_seed(seed)                     # TF random
Please consider this high priority, as it is becoming really difficult to do research with TF + Keras.
Hi @alberduris, please read my comment about PYTHONHASHSEED: you cannot set it within your program, you have to set it before starting Python (or Jupyter). Check out my video for more details.
Putting the following code at the beginning, I can consistently reproduce the result 100% of the time if I only use Dense layers.
import numpy as np
import random as rn
import tensorflow as tf
import os
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(1)
rn.seed(2)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from tensorflow.keras import backend as K
tf.set_random_seed(3)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
However, I get different results if I insert this one line, model.add(Conv2D(32, 3, activation='relu')), before model.add(Flatten()).
Input > flatten > dense produces a consistent result, but input > conv2d > flatten > dense produces a different result every time I run the code.
I'd appreciate any guidance.
@jsl303 , it's no use setting PYTHONHASHSEED within the Python program; you have to set it before starting Python (it's only read by Python upon startup). This is used by Python to compute hashes (e.g., for dictionaries or sets). If you don't use any code that relies on the order of the items in sets or dictionaries, it won't make a difference, but I wouldn't count on it.
To convince yourself, try starting Python multiple times and run this command: print("".join(set("abcdefghijklmnopqrstuvwxyz"))). For example:
$ python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
oeqsytnmfprwbvhldxijzcugak
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
oeqsytnmfprwbvhldxijzcugak
>>> exit()
$ python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
qjufgnolbdewycpitkzvarxsmh
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
qjufgnolbdewycpitkzvarxsmh
>>> exit()
As you can see, although the order is consistent within one Python execution, it is not consistent across multiple runs. If you try to set PYTHONHASHSEED within your Python code, it won't be any different, give it a try! But the correct approach is to set PYTHONHASHSEED before Python starts, for example on the command line:
$ PYTHONHASHSEED=0 python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> exit()
$ PYTHONHASHSEED=0 python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> exit()
Hope this helps.
Thanks for the explanation!
I don't use any sets or dictionaries in my code, but I tried what you suggested anyway, just in case. You're right, it did not make any difference.
I still get different results whenever I rerun the model with a Conv2D.
@jsl303 , glad I could help. But even if you don't use any sets or dictionaries in your code, if any library you call iterates over sets or dictionaries, the result will not be deterministic across runs. So I really recommend setting PYTHONHASHSEED=0 outside of Python. Moreover, if you are using a GPU, then it won't be deterministic, because some GPU operations used by TensorFlow (through CuDNN and CUDA) are just not perfectly deterministic (such as tf.reduce_sum()). So unfortunately, if you really need things to be deterministic, you need to use the CPU, and you also need to use a single thread. Check out my video on this topic.
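Putting those two constraints together, a sketch of a CPU-only, single-threaded setup for TF 1.x (the CUDA_VISIBLE_DEVICES trick assumes it runs before TensorFlow initializes CUDA, so it must come before the tensorflow import):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide all GPUs so TF falls back to CPU

import tensorflow as tf
from keras import backend as K

# One thread each for intra- and inter-op parallelism for determinism.
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
K.set_session(tf.Session(graph=tf.get_default_graph(), config=session_conf))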
I am getting 100% reproducible results after following Ageron's video. Below are the variables I set
OS: Windows
Libraries used: Keras (Tensorflow in the backend)
IDE: Anaconda
I first created a PYTHONHASHSEED environment variable in Windows environment variables and set it to 0.
I opened jupyter notebook through Anaconda prompt and added the below code at the start of the program.
import numpy as np
import tensorflow as tf
import random as rn
np.random.seed(10)
rn.seed(10)
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
with tf.Session(config=config) as sess:
    pass
tf.set_random_seed(10)
I then used the seed while splitting the dataset and initializing weights in the NNs:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30,random_state=random_seed)
model.add(Dense(45, input_dim=42, activation='tanh',kernel_initializer=keras.initializers.glorot_normal(seed=random_seed)))
Hope this helps.
The only problem I see now is that to reproduce the results, I have to restart the kernel every time. I hope someone can help in this regard, to avoid restarting the kernel.
Update:
I replaced
with tf.Session(config=config) as sess:
    pass
tf.set_random_seed(10)
with the below in the first cell and now I don't have to restart the kernel.
from keras import backend as K
tf.set_random_seed(10)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
Just running this cell before the desired seed iteration helps.
-Kiran Varma
After searching for 2-3 days, a solution that I saw somewhere worked for me: I changed the optimizer from Adam to Adagrad and now I am having consistent results. Trying to find the reason for this.