CNTK: Non-reproducible results even after setting the seed value (Python API)

Created on 5 Jun 2017 · 20 Comments · Source: microsoft/CNTK

Previously, I mentioned that there are no options for defining a seed value in BrainScript for CNTK sequential machine learning models [1]. I therefore migrated my code to the CNTK Python API, which gives finer-grained control over the seed values of sequential machine learning models. Below are the places where I use random initialization in my implementation (and where I set the corresponding seed value):

# CNTK imports

import numpy as np
import pandas as pd
import random
import math as m

from cntk.device import *
from cntk import Trainer
from cntk.layers import *

import cntk
import cntk.ops as o
import cntk.layers as l

# Defining the random seed

np.random.seed(8888)
random.seed(8888)

# Defining input and output training vectors

input_array_df = np.asarray(input_split_df[1:len(input_split_df)], dtype=np.float32)
output_array_df = np.asarray(output_df_df[1:len(output_df_df)], dtype=np.float32)
tup=(input_array_df, output_array_df)
listOfTuplesOfInputsLabels.append(tup)

# Shuffling the input vector
random.shuffle(listOfTuplesOfInputsLabels)

# Defining the sequential model

num_minibatches = len(features) // minibatch_size
epoch_size = len(features)*1

feature = o.input_variable((input_dim),np.float32)
label = o.input_variable((output_dim),np.float32)

netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=8888)))),Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=8888))])(feature)

learner = momentum_sgd(netout.parameters, lr = learning_rate_schedule([(4,0.003),(16,0.002)], unit=UnitType.sample,epoch_size=epoch_size),
                       momentum=momentum_as_time_constant_schedule(minibatch_size / -m.log(0.9)), gaussian_noise_injection_std_dev = gaussian_noise,l2_regularization_weight =l2_regularization_weight)

# Splitting into mini batches

tf = np.array_split(features,num_minibatches)
tl = np.array_split(labels,num_minibatches)

# Train (inside the training loop over i)

features = np.ascontiguousarray(tf[i%num_minibatches])
labels = np.ascontiguousarray(tl[i%num_minibatches])
trainer.train_minibatch({feature : features, label : labels})

Unfortunately, even though I set the seed values in my code, I still observe small variations in my final result. Is this because of floating-point calculations? Or is there somewhere else in my code where I should have set the seed value but haven't?

Thanks!

[1] https://github.com/Microsoft/CNTK/issues/1777

All 20 comments

Hi @kasungayan,

Your code sets the seed for numpy and the global Python random number generator, but not for CNTK's internal generator. Adding this might help:

from _cntk_py import set_fixed_random_seed

Also, for strong reproducibility, you might need to use deterministic algorithms:

from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms
set_fixed_random_seed(1)
force_deterministic_algorithms()

Hope it helps,
Morgan
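
As background on why deterministic algorithms can matter even with fixed seeds: floating-point addition is not associative, so a reduction that adds the same numbers in a different order may round differently. The following is a minimal pure-Python sketch of that effect only; the helper name `sum_in_order` and the sample values are made up for illustration, and nothing here claims to show what CNTK does internally.

```python
# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False


def sum_in_order(values, order):
    """Sum `values`, visiting them in the given index order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# The same four numbers, accumulated in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
forward = sum_in_order(values, [0, 1, 2, 3])    # the first 1.0 is lost to rounding
reordered = sum_in_order(values, [0, 2, 1, 3])  # large terms cancel first
print(forward, reordered)  # 1.0 2.0
```

If the order of accumulation can vary between runs, for example in a parallel reduction, the final digits can vary too, which is one reason to request deterministic algorithms in addition to fixing seeds.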

Hi @mfuntowicz ,

Thanks a lot for your response. Really appreciate it! As you suggested, I set the seed value for CNTK's internal generator as below:

# Sequential model

netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=set_fixed_random_seed(1))))),
                   Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=set_fixed_random_seed(1)))])(feature)

# Main method (I have used the same CNTK call to set the seed value of numpy and the global Python random number generator as well)

np.random.seed(set_fixed_random_seed(1))
random.seed(set_fixed_random_seed(1))

force_deterministic_algorithms()

Unfortunately, I still see some variance in my final result. Have I done something wrong in this code?

Regards,
Kasun.

Max pooling is not deterministic if you have overlapped pooling. This will be fixed after we integrate cuDNN 6.

Hi @cha-zhang,

Thanks for the response. I haven't used any max pooling in my code; you can see it in [1]. I have used an LSTM as my training model. Is LSTM considered a non-deterministic algorithm?

Regards,
Kasun.

[1] https://github.com/duanhu/test-repo/pull/1/commits/b647d11b9c7f75a7615ce6aefd3047e022eedc81?diff=unified

Should not be.

I don't think you need to set seed every time you call a random number generator.

You should call cntk.cntk_py.set_fixed_random_seed(1) only once. Note you should not be using _cntk_py.
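
A minimal sketch of the "seed once, at startup" advice. The helper name `seed_everything` is hypothetical and not part of CNTK; the only CNTK calls used are the two named in this thread, and both numpy and CNTK are imported optionally so that the sketch also runs where they are not installed.

```python
import random


def seed_everything(seed):
    """Seed each available generator once; return the names that were seeded.

    Hypothetical helper for illustration, not a CNTK API.
    """
    seeded = []

    random.seed(seed)
    seeded.append("random")

    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # numpy not installed; nothing to seed

    try:
        import cntk
        # The two calls recommended in this thread.
        cntk.cntk_py.set_fixed_random_seed(seed)
        cntk.cntk_py.force_deterministic_algorithms()
        seeded.append("cntk")
    except ImportError:
        pass  # CNTK not installed; nothing to seed

    return seeded


# Call it once, before building the model or shuffling data.
seeded = seed_everything(1)
first = [random.random() for _ in range(3)]

# Re-seeding here is only to show that the same seed replays the same stream.
seed_everything(1)
second = [random.random() for _ in range(3)]
print(seeded, first == second)
```

Note that the seed passed to each library is a plain integer; the return value of a setter function is never used as a seed.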

Thanks again for the prompt response, @cha-zhang. As you suggested, I removed the _cntk_py import (from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms) and defined the seed values explicitly as below:

# In the model definition

netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=1)))),
 Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=1))])(feature)

# Under the main method

set_default_device(cpu())
np.random.seed(1)
random.seed(1)
cntk.cntk_py.set_fixed_random_seed(1)
cntk.cntk_py.force_deterministic_algorithms()

Even though this has reduced the variance of the results compared with before, I can still observe small variances in my final output matrix.

Am I still missing something here? Or do I also need to define the seed value in the LSTM as init=glorot_uniform(seed=cntk.cntk_py.set_fixed_random_seed(1))?

Regards,
Kasun.

Hi @cha-zhang ,

I would be more than grateful if you could comment on the approach that I have followed above.

Thanks in advance :-)

Regards,
Kasun.

@kasungayan I have asked some colleagues to help take a look. Stay tuned.

@kasungayan Could you please share a self-contained repro? The script linked above does not have input files.

Never mind, changed the create_train_data() to generate stub input as follows:

num_minibatches = 20
minibatch_size = 128

def create_train_data():
    return np.random.rand(minibatch_size*num_minibatches, input_dim), \
        np.random.rand(minibatch_size*num_minibatches, output_dim), \
        np.random.rand(minibatch_size, input_dim)

I don't see any non-determinism. Output is identical in each re-run.
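
One way to make the "identical in each re-run" check explicit is to run the same seeded procedure several times and compare the outputs exactly. The sketch below is a pure-Python stand-in: `toy_training_run` and `is_reproducible` are hypothetical names, and the toy loop only imitates a training script, so it says nothing about CNTK itself.

```python
import random


def toy_training_run(seed, steps=100):
    """Hypothetical stand-in for a training script; returns final 'weights'."""
    rng = random.Random(seed)  # seed=None means "seed from system entropy"
    weights = [rng.uniform(-1, 1) for _ in range(4)]
    for _ in range(steps):
        # Pretend update: a small random step per weight.
        weights = [w - 0.01 * rng.gauss(0, 1) for w in weights]
    return weights


def is_reproducible(run, seed, repeats=3):
    """Return True if `run(seed)` gives exactly the same output every time."""
    baseline = run(seed)
    return all(run(seed) == baseline for _ in range(repeats - 1))


print(is_reproducible(toy_training_run, seed=1))     # fixed seed
print(is_reproducible(toy_training_run, seed=None))  # no fixed seed
```

Exact equality is deliberate here: a tolerance-based comparison would hide the small run-to-run differences that this issue is about.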

One potential issue in your code is that you have calls like random.seed(set_fixed_random_seed(1)), which do not fix the seed at all, since set_fixed_random_seed(1) is void (it returns None).

Hi @raaaar ,

Thanks for the response, and thanks @cha-zhang for bringing in help. Sorry if I have misunderstood. Correct me if I'm wrong: should create_train_data() return its values as follows? And would you mind briefly explaining the root cause of this issue?

return listOfInputs, listOfLabels, listOfTestInputs, np.random.rand(minibatch_size*num_minibatches, input_dim), np.random.rand(minibatch_size*num_minibatches, output_dim), np.random.rand(minibatch_size, input_dim)

Thanks in advance.

Regards,
Kasun

I had to change create_train_data() to return some "fake" input since I didn't have your original input files. That was only necessary to be able to run your script to try to reproduce non-determinism that you observed.

The root cause is that you never properly set the random seeds in your code. Since set_fixed_random_seed is void, random.seed(set_fixed_random_seed(1)) is the same as random.seed(None), which reseeds the generator from system entropy (or the current time) instead of fixing it. Here's a quick example:

>>> random.seed(1), [random.randint(1, 100) for _ in range(10)]
(None, [18, 73, 98, 9, 33, 16, 64, 98, 58, 61])
>>> random.seed(1), [random.randint(1, 100) for _ in range(10)]
(None, [18, 73, 98, 9, 33, 16, 64, 98, 58, 61])
>>> random.seed(None), [random.randint(1, 100) for _ in range(10)]
(None, [38, 3, 25, 8, 10, 17, 67, 85, 57, 80])
>>> random.seed(None), [random.randint(1, 100) for _ in range(10)]
(None, [32, 55, 21, 36, 41, 11, 97, 46, 65, 84])

Notice that the first two randomly generated sequences are identical, but the last two are different. So, once I fixed your script to properly set seeds, it started to produce identical output in each new run.
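
A small guard can catch this class of mistake at the call site instead of letting None slip through silently. This is only a sketch: `require_int_seed` and `void_setter` are hypothetical names, and `void_setter` merely imitates a setter that returns nothing.

```python
import random


def require_int_seed(value):
    """Return `value` if it is a usable integer seed, otherwise raise TypeError."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("seed must be an int, got %r" % (value,))
    return value


def void_setter(seed):
    """Hypothetical stand-in for a setter that returns nothing."""
    return None


random.seed(require_int_seed(1))  # fine: an explicit integer seed

try:
    # The pattern from this thread: passing a void call's result as the seed.
    random.seed(require_int_seed(void_setter(1)))
    caught = False
except TypeError:
    caught = True

print(caught)  # True
```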

Hi @raaaar ,

Thanks again! As a matter of fact, I had already fixed the issue you mentioned about setting the random seed in my code, following the earlier comments from @cha-zhang, but the problem was still reproducible. As you requested, I have pushed my code to a self-contained repo [1], where you can find the relevant code and the dataset. I really appreciate your assistance and time :-)

Thanks in advance!

Regards,
Kasun.

[1] https://github.com/kasungayan/CNTK-LSTM

Hi @raaaar ,

Were you able to reproduce the issue that I mentioned? I have pushed my code to a self-contained repo [1], where you can find the relevant code and the dataset. I really appreciate your assistance and time :-)

Thanks in advance!

Regards,
Kasun.

[1] https://github.com/kasungayan/CNTK-LSTM

Hey, I did try your repro (trained twice, for 1 epoch each) and didn't see any difference in the output. I think you might be running a stale release (your code uses some deprecated APIs). Could you try upgrading to the latest CNTK version and verifying that the issue is still present?

Hi @raaaar ,

I'm using the CNTK-2-0rc2 version. Sure, will test this with the latest CNTK version. Thanks for the response!

Regards,
Kasun.

Hi @raaaar ,

I tried to run my code in CNTK-2.0, but it gives me the following error.

```
  File "cluster1.py", line 183, in <module>
    testshape = train_model(features, labels,tests)
  File "cluster1.py", line 124, in train_model
    Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=1))])(feature)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 384, in __call__
    return self.clone(CloneMethod.share, arg_map)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
    result = f(*args, **kwds)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 557, in clone
    return super(Function, self).clone(method, substitutions)
  File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\cntk_py.py", line 1597, in clone
    return _cntk_py.Function_clone(self, *args)
RuntimeError: PastValue/FutureValue Function 'PastValue: Output('Block1088_Output_0', [#], [20]) -> Output('PastValue1003_Output_0', [???], [???])': Input operand 'Output('Block1088_Output_0', [#], [20])' with #dynamic axes != 2 (1 sequence axis and 1 batch axis) is not supported.

```
As you mentioned in the previous comment, I changed some of the deprecated APIs as per [1]:

epoch_size -> max_samples
set_default_device(cpu()) -> cntk.device.try_set_default_device(cpu())

I guess this error is caused by the deprecated APIs that I'm still using. If so, would you mind pointing those functions out to me so that I can migrate my code?

Thanks in advance!

Regards,
Kasun.

[1] https://docs.microsoft.com/en-us/cognitive-toolkit/ReleaseNotes/CNTK_2_0_RC_3_Release_Notes

Hi @raaaar ,

Did you run my code in CNTK-2.0? If so, would you mind sharing the migrated code? I just want to figure out the deprecated APIs that I have used in my original code.

Thanks in advance!

Regards,
Kasun.

Hi @kasungayan,

This is exactly the same error as: #1925.

Basically, the input doesn't have a sequence axis right now. You need to use sequence.input().

Morgan

Hi @mfuntowicz,

The issue was resolved by using sequence.input(). Thanks for the support. @raaaar, after migrating my code to CNTK 2.0, I was able to reproduce my results. Thanks, @mfuntowicz, @raaaar and @cha-zhang, for your guidance and support!

Regards,
Kasun.
