Previously, I mentioned that there are no options for defining a seed value in BrainScript for CNTK sequential machine learning models [1]. Hence I migrated my code to the CNTK Python API, which gives more fine-grained control over the seed values of sequential machine learning models. Below are the places where I use random initialization in my implementation (and set the corresponding seed value):
// CNTK imports
import numpy as np
import pandas as pd
import random
import math as m
from cntk.device import *
from cntk import Trainer
from cntk.layers import *
import cntk
import cntk.ops as o
import cntk.layers as l
//defining the random seed
np.random.seed(8888)
random.seed(8888)
// Defining input and output training vectors
input_array_df = np.asarray(input_split_df[1:len(input_split_df)], dtype=np.float32)
output_array_df = np.asarray(output_df_df[1:len(output_df_df)], dtype=np.float32)
tup=(input_array_df, output_array_df)
listOfTuplesOfInputsLabels.append(tup)
//shuffling the input vector
random.shuffle(listOfTuplesOfInputsLabels)
//Defining sequential model
num_minibatches = len(features) // minibatch_size
epoch_size = len(features)*1
feature = o.input_variable((input_dim),np.float32)
label = o.input_variable((output_dim),np.float32)
netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=8888)))),Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=8888))])(feature)
learner = momentum_sgd(netout.parameters, lr = learning_rate_schedule([(4,0.003),(16,0.002)], unit=UnitType.sample,epoch_size=epoch_size),
momentum=momentum_as_time_constant_schedule(minibatch_size / -m.log(0.9)), gaussian_noise_injection_std_dev = gaussian_noise,l2_regularization_weight =l2_regularization_weight)
//Splitting into mini batches
tf = np.array_split(features,num_minibatches)
tl = np.array_split(labels,num_minibatches)
//Train
features = np.ascontiguousarray(tf[i%num_minibatches])
labels = np.ascontiguousarray(tl[i%num_minibatches])
trainer.train_minibatch({feature : features, label : labels})
Unfortunately, even though I was able to set the seed values in my code, I still observe small variations in my final result. Is this because of floating-point calculations? Or is there somewhere in my code where I should have set a seed value but haven't?
Thanks !
Hi @kasungayan,
Your code sets the seed for numpy and the global Python random number generator, but not for CNTK's internal generator. Adding this might help:
from _cntk_py import set_fixed_random_seed
Also, for strong reproducibility, you might need to use deterministic algorithms:
from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms
set_fixed_random_seed(1)
force_deterministic_algorithms()
Hope it helps,
Morgan
Hi @mfuntowicz ,
Thanks a lot for your response. I really appreciate it! As you suggested, I set the seed value for the CNTK internal generator as below:
// Sequential Model
netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(**seed=set_fixed_random_seed(1)**)))),
Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=**set_fixed_random_seed(1)**))])(feature)
//Main method ( I have used the same CNTK internal generator to set the seed value of numpy and the global Python random number generator as well)
np.random.seed(set_fixed_random_seed(1))
random.seed(set_fixed_random_seed(1))
force_deterministic_algorithms()
Unfortunately, I still see some variance in my final result. Have I done something wrong in this code?
Regards,
Kasun.
Max pooling is not deterministic if you have overlapped pooling. This will be fixed after we integrate cuDNN 6.
Hi @cha-zhang,
Thanks for the response. I haven't used any max pooling in my code; you can see it in [1]. I use an LSTM as my training model. Is LSTM considered a non-deterministic algorithm?
Regards,
Kasun.
Should not be.
I don't think you need to set seed every time you call a random number generator.
You should call cntk.cntk_py.set_fixed_random_seed(1) only once. Note you should not be using _cntk_py.
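For illustration, the advice above (seed each generator once, at program start) could be sketched roughly as follows. The helper name `seed_everything` is hypothetical and not part of CNTK; the only CNTK calls used are the two `cntk.cntk_py` functions mentioned in this thread. numpy and CNTK are imported lazily so the sketch also runs where they are not installed, and the integer check is an added safeguard, not something CNTK requires.

```python
import random


def seed_everything(seed):
    """Seed every available random number generator once; return the names seeded.

    Hypothetical helper for illustration only.
    """
    # Reject None and other non-integers: random.seed(None) would silently
    # reseed from system entropy instead of fixing the sequence.
    if not isinstance(seed, int):
        raise TypeError("seed must be an int, got %r" % (seed,))

    random.seed(seed)
    seeded = ["random"]

    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass  # numpy not installed; nothing to seed

    try:
        import cntk
        # Functions named in this thread; call them once, not per initializer.
        cntk.cntk_py.set_fixed_random_seed(seed)
        cntk.cntk_py.force_deterministic_algorithms()
        seeded.append("cntk")
    except ImportError:
        pass  # CNTK not installed; nothing to seed

    return seeded


if __name__ == "__main__":
    print(seed_everything(1))
```

Calling this once at the top of the main method, before the model is built, should be enough; initializers such as glorot_uniform can then take a plain integer seed.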
Thanks again for the prompt response, @cha-zhang. As you suggested, I removed the _cntk_py import (from _cntk_py import set_fixed_random_seed, force_deterministic_algorithms) and set the seed values explicitly as below:
//in the learner
netout=Sequential([For(range(1), lambda i: Recurrence(LSTM(lstm_cell_dimension,use_peepholes=LSTM_USE_PEEPHOLES,init=glorot_uniform(seed=1)))),
Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=1))])(feature)
//under main method
set_default_device(cpu())
np.random.seed(1)
random.seed(1)
cntk.cntk_py.set_fixed_random_seed(1)
cntk.cntk_py.force_deterministic_algorithms()
Even though this has reduced the variance of the results, I can still observe small variations in my final output matrix.
Am I still missing something here? Or do I need to define the seed value in the LSTM as init=glorot_uniform(seed=cntk.cntk_py.set_fixed_random_seed(1)) as well?
Regards,
Kasun.
Hi @cha-zhang ,
I would be more than grateful if you could comment on the approach that I have followed above.
Thanks in advance :-)
Regards,
Kasun.
@kasungayan I have asked some colleagues to help take a look. Stay tuned.
@kasungayan Could you please share a self-contained repro? The script linked above does not have input files.
Never mind, changed the create_train_data() to generate stub input as follows:
num_minibatches = 20
minibatch_size = 128
def create_train_data():
    return np.random.rand(minibatch_size*num_minibatches, input_dim), \
           np.random.rand(minibatch_size*num_minibatches, output_dim), \
           np.random.rand(minibatch_size, input_dim)
I don't see any non-determinism. Output is identical in each re-run.
One potential issue in your code is that you have calls like random.seed(set_fixed_random_seed(1)), which do not fix the seed at all, since set_fixed_random_seed(1) is void (it returns None).
Hi @raaaar ,
Thanks for the response, and thanks @cha-zhang for directing. Sorry if I have misunderstood. Correct me if I'm wrong: should create_train_data() return its values as follows? And would you mind giving a brief explanation of the root cause of this issue?
return listOfInputs, listOfLabels, listOfTestInputs,
np.random.rand(minibatch_size*num_minibatches, input_dim), np.random.rand(minibatch_size*num_minibatches, output_dim),
np.random.rand(minibatch_size, input_dim)
Thanks in advance.
Regards,
Kasun
I had to change create_train_data() to return some "fake" input since I didn't have your original input files. That was only necessary to be able to run your script to try to reproduce non-determinism that you observed.
The root cause is that you never properly set the random seeds in your code. Since set_fixed_random_seed is void, random.seed(set_fixed_random_seed(1)) is the same as random.seed(None), which seeds the generator from the system time or OS entropy instead of a fixed value. Here's a quick example:
>>> random.seed(1), [random.randint(1, 100) for _ in range(10)]
(None, [18, 73, 98, 9, 33, 16, 64, 98, 58, 61])
>>> random.seed(1), [random.randint(1, 100) for _ in range(10)]
(None, [18, 73, 98, 9, 33, 16, 64, 98, 58, 61])
>>> random.seed(None), [random.randint(1, 100) for _ in range(10)]
(None, [38, 3, 25, 8, 10, 17, 67, 85, 57, 80])
>>> random.seed(None), [random.randint(1, 100) for _ in range(10)]
(None, [32, 55, 21, 36, 41, 11, 97, 46, 65, 84])
Notice that the first two randomly generated sequences are identical, but the last two are different. So, once I fixed your script to properly set seeds, it started to produce identical output in each new run.
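A quick way to check this kind of thing is to run the same job several times with the same seed and compare the outputs exactly. The sketch below is only an illustration: `run_experiment` is a hypothetical stand-in for a real training run (it just draws a few random numbers), and `is_reproducible` is a made-up helper name.

```python
import random


def run_experiment(seed):
    """Hypothetical stand-in for a training run; returns its 'results'."""
    rng = random.Random(seed)  # a private generator, seeded explicitly
    return [rng.random() for _ in range(5)]


def is_reproducible(run, seed, repeats=3):
    """Return True if `run(seed)` yields exactly the same output every time."""
    first = run(seed)
    return all(run(seed) == first for _ in range(repeats - 1))


if __name__ == "__main__":
    print("seed=1:   ", is_reproducible(run_experiment, 1))
    print("seed=None:", is_reproducible(run_experiment, None))
```

With a fixed integer seed the runs match; with None each run is seeded from system entropy, so the outputs differ. For a real model the comparison would be on the final output matrix, and it should be an exact comparison rather than a tolerance-based one.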
Hi @raaaar ,
Thanks again! As a matter of fact, I had already fixed the seed-setting issue you mentioned, following the previous comments from @cha-zhang, but the issue was still reproducible. As you requested, I have pushed my code to a self-contained repo [1], where you can find the relevant code and the dataset. I really appreciate your assistance and time :-)
Thanks in advance!
Regards,
Kasun.
Hi @raaaar ,
Were you able to reproduce the issue that I mentioned? I have pushed my code to a self-contained repo [1], where you can find the relevant code and the dataset. I really appreciate your assistance and time :-)
Thanks in advance!
Regards,
Kasun.
Hey, I did try your repro (trained two times for 1 epoch each) and didn't see any difference in the output. I think you might be running a stale release (your code uses some deprecated APIs). Could you try upgrading to the latest CNTK version and verifying that the issue is still present?
Hi @raaaar ,
I'm using the CNTK-2-0rc2 version. Sure, will test this with the latest CNTK version. Thanks for the response!
Regards,
Kasun.
Hi @raaaar ,
I tried to run my code in CNTK-2.0, but it gives me the following error.
```
File "cluster1.py", line 183, in
testshape = train_model(features, labels,tests)
File "cluster1.py", line 124, in train_model
Dense(output_dim,bias=BIAS,init=glorot_uniform(seed=1))])(feature)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 384, in __call__
return self.clone(CloneMethod.share, arg_map)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\internal\swig_helper.py", line 69, in wrapper
result = f(args, *kwds)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\ops\functions.py", line 557, in clone
return super(Function, self).clone(method, substitutions)
File "C:\local\Anaconda3-4.1.1-Windows-x86_64\envs\cntk-py35\lib\site-packages\cntk\cntk_py.py", line 1597, in clone
return _cntk_py.Function_clone(self, *args)
RuntimeError: PastValue/FutureValue Function 'PastValue: Output('Block1088_Output_0', [#], [20]) -> Output('PastValue1003_Output_0', [???], [???])': Input operand 'Output('Block1088_Output_0', [#], [20])' with #dynamic axes != 2 (1 sequence axis and 1 batch axis) is not supported.
```
As you mentioned in the previous comment, I changed some of the deprecated APIs as per [1]:
epoch_size -> max_samples
set_default_device(cpu()) -> cntk.device.try_set_default_device(cpu())
I guess this error is caused by the deprecated APIs that I'm still using. If so, would you mind pointing those functions out to me so that I can migrate my code?
Thanks in advance!
Regards,
Kasun.
[1] https://docs.microsoft.com/en-us/cognitive-toolkit/ReleaseNotes/CNTK_2_0_RC_3_Release_Notes
Hi @raaaar ,
Did you run my code in CNTK-2.0? If so, would you mind sharing the migrated code? I just want to figure out the deprecated APIs that I have used in my original code.
Thanks in advance!
Regards,
Kasun.
Hi @kasungayan,
This is exactly the same error as: #1925.
Basically, the input doesn't have a sequence axis right now. You need to use sequence.input().
Morgan
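As a rough sketch of that change, the feature variable would be declared through the sequence module rather than through a plain input variable. The helper name `make_sequence_input` and the dimension 10 are placeholders. It is an assumption here that later CNTK releases expose the same factory as `sequence.input_variable`, so the sketch tries that name first and falls back to `sequence.input`. CNTK is imported optionally so the snippet does nothing harmful where it is not installed.

```python
try:
    import cntk  # optional: the sketch degrades gracefully without it
except ImportError:
    cntk = None


def make_sequence_input(dim):
    """Create an input variable with a sequence axis, or None without CNTK.

    Hypothetical helper for illustration only.
    """
    if cntk is None:
        return None
    # Assumption: newer releases name this sequence.input_variable, while
    # CNTK 2.0 uses sequence.input as mentioned in this thread.
    factory = getattr(cntk.sequence, "input_variable", None) or cntk.sequence.input
    return factory(dim)


feature = make_sequence_input(10)  # 10 is a placeholder for input_dim
if feature is not None:
    # Expect two dynamic axes: one batch axis and one sequence axis,
    # which is what the PastValue error message asks for.
    print(len(feature.dynamic_axes))
```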
Hi @mfuntowicz,
The issue was resolved by using sequence.input(). Thanks for the support. @raaaar, after migrating my code to CNTK 2.0, I was able to reproduce my results. Thanks @mfuntowicz, @raaaar and @cha-zhang for your guidance and support!
Regards,
Kasun.