Incubator-mxnet: mxnet random seed does not work for mx.init.Xavier on both CPU and GPU

Created on 15 Nov 2017  路  5Comments  路  Source: apache/incubator-mxnet

Description

mx.random.seed does not work for mx.init.Xavier on both CPU and GPU

Environment info (Required)

----------Python Info----------
Version      : 3.5.2
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'Sep 14 2017 22:51:06')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 8.1.1
Directory    : /usr/lib/python3/dist-packages/pip
----------MXNet Info-----------
Version      : 0.12.0
Directory    : /usr/local/lib/python3.5/dist-packages/mxnet-0.12.0-py3.5.egg/mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-4.4.0-100-generic-x86_64-with-Ubuntu-16.04-xenial
system       : Linux
node         : xxxxx
release      : 4.4.0-100-generic
version      : #123-Ubuntu SMP Thu Nov 2 10:16:13 UTC 2017
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping:              9
CPU MHz:               3701.882
CPU max MHz:           3900.0000
CPU min MHz:           1600.0000
BogoMIPS:              6784.34
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0030 sec, LOAD: 0.0555 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0010 sec, LOAD: 0.7034 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0013 sec, LOAD: 0.2217 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0012 sec, LOAD: 0.6687 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0010 sec, LOAD: 1.3287 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0042 sec, LOAD: 0.0200 sec.

Package used (Python/R/Scala/Julia):
(1) I am using numpy 1.13.3
(2) CUDA 9.0
(3) cuDNN 5.0.3
(4) OpenBLAS 0.2.10
(5) mxnet 0.12 from github
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet --branch 0.12.0

Build info (Required if built from source)

mxnet build with cuda, cudnn, openblas, openmp disabled, opencv disabled

Compiler (gcc/clang/mingw/visual studio):

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.5' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5) 

MXNet commit hash:
4f2af2d2e5216ab3a1faadcc117709b6836029dc

Build config:

USE_CUDA=1
USE_OPENBLAS=openblas
USE_OPENCV=0
USE_CUDNN=1
USE_OPENMP=0
USE_GPERFTOOLS = 1
USE_JEMALLOC = 1

Minimum reproducible example

#!/usr/bin/python3
import mxnet as mx
import numpy as np

for device in ['cpu','gpu']:
    with mx.Context(device):
        np.random.seed(0)
        mx.random.seed(0)

        x = mx.sym.Variable('x')
        L1 = mx.sym.FullyConnected(data=x,num_hidden=100,flatten=False)
        L2 = mx.sym.FullyConnected(data=L1,num_hidden=100,flatten=False)
        y = mx.sym.FullyConnected(data=L2,num_hidden=1,flatten=False)

        mod = mx.mod.Module(y,data_names=["x"],label_names=None)
        mod.bind(data_shapes=[("x",(1,1))])

        mod.init_params(initializer=mx.init.Xavier(rnd_type='gaussian'))
        #mod.init_params(initializer=mx.init.One())
        one = mx.io.DataBatch(data=[
            mx.nd.array(np.random.rand(1).reshape(1,1))
        ])

        mod.forward(one)
        output = mod.get_outputs()[0]
        output = output.asnumpy()
        print("[%s] Random from numpy=%g, from mxnet=%g" % (device,np.random.rand(),output.flatten()[0]))

Steps to reproduce

(Paste the commands you ran that produced the error.)

  1. python3 (the above script for several time), the output would be:
$ python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=0.216065
[gpu] Random from numpy=0.715189, from mxnet=0.214543
$ python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=-0.320229
[gpu] Random from numpy=0.715189, from mxnet=0.163189
$ python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=-0.320229
[gpu] Random from numpy=0.715189, from mxnet=-0.192892
$ 

What have you tried to solve it?

  1. Do not use mod.init_params(initializer=mx.init.Xavier(rnd_type='gaussian')), but using mod.init_params(initializer=mx.init.One()) then the result is deterministic. But I need xavier..

  2. I also tested to disable multithreading OpenBLAS:

$ OPENBLAS_NUM_THREADS=1 python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=-0.415832
[gpu] Random from numpy=0.715189, from mxnet=-0.629229
$ OPENBLAS_NUM_THREADS=1 python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=0.206878
[gpu] Random from numpy=0.715189, from mxnet=0.214543
$ OPENBLAS_NUM_THREADS=1 python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=0.206878
[gpu] Random from numpy=0.715189, from mxnet=1.27156
$ OPENBLAS_NUM_THREADS=1 python3 test_random.py
[cpu] Random from numpy=0.715189, from mxnet=0.216065
[gpu] Random from numpy=0.715189, from mxnet=0.214543
$

Bug

Most helpful comment

I understand why mxnet went wrong:
I am using python3 and hash randomization is enabled:

In mxnet/python/mxnet/module/module.py

    attrs = self._symbol.attr_dict()
    for name, arr in self._arg_params.items():
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, arg_params)

    for name, arr in self._aux_params.items():
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, aux_params)

should be replaced by:

    attrs = self._symbol.attr_dict()
    for name, arr in sorted(self._arg_params.items()):
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, arg_params)

    for name, arr in sorted(self._aux_params.items()):
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, aux_params)

All 5 comments

I understand why mxnet went wrong:
I am using python3 and hash randomization is enabled:

In mxnet/python/mxnet/module/module.py

    attrs = self._symbol.attr_dict()
    for name, arr in self._arg_params.items():
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, arg_params)

    for name, arr in self._aux_params.items():
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, aux_params)

should be replaced by:

    attrs = self._symbol.attr_dict()
    for name, arr in sorted(self._arg_params.items()):
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, arg_params)

    for name, arr in sorted(self._aux_params.items()):
        desc = InitDesc(name, attrs.get(name, None))
        _impl(desc, arr, aux_params)

Good catch! Do you want to post a PR to fix this?

I have Python 2 and also have issues with initialization in that the results are different for the same seeds. I can't confirm it's the same issue though.
Update: tried Ones, and still get different results (but the magnitude of the difference is less), so the issue might be with the gluon.Trainer

@eric-haibin-lin Please tag as "Bug", "Call For Contribution"

Was this page helpful?
0 / 5 - 0 ratings