Incubator-mxnet: assertion error in _finish_deferred_init() in mxnet.gluon.parameter

Created on 13 Apr 2019 · 9 Comments · Source: apache/incubator-mxnet

Description

I hit this error when running the following code:

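import mxnet as mx
from mxnet import nd

# hidden_size, input_size, seq_len and batch_size are set elsewhere;
# the failure appears when hidden_size is very large (e.g., 1000000).
net = mx.gluon.rnn.GRUCell(hidden_size=hidden_size,
                           input_size=input_size,
                           prefix='gru_')
net.initialize()
# The last axis of the input must match the cell's input_size.
data = nd.zeros(shape=(seq_len, batch_size, input_size))
output = net.unroll(length=seq_len,
                    inputs=data,
                    layout='TNC',
                    merge_outputs=True)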

In my case hidden_size is very large (e.g., 1000000). Because allow_deferred_init is set to True in GRUCell by default, execution reaches line 274 of mxnet/gluon/parameter:

def _finish_deferred_init(self):
    """Finishes deferred initialization."""
    if not self._deferred_init:
        return
    init, ctx, default_init, data = self._deferred_init
    self._deferred_init = ()
    assert self.shape is not None and np.prod(self.shape) > 0, \
        "Cannot initialize Parameter '%s' because it has " \
        "invalid shape: %s. Please specify in_units, " \
        "in_channels, etc for `Block`s."%(
            self.name, str(self.shape))

np.prod((3000000, 1000000)) > 0 will overflow and give False.
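
The wraparound behind this check can be reproduced on any platform by forcing a 32-bit accumulator. The snippet below is only a sketch: the explicit `dtype` arguments are an assumption used to mimic a platform whose default integer is 32-bit, not something `parameter.py` does, and the shape is the smaller one tested later in this thread.

```python
import numpy as np

shape = (300000, 100000)  # true product is 3 * 10**10, too large for int32

# Build a 32-bit array and keep a 32-bit accumulator, mimicking a platform
# where numpy's default integer type is int32.
wrapped = np.prod(np.array(shape, dtype=np.int32), dtype=np.int32)

# Same multiplication with a 64-bit accumulator; the product fits.
exact = np.prod(np.array(shape, dtype=np.int64), dtype=np.int64)

print(int(wrapped), int(wrapped) > 0)  # sign check as used by the assertion
print(int(exact), int(exact) > 0)
```

With a 32-bit accumulator the product wraps to a negative value, so a `> 0` test on it rejects a shape that is in fact valid.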

@sxjscience @szha

Bug Gluon

All 9 comments

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Gluon, Bug

@gaozhihan thanks for reporting this. It sounds more like a bug in numpy to me. Also, I tested locally just now and found that numpy returns the correct result in my case.

% ipython
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import numpy as np

In [2]: np.prod((3000000, 1000000)) > 0
Out[2]: True

% ipython2
Python 2.7.15 (default, Jan 12 2019, 21:43:48)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.

In [1]: import numpy as np

In [2]: np.prod((3000000, 1000000)) > 0
Out[2]: True

It would be great if we could make sure mxnet doesn't depend on the problematic version. Could you help report the environment you're using?

What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

Thanks for your reply. I was running a simple test case on Windows. My numpy version is 1.14.6.

Python 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %run diagnose.py
----------Python Info----------
Version      : 3.7.1
Compiler     : MSC v.1915 64 bit (AMD64)
Build        : ('default', 'Dec 10 2018 22:54:23')
Arch         : ('64bit', 'WindowsPE')
------------Pip Info-----------
Version      : 18.1
Directory    : C:\Anaconda3\lib\site-packages\pip
----------MXNet Info-----------
Version      : 1.4.0
Directory    : C:\Anaconda3\lib\site-packages\mxnet
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Windows-10-10.0.17134-SP0
system       : Windows
node         : DESKTOP-8A502S8
release      : 10
version      : 10.0.17134
----------Hardware Info----------
machine      : AMD64
processor    : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
Name
Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz

----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0160 sec, LOAD: 0.9976 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.2839 sec, LOAD: 0.1510 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.3168 sec, LOAD: 0.7885 sec.
Error open FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1051)>, DNS finished in 0.31090235710144043 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0080 sec, LOAD: 0.8505 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0580 sec, LOAD: 0.0610 sec.

In [3]: import numpy as np

In [4]: np.prod((300000,100000)) > 0
Out[4]: False

In [5]: np.__version__
Out[5]: '1.14.6'

@gaozhihan I'm using the exact same numpy version on a Mac.

In [3]: np.__version__
Out[3]: '1.14.6'

@zhreshold @yajiedesign what do you observe on your windows machines?

@szha @gaozhihan I can reproduce this bug using a Windows machine. The numpy version is also 1.14.6.

A possible workaround is to cast the elements to int64, i.e., np.prod([np.int64(ele) for ele in self.shape])

In [1]: import numpy as np

In [2]: np.prod((300000, 100000))
Out[2]: -64771072

In [3]: np.prod((np.int64(300000), np.int64(100000)))
Out[3]: 30000000000

In [4]: np.__version__
Out[4]: '1.14.6'
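
A minimal sketch of how the workaround could be wrapped in a helper, assuming the goal is only to count elements without wrapping around. The function names `shape_size` and `shape_size_int64` are hypothetical and are not part of MXNet.

```python
from functools import reduce
import operator

import numpy as np


def shape_size(shape):
    """Return the number of elements in `shape` as an exact Python int."""
    # Python ints have arbitrary precision, so this product cannot wrap.
    return reduce(operator.mul, (int(dim) for dim in shape), 1)


def shape_size_int64(shape):
    """Same idea, staying in numpy with an explicit 64-bit accumulator."""
    return int(np.prod(shape, dtype=np.int64))


print(shape_size((300000, 100000)))
print(shape_size_int64((300000, 100000)))
```

The pure-Python version has no upper limit, while the int64 version stays close to the one-line change suggested above and is sufficient for any shape whose element count fits in 64 bits.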

ndarray.shape is a tuple whose elements are converted to numpy's default integer type, which follows the C long type. On Windows, long is 32-bit rather than 64-bit, so the product is computed in int32.
Reference: https://github.com/numpy/numpy/issues/12264
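
To see which integer widths a given machine uses, something like the following can be run. It is a sketch for inspection only. In the numpy versions discussed in this thread the default integer follows C long, but that link is an assumption worth checking on other releases rather than taking for granted.

```python
import ctypes

import numpy as np

# Width of the C `long` type on this platform, in bits.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Width of numpy's default integer type on this installation, in bits.
np_default_bits = np.dtype(np.int_).itemsize * 8

print("C long:", c_long_bits, "bits")
print("numpy default int:", np_default_bits, "bits")
```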

The weird thing is that prod still gives the right answer on Mac/Linux when the inputs are int32.

In [2]: np.prod((np.int32(300000), np.int32(100000)))
Out[2]: 30000000000
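
One likely explanation, based on numpy's documented behaviour for `prod`: when the input has an integer dtype narrower than the platform default integer, the default integer is used as the accumulator. On Mac/Linux that default is 64-bit, so int32 inputs are widened before multiplying. A small check, offered as a sketch rather than a definitive diagnosis:

```python
import numpy as np

values = np.array([300000, 100000], dtype=np.int32)

# With no explicit dtype, prod accumulates in the platform default integer
# when the input integer type is narrower than that default.
result = np.prod(values)

print(result.dtype)                        # dtype actually used for the result
print(result.dtype == np.dtype(np.int_))   # compare with the platform default
```

Where the default integer is itself 32-bit, no widening happens and the product wraps, which matches the Windows results reported above.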

On my Windows machine with Python 3.6.5 and numpy 1.14.5, the product overflows but happens to wrap to a positive number:

>>> import numpy as np
>>> np.prod((3000000, 1000000))
2112827392
>>> np.__version__
'1.14.5'
>>> np.prod((np.int64(300000), 1000000))
300000000000

@mxnet-label-bot add [Gluon, Bug]
