Not using all of the variables passed to hybrid_forward() causes deferred initialization to fail. There is no requirement that every passed input be used, and I am not sure why shape inference fails for the dense layer.
It works fine without hybridize(), of course.
The reason we pass input data to blocks without using it is that some subclasses do use it, and we would like to unify the interface so that calling blocks do not have to be aware of what type of block they are calling. We cannot use __call__() or forward() since these blocks will be hybridized and served from C++.
----------Python Info----------
Version : 3.6.7
Compiler : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build : ('default', 'Oct 23 2018 14:01:38')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.0
Directory : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.3.1
Directory : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash : 19c501680183237d52a862e6ae1dc4ddc296305b
----------System Info----------
Platform : Darwin-16.7.0-x86_64-i386-64bit
system : Darwin
node : 88e9fe531e66.ant.amazon.com
release : 16.7.0
version : Darwin Kernel Version 16.7.0: Thu Dec 20 21:53:35 PST 2018; root:xnu-3789.73.31~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0196 sec, LOAD: 0.5490 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0161 sec, LOAD: 0.6451 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0291 sec, LOAD: 0.5838 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0175 sec, LOAD: 0.7988 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0163 sec, LOAD: 0.3659 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0172 sec, LOAD: 0.1020 sec.
---------------------------------------------------------------------------
DeferredInitializationError Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
804 cargs = [args[i] if is_arg else i.data()
--> 805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
804 cargs = [args[i] if is_arg else i.data()
--> 805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
493 "instead." % (self.name, str(ctx), self._stype))
--> 494 return self._check_and_get(self._data, ctx)
495
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
207 "You can also avoid deferred initialization by specifying in_units, " \
--> 208 "num_features, etc., for network layers."%(self.name))
209 raise RuntimeError(
DeferredInitializationError: Parameter 'dense4_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
MXNetError Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
790 try:
--> 791 self.infer_shape(*args)
792 except Exception as e:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
863 """Infers shape of Parameters from inputs."""
--> 864 self._infer_attrs('infer_shape', 'shape', *args)
865
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
852 arg_attrs, _, aux_attrs = getattr(out, infer_fn)(
--> 853 **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
854 if arg_attrs is None:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in infer_shape(self, *args, **kwargs)
995 try:
--> 996 res = self._infer_shape_impl(False, *args, **kwargs)
997 if res[1] is None:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in _infer_shape_impl(self, partial, *args, **kwargs)
1125 ctypes.byref(aux_shape_data),
-> 1126 ctypes.byref(complete)))
1127 if complete.value != 0:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
250 if ret != 0:
--> 251 raise MXNetError(py_str(_LIB.MXGetLastError()))
252
MXNetError: [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
[0]data0
[1]embedding8_weight
[2]data2
[3]embedding9_weight
[4]dense4_weight
[5]dense4_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1 libmxnet.so 0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2 libmxnet.so 0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010b7ca884 ffi_call_unix64 + 76
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-10-680dd178ea34> in <module>()
4 vl2 = mx.nd.array([3,2])
5
----> 6 net(x1, vl1, x2, vl2)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
540 hook(self, args)
541
--> 542 out = self.forward(*args)
543
544 for hook in self._forward_hooks.values():
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
907 with x.context as ctx:
908 if self._active:
--> 909 return self._call_cached_op(x, *args)
910
911 try:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
--> 807 self._deferred_infer_shape(*args)
808 cargs = []
809 for is_arg, i in self._cached_op_args:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
793 error_msg = "Deferred initialization failed because shape"\
794 " cannot be inferred. {}".format(e)
--> 795 raise ValueError(error_msg)
796
797 def _call_cached_op(self, *args):
ValueError: Deferred initialization failed because shape cannot be inferred. [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
[0]data0
[1]embedding8_weight
[2]data2
[3]embedding9_weight
[4]dense4_weight
[5]dense4_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1 libmxnet.so 0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2 libmxnet.so 0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010b7ca884 ffi_call_unix64 + 76
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)
        self.e1 = EmbeddingBlock(10, 100)
        self.e2 = EmbeddingBlock(20, 60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1, vl1), self.e2(x2, vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1, x2, vl2)
The only solution that works is to use the unused variables in the graph in a redundant way.
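For illustration, a minimal sketch of what such a redundant use could look like in the EmbeddingBlock above (the exact expression is my assumption, not necessarily the best one; F.sum(valid_length) * 0 contributes nothing but puts valid_length into the graph):

import mxnet as mx
import mxnet.gluon as gl

class EmbeddingBlockWorkaround(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlockWorkaround, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        out = self.emb(x)
        # Touch valid_length so it appears in the symbolic graph without
        # changing the result: the summed term is zero and broadcasts
        # over the embedding output.
        return F.broadcast_add(out, F.sum(valid_length) * 0)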
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature
Looks like a possible bug to me.
I'm labelling it so that the MXNet community can help resolve it.
@mxnet-label-bot Add [Bug, Gluon]
@whamza15 This is not an issue of not using all variables in hybrid_forward, as the following test works:
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

net = EmbeddingBlock(10, 100)
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1)
print(net.collect_params())
EDIT: The above test works because deferred initialization is not used for embedding layers. For layers that do use deferred initialization, such as nn.Dense, the issue exists, as can be verified with the following:
class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)

    def hybrid_forward(self, F, x, v1):
        return self.dense(x)

net = Net()
net.initialize()
net.hybridize()

x = mx.nd.array(range(8)).reshape(2, -1)
v1 = mx.nd.array([3, 2])
net(x, v1)
Error Message:
/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py:540: UserWarning: The 1-th input to HybridBlock is not used by any computation. Is this intended?
out = self.forward(*args)
infer_shape error. Arguments:
data0: (2, 4)
data1: (2,)
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in _call_cached_op
for is_arg, i in self._cached_op_args]
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in <listcomp>
for is_arg, i in self._cached_op_args]
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 494, in data
return self._check_and_get(self._data, ctx)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 208, in _check_and_get
"num_features, etc., for network layers."%(self.name))
mxnet.gluon.parameter.DeferredInitializationError: Parameter 'dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 789, in _deferred_infer_shape
self.infer_shape(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 862, in infer_shape
self._infer_attrs('infer_shape', 'shape', *args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 851, in _infer_attrs
**{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 996, in infer_shape
res = self._infer_shape_impl(False, *args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1126, in _infer_shape_impl
ctypes.byref(complete)))
File "/anaconda3/lib/python3.7/site-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
[0]data0
[1]dense0_weight
[2]dense0_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1 libmxnet.so 0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2 libmxnet.so 0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010a0b1884 ffi_call_unix64 + 76
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_gl1.py", line 28, in <module>
net(x, v1)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
out = self.forward(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 907, in forward
return self._call_cached_op(x, *args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 805, in _call_cached_op
self._deferred_infer_shape(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 793, in _deferred_infer_shape
raise ValueError(error_msg)
ValueError: Deferred initialization failed because shape cannot be inferred. [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
[0]data0
[1]dense0_weight
[2]dense0_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1 libmxnet.so 0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2 libmxnet.so 0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010a0b1884 ffi_call_unix64 + 76
I am trying to figure out whether this is actually a bug and whether there is a possible workaround for this use case.
@sandeep-krishnamurthy @safrooze Could you please have a look?
Possibly related to #13967
It seems like this is expected behavior, @eric-haibin-lin could you have a look and confirm?
@whamza15 Since the error pops up due to deferred initialization, you can avoid it by specifying the input shape when creating the layers. Here is the full example:
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        # in_units=160 because the two embeddings (dim 100 and 60) are
        # concatenated on the last axis; specifying it avoids deferred init.
        self.dense = gl.nn.Dense(3, in_units=160, flatten=False)
        self.e1 = EmbeddingBlock(10, 100)
        self.e2 = EmbeddingBlock(20, 60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1, vl1), self.e2(x2, vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1, x2, vl2)
@mxnet-label-bot add [pending requester info]
@whamza15 Did these suggestions help you ?
@whamza15 Can you please close the issue if it has been resolved for you ?
Please feel free to re-open if closed in error.
Sorry, I did not get a chance to follow up on this. I can try what you described, @abhinavs95. However, not using deferred initialization would be a bit of a setback for our toolkit, which relies heavily on it. Is there any possibility this can be solved while still relying on deferred initialization?
I just want to add that if I use valid_length in the EmbeddingBlock, it works fine even with deferred initialization.
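For reference, a minimal sketch of what consuming valid_length inside the block could look like, for example by masking padded positions (assuming F.SequenceMask with axis=1 is available in your MXNet version; this is illustrative, not our actual toolkit block):

import mxnet as mx
import mxnet.gluon as gl

class MaskedEmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(MaskedEmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        emb = self.emb(x)  # (batch, seq_len, dim)
        # Zero out positions beyond each sample's valid length; since
        # valid_length now feeds an operator, it is part of the graph and
        # deferred shape inference can succeed.
        return F.SequenceMask(emb, sequence_length=valid_length,
                              use_sequence_length=True, axis=1)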
@whamza15 does it work if you pass [] as the value for valid_length?
@eric-haibin-lin I am not sure I understand the question. valid_length always has a value; it is just that this block does not use it. The reason we have this setup is that our toolkit allows people to configure blocks (as complex as they want) without having to change the inputs. Some blocks may choose to consume valid_length (like complex encoders), while others may choose not to (like a simple embedding block).
We have a temporary workaround in https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/transformer.py#L420-L501, but this bug should definitely be fixed in MXNet.
There is a similar problem when there are unused parameters.
For example, you can have a model like this:
class Test(mx.gluon.nn.HybridBlock):
    def __init__(self, mode, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mode = mode
        with self.name_scope():
            self.d1 = mx.gluon.nn.Dense(2)
            self.d2 = mx.gluon.nn.Dense(3)

    def hybrid_forward(self, F, x, *args, **kwargs):
        o1 = self.d1(x)
        o2 = self.d2(x)
        if self.mode:
            return o1  # output path o2 is not used
        else:
            return o1, o2
Currently, this model will not hybridize successfully when mode == True, because the weights in the o2 path are "unused":
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py:694: UserWarning: Parameter test4_dense1_weight, test4_dense1_bias is not used by any computation. Is this intended?
out = self.forward(*args)
---------------------------------------------------------------------------
DeferredInitializationError Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
1012 try:
-> 1013 cargs = [args_without_none[i] if is_arg else i.data()
1014 for is_arg, i in self._cached_op_args]
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
1012 try:
-> 1013 cargs = [args_without_none[i] if is_arg else i.data()
1014 for is_arg, i in self._cached_op_args]
/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
564 "instead." % (self.name, str(ctx), self._stype))
--> 565 return self._check_and_get(self._data, ctx)
566
/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
230 if self._deferred_init:
--> 231 raise DeferredInitializationError(
232 "Parameter '%s' has not been initialized yet because initialization was " \
DeferredInitializationError: Parameter 'test4_dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
973 try:
--> 974 self.infer_shape(*args)
975 except Exception as e:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
1074 """Infers shape of Parameters from inputs."""
-> 1075 self._infer_attrs('infer_shape', 'shape', *args)
1076
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
1070 for i in self.collect_params().values():
-> 1071 setattr(i, attr, sdict[i.name])
1072
KeyError: 'test4_dense1_weight'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-48-a18f0aa96b25> in <module>
----> 1 t(mx.nd.array([10]))
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in __call__(self, *args)
692 hook(self, args)
693
--> 694 out = self.forward(*args)
695
696 for hook in self._forward_hooks.values():
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
1150 'Find all contexts = {}'.format(ctx_set))
1151 with ctx:
-> 1152 return self._call_cached_op(x, *args)
1153 with ctx:
1154 try:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
1014 for is_arg, i in self._cached_op_args]
1015 except DeferredInitializationError:
-> 1016 self._deferred_infer_shape(*args)
1017 cargs = []
1018 for is_arg, i in self._cached_op_args:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
976 error_msg = "Deferred initialization failed because shape"\
977 " cannot be inferred. {}".format(e)
--> 978 raise ValueError(error_msg)
979
980 def _call_cached_op(self, *args):
ValueError: Deferred initialization failed because shape cannot be inferred. 'test4_dense1_weight'
Having unused parameters is useful since you might want your pretrain/finetune/evaluation networks to behave differently while remaining compatible with .save_parameters and .load_parameters without allow_missing and ignore_extra.
I think this issue could be fixed without changing the inner workings too much by adding an F.nodiscard(o2) operator. It would be a no-op in nd mode and would somehow mark the output as a required computation in sym mode. Not sure how feasible something like that is.
My current workaround is something like
return F.broadcast_add(o1, F.sum(0.0 * o2)) # output path o2 is not used
which is both really ugly and potentially inefficient, since it forces the unneeded computation.
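Spelled out in context, that workaround would look roughly like this in the Test block from above (a sketch; only the mode == True branch changes):

import mxnet as mx

class Test(mx.gluon.nn.HybridBlock):
    def __init__(self, mode, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mode = mode
        with self.name_scope():
            self.d1 = mx.gluon.nn.Dense(2)
            self.d2 = mx.gluon.nn.Dense(3)

    def hybrid_forward(self, F, x, *args, **kwargs):
        o1 = self.d1(x)
        o2 = self.d2(x)
        if self.mode:
            # Keep o2 in the graph as a zero contribution so its
            # parameters are not reported as unused; the o2 path is
            # still computed, which is the inefficiency mentioned above.
            return F.broadcast_add(o1, F.sum(0.0 * o2))
        else:
            return o1, o2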
If the F.nodiscard option is too hard to implement, something like
o1 = F.depends_on(o1, o2)
could also work. It would basically be the same as F.broadcast_add(o1, F.sum(0.0 * o2)) but without any computations.
cc @leezu
Any progress on this?
@whamza15 this will be taken into account in the MXNet 2.0 roadmap item 4.3, Gluon block enhancement, that @leezu is driving.