Not using all of the variables passed to hybrid_forward() causes deferred initialization to fail. There is no requirement that every passed input be used, and I am not sure why shape inference fails for the dense layer.
It works fine without hybridize(), of course.
The reason we pass input data to blocks without using it is that some subclasses do use it, and we would like to unify the interface so that calling blocks do not have to be aware of what type of block they are calling. We cannot use __call__() or forward() since these blocks will be hybridized and served from C++.
----------Python Info----------
Version : 3.6.7
Compiler : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build : ('default', 'Oct 23 2018 14:01:38')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.0
Directory : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.3.1
Directory : /Users/<me>/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
Commit Hash : 19c501680183237d52a862e6ae1dc4ddc296305b
----------System Info----------
Platform : Darwin-16.7.0-x86_64-i386-64bit
system : Darwin
node : 88e9fe531e66.ant.amazon.com
release : 16.7.0
version : Darwin Kernel Version 16.7.0: Thu Dec 20 21:53:35 PST 2018; root:xnu-3789.73.31~1/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7660U CPU @ 2.50GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0196 sec, LOAD: 0.5490 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0161 sec, LOAD: 0.6451 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0291 sec, LOAD: 0.5838 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0175 sec, LOAD: 0.7988 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0163 sec, LOAD: 0.3659 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0172 sec, LOAD: 0.1020 sec.
---------------------------------------------------------------------------
DeferredInitializationError Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
804 cargs = [args[i] if is_arg else i.data()
--> 805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
804 cargs = [args[i] if is_arg else i.data()
--> 805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
493 "instead." % (self.name, str(ctx), self._stype))
--> 494 return self._check_and_get(self._data, ctx)
495
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
207 "You can also avoid deferred initialization by specifying in_units, " \
--> 208 "num_features, etc., for network layers."%(self.name))
209 raise RuntimeError(
DeferredInitializationError: Parameter 'dense4_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
MXNetError Traceback (most recent call last)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
790 try:
--> 791 self.infer_shape(*args)
792 except Exception as e:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
863 """Infers shape of Parameters from inputs."""
--> 864 self._infer_attrs('infer_shape', 'shape', *args)
865
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
852 arg_attrs, _, aux_attrs = getattr(out, infer_fn)(
--> 853 **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
854 if arg_attrs is None:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in infer_shape(self, *args, **kwargs)
995 try:
--> 996 res = self._infer_shape_impl(False, *args, **kwargs)
997 if res[1] is None:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/symbol/symbol.py in _infer_shape_impl(self, partial, *args, **kwargs)
1125 ctypes.byref(aux_shape_data),
-> 1126 ctypes.byref(complete)))
1127 if complete.value != 0:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
250 if ret != 0:
--> 251 raise MXNetError(py_str(_LIB.MXGetLastError()))
252
MXNetError: [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
[0]data0
[1]embedding8_weight
[2]data2
[3]embedding9_weight
[4]dense4_weight
[5]dense4_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1 libmxnet.so 0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2 libmxnet.so 0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010b7ca884 ffi_call_unix64 + 76
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-10-680dd178ea34> in <module>()
4 vl2 = mx.nd.array([3,2])
5
----> 6 net(x1, vl1, x2, vl2)
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
540 hook(self, args)
541
--> 542 out = self.forward(*args)
543
544 for hook in self._forward_hooks.values():
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
907 with x.context as ctx:
908 if self._active:
--> 909 return self._call_cached_op(x, *args)
910
911 try:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
805 for is_arg, i in self._cached_op_args]
806 except DeferredInitializationError:
--> 807 self._deferred_infer_shape(*args)
808 cargs = []
809 for is_arg, i in self._cached_op_args:
~/miniconda2/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
793 error_msg = "Deferred initialization failed because shape"\
794 " cannot be inferred. {}".format(e)
--> 795 raise ValueError(error_msg)
796
797 def _call_cached_op(self, *args):
ValueError: Deferred initialization failed because shape cannot be inferred. [18:29:57] src/c_api/c_api_symbolic.cc:493: InferShapeKeyword argument name data3 not found.
Candidate arguments:
[0]data0
[1]embedding8_weight
[2]data2
[3]embedding9_weight
[4]dense4_weight
[5]dense4_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000010cac4740 libmxnet.so + 26432
[bt] (1) 1 libmxnet.so 0x000000010cac44ef libmxnet.so + 25839
[bt] (2) 2 libmxnet.so 0x000000010dfcedbe MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x000000010dfcd0e2 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010b7ca884 ffi_call_unix64 + 76
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)
        self.e1 = EmbeddingBlock(10, 100)
        self.e2 = EmbeddingBlock(20, 60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1, vl1), self.e2(x2, vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1, x2, vl2)
The only solution that works is to use the unused variables in the graph in a redundant way.
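For illustration, a minimal sketch of what such a redundant use could look like in the EmbeddingBlock above (the exact expression is my assumption, not necessarily the best one; F.sum(valid_length) * 0 contributes nothing but puts valid_length into the graph):

import mxnet as mx
import mxnet.gluon as gl

class EmbeddingBlockWorkaround(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlockWorkaround, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        out = self.emb(x)
        # Touch valid_length so it appears in the symbolic graph without
        # changing the result: the summed term is zero and broadcasts
        # over the embedding output.
        return F.broadcast_add(out, F.sum(valid_length) * 0)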
Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Feature
Looks like a possible bug to me.
I'm labelling it so that the MXNet community can help resolve it.
@mxnet-label-bot Add [Bug, Gluon]
@whamza15 This is not an issue of not using all variables in hybrid_forward, as the following test works:
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

net = EmbeddingBlock(10, 100)
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1)
print(net.collect_params())
EDIT: The above test works because deferred initialization is not used for embedding layers. For layers that do use deferred initialization, such as nn.Dense, the issue exists, as can be verified with the following:
class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)

    def hybrid_forward(self, F, x, v1):
        return self.dense(x)

net = Net()
net.initialize()
net.hybridize()

x = mx.nd.array(range(8)).reshape(2, -1)
v1 = mx.nd.array([3, 2])
net(x, v1)
Error Message:
/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py:540: UserWarning: The 1-th input to HybridBlock is not used by any computation. Is this intended?
out = self.forward(*args)
infer_shape error. Arguments:
data0: (2, 4)
data1: (2,)
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in _call_cached_op
for is_arg, i in self._cached_op_args]
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in <listcomp>
for is_arg, i in self._cached_op_args]
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 494, in data
return self._check_and_get(self._data, ctx)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 208, in _check_and_get
"num_features, etc., for network layers."%(self.name))
mxnet.gluon.parameter.DeferredInitializationError: Parameter 'dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 789, in _deferred_infer_shape
self.infer_shape(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 862, in infer_shape
self._infer_attrs('infer_shape', 'shape', *args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 851, in _infer_attrs
**{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 996, in infer_shape
res = self._infer_shape_impl(False, *args, **kwargs)
File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1126, in _infer_shape_impl
ctypes.byref(complete)))
File "/anaconda3/lib/python3.7/site-packages/mxnet/base.py", line 252, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
[0]data0
[1]dense0_weight
[2]dense0_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1 libmxnet.so 0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2 libmxnet.so 0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010a0b1884 ffi_call_unix64 + 76
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_gl1.py", line 28, in <module>
net(x, v1)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 540, in __call__
out = self.forward(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 907, in forward
return self._call_cached_op(x, *args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 805, in _call_cached_op
self._deferred_infer_shape(*args)
File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 793, in _deferred_infer_shape
raise ValueError(error_msg)
ValueError: Deferred initialization failed because shape cannot be inferred. [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found.
Candidate arguments:
[0]data0
[1]dense0_weight
[2]dense0_bias
Stack trace returned 5 entries:
[bt] (0) 0 libmxnet.so 0x000000011164e390 std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2736
[bt] (1) 1 libmxnet.so 0x000000011164e13f std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, mxnet::NDArrayFunctionReg*>, void*>*) + 2143
[bt] (2) 2 libmxnet.so 0x0000000112c4a85e MXSymbolInferShape + 9582
[bt] (3) 3 libmxnet.so 0x0000000112c48b82 MXSymbolInferShape + 2194
[bt] (4) 4 libffi.6.dylib 0x000000010a0b1884 ffi_call_unix64 + 76
I am trying to figure out whether this is actually a bug and whether there is a possible workaround for this use case.
@sandeep-krishnamurthy @safrooze Could you please have a look?
Possibly related to #13967
It seems like this is expected behavior, @eric-haibin-lin could you have a look and confirm?
@whamza15 Since the error pops up due to deferred initialization, you can avoid it by specifying the input shape when creating the layers. Here is the full example:
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        # in_units=160 because the two embeddings (dim 100 and 60) are
        # concatenated on the last axis; specifying it avoids deferred init.
        self.dense = gl.nn.Dense(3, in_units=160, flatten=False)
        self.e1 = EmbeddingBlock(10, 100)
        self.e2 = EmbeddingBlock(20, 60)

    def hybrid_forward(self, F, x1, vl1, x2, vl2):
        o = F.concat(self.e1(x1, vl1), self.e2(x2, vl2), dim=-1)
        return self.dense(o)

net = Net()
net.initialize()
net.hybridize()

x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1, x2, vl2)
@mxnet-label-bot add [pending requester info]
@whamza15 Did these suggestions help you ?
@whamza15 Can you please close the issue if it has been resolved for you ?
Please feel free to re-open if closed in error.
Sorry, I did not get a chance to follow up on this. I can try what you described, @abhinavs95. However, not using deferred initialization would be a bit of a setback for our toolkit, which relies heavily on it. Is there any possibility this can be solved while still relying on deferred initialization?
I just want to add that if I use valid_length in the EmbeddingBlock, it works fine even with deferred initialization.
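For reference, a minimal sketch of what consuming valid_length inside the block could look like, for example by masking padded positions (assuming F.SequenceMask with axis=1 is available in your MXNet version; this is illustrative, not our actual toolkit block):

import mxnet as mx
import mxnet.gluon as gl

class MaskedEmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(MaskedEmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        emb = self.emb(x)  # (batch, seq_len, dim)
        # Zero out positions beyond each sample's valid length; since
        # valid_length now feeds an operator, it is part of the graph and
        # deferred shape inference can succeed.
        return F.SequenceMask(emb, sequence_length=valid_length,
                              use_sequence_length=True, axis=1)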
@whamza15 does it work if you pass [] as the value for valid_length?
@eric-haibin-lin I am not sure I understand the question. valid_length always has a value; it is just that this block does not use it. The reason we have this setup is that our toolkit allows people to configure blocks (as complex as they want) without having to change the inputs. Some blocks may choose to consume valid_length (like complex encoders), while others may choose not to (like a simple embedding block).
We have a temporary workaround in https://github.com/dmlc/gluon-nlp/blob/master/src/gluonnlp/model/transformer.py#L420-L501, but this bug should definitely be fixed in MXNet.
There is a similar problem when there are unused parameters.
For example, you can have a model like this:
class Test(mx.gluon.nn.HybridBlock):
    def __init__(self, mode, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mode = mode
        with self.name_scope():
            self.d1 = mx.gluon.nn.Dense(2)
            self.d2 = mx.gluon.nn.Dense(3)

    def hybrid_forward(self, F, x, *args, **kwargs):
        o1 = self.d1(x)
        o2 = self.d2(x)
        if self.mode:
            return o1  # output path o2 is not used
        else:
            return o1, o2
Currently, this model will not hybridize successfully when mode == True, because the weights in the o2 path are "unused":
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py:694: UserWarning: Parameter test4_dense1_weight, test4_dense1_bias is not used by any computation. Is this intended?
out = self.forward(*args)
---------------------------------------------------------------------------
DeferredInitializationError Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
1012 try:
-> 1013 cargs = [args_without_none[i] if is_arg else i.data()
1014 for is_arg, i in self._cached_op_args]
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in <listcomp>(.0)
1012 try:
-> 1013 cargs = [args_without_none[i] if is_arg else i.data()
1014 for is_arg, i in self._cached_op_args]
/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
564 "instead." % (self.name, str(ctx), self._stype))
--> 565 return self._check_and_get(self._data, ctx)
566
/usr/lib/python3.8/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
230 if self._deferred_init:
--> 231 raise DeferredInitializationError(
232 "Parameter '%s' has not been initialized yet because initialization was " \
DeferredInitializationError: Parameter 'test4_dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
973 try:
--> 974 self.infer_shape(*args)
975 except Exception as e:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
1074 """Infers shape of Parameters from inputs."""
-> 1075 self._infer_attrs('infer_shape', 'shape', *args)
1076
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
1070 for i in self.collect_params().values():
-> 1071 setattr(i, attr, sdict[i.name])
1072
KeyError: 'test4_dense1_weight'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-48-a18f0aa96b25> in <module>
----> 1 t(mx.nd.array([10]))
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in __call__(self, *args)
692 hook(self, args)
693
--> 694 out = self.forward(*args)
695
696 for hook in self._forward_hooks.values():
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
1150 'Find all contexts = {}'.format(ctx_set))
1151 with ctx:
-> 1152 return self._call_cached_op(x, *args)
1153 with ctx:
1154 try:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _call_cached_op(self, *args)
1014 for is_arg, i in self._cached_op_args]
1015 except DeferredInitializationError:
-> 1016 self._deferred_infer_shape(*args)
1017 cargs = []
1018 for is_arg, i in self._cached_op_args:
/usr/lib/python3.8/site-packages/mxnet/gluon/block.py in _deferred_infer_shape(self, *args)
976 error_msg = "Deferred initialization failed because shape"\
977 " cannot be inferred. {}".format(e)
--> 978 raise ValueError(error_msg)
979
980 def _call_cached_op(self, *args):
ValueError: Deferred initialization failed because shape cannot be inferred. 'test4_dense1_weight'
Having unused parameters is useful since you might want your pretrain/finetune/evaluation networks to behave differently while remaining compatible with .save_parameters and .load_parameters without allow_missing and ignore_extra.
I think this issue could be fixed without changing the inner workings too much by adding an F.nodiscard(o2) operator. It would be a no-op in nd mode and would somehow mark the output as a required computation in sym mode. Not sure how feasible something like that is.
My current workaround is something like
return F.broadcast_add(o1, F.sum(0.0 * o2)) # output path o2 is not used
which is both really ugly and potentially inefficient, since it forces the unneeded computation.
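Spelled out in context, that workaround would look roughly like this in the Test block from above (a sketch; only the mode == True branch changes):

import mxnet as mx

class Test(mx.gluon.nn.HybridBlock):
    def __init__(self, mode, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mode = mode
        with self.name_scope():
            self.d1 = mx.gluon.nn.Dense(2)
            self.d2 = mx.gluon.nn.Dense(3)

    def hybrid_forward(self, F, x, *args, **kwargs):
        o1 = self.d1(x)
        o2 = self.d2(x)
        if self.mode:
            # Keep o2 in the graph as a zero contribution so its
            # parameters are not reported as unused; the o2 path is
            # still computed, which is the inefficiency mentioned above.
            return F.broadcast_add(o1, F.sum(0.0 * o2))
        else:
            return o1, o2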
If the F.nodiscard option is too hard to implement, something like
o1 = F.depends_on(o1, o2)
could also work. It would basically be the same as F.broadcast_add(o1, F.sum(0.0 * o2)) but without any computations.
cc @leezu
Any progress on this?
@whamza15 this will be taken into account in the MXNet 2.0 roadmap item 4.3, Gluon block enhancement, that @leezu is driving.