Incubator-mxnet: Custom Function Shape Inference

Created on 21 Feb 2018 · 7 comments · Source: apache/incubator-mxnet

Description

Custom functions seem to be unable to do deferred shape inference, or the methods that enable deferred inference are non-obvious and require examples.

Environment info (Required)

----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 7.2.0
Build        : ('default', 'Jan 16 2018 18:10:19')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.1.0
Directory    : /opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet
Commit Hash   : 07a83a0325a3d782513a04f47d711710972cb144
----------System Info----------
Platform     : Linux-4.13.0-32-generic-x86_64-with-debian-stretch-sid
system       : Linux
node         : 243bb3cedee3
release      : 4.13.0-32-generic
version      : #35~16.04.1-Ubuntu SMP Thu Jan 25 10:13:43 UTC 2018
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
Stepping:              1
CPU MHz:               3600.001
CPU max MHz:           4000.0000
CPU min MHz:           1200.0000
BogoMIPS:              7200.00
Virtualization:        VT-x
Hypervisor vendor:     vertical
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0254 sec, LOAD: 0.5580 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1541 sec, LOAD: 0.0660 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1920 sec, LOAD: 0.1901 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1562 sec, LOAD: 1.5483 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0516 sec, LOAD: 0.1203 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0299 sec, LOAD: 0.0624 sec.

Package used (Python/R/Scala/Julia):
I'm using Python 3.6

Build info (Required if built from source)

Pip Install

Error Message:

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py:512: UserWarning: Cannot decide shape for the following arguments (0s in shape means unknown dimensions). Consider providing them as input:
    conv2_weight: (20, 0, 5, 5)
  **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
---------------------------------------------------------------------------
DeferredInitializationError               Traceback (most recent call last)
/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
    567                         return self._call_cached_op(x, *args)
--> 568                     params = {i: j.data(ctx) for i, j in self._reg_params.items()}
    569                 except DeferredInitializationError:

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in <dictcomp>(.0)
    567                         return self._call_cached_op(x, *args)
--> 568                     params = {i: j.data(ctx) for i, j in self._reg_params.items()}
    569                 except DeferredInitializationError:

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/parameter.py in data(self, ctx)
    388         """
--> 389         return self._check_and_get(self._data, ctx)
    390 

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/parameter.py in _check_and_get(self, arr_list, ctx)
    182                 "You can also avoid deferred initialization by specifying in_units, " \
--> 183                 "num_features, etc., for network layers."%(self.name))
    184         raise RuntimeError(

DeferredInitializationError: Parameter conv2_weight has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-5fdef27b91d5> in <module>()
----> 1 net(test)

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    358     def __call__(self, *args):
    359         """Calls forward. Only accepts positional arguments."""
--> 360         return self.forward(*args)
    361 
    362     def forward(self, *args):

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/nn/basic_layers.py in forward(self, x)
     51     def forward(self, x):
     52         for block in self._children:
---> 53             x = block(x)
     54         return x
     55 

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in __call__(self, *args)
    358     def __call__(self, *args):
    359         """Calls forward. Only accepts positional arguments."""
--> 360         return self.forward(*args)
    361 
    362     def forward(self, *args):

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in forward(self, x, *args)
    568                     params = {i: j.data(ctx) for i, j in self._reg_params.items()}
    569                 except DeferredInitializationError:
--> 570                     self._finish_deferred_init(self._active, x, *args)
    571 
    572                 if self._active:

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in _finish_deferred_init(self, hybrid, *args)
    458 
    459     def _finish_deferred_init(self, hybrid, *args):
--> 460         self.infer_shape(*args)
    461         if hybrid:
    462             for is_arg, i in self._cached_op_args:

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in infer_shape(self, *args)
    519     def infer_shape(self, *args):
    520         """Infers shape of Parameters from inputs."""
--> 521         self._infer_attrs('infer_shape', 'shape', *args)
    522 
    523     def infer_type(self, *args):

/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/mxnet/gluon/block.py in _infer_attrs(self, infer_fn, attr, *args)
    511         arg_attrs, _, aux_attrs = getattr(out, infer_fn)(
    512             **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
--> 513         sdict = {i: j for i, j in zip(out.list_arguments(), arg_attrs)}
    514         sdict.update({name : attr for name, attr in \
    515                       zip(out.list_auxiliary_states(), aux_attrs)})

TypeError: zip argument #2 must support iteration

Minimum reproducible example

import mxnet as mx
from mxnet import gluon, autograd, nd
from mxnet.gluon.nn import Conv2D

class Binarize(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        """Implements forward computation.

        is_train : bool, whether forwarding for training or testing.
        req : list of {'null', 'write', 'inplace', 'add'}, how to assign to out_data. 'null' means skip assignment, etc.
        in_data : list of NDArray, input data.
        out_data : list of NDArray, pre-allocated output buffers.
        aux : list of NDArray, mutable auxiliary states. Usually not used.
        """
        x = in_data[0]
        #y = nd.sign(x)
        y = x
        self.assign(out_data[0], req[0], y)

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        """Implements backward computation

        req : list of {'null', 'write', 'inplace', 'add'}, how to assign to in_grad
        out_grad : list of NDArray, gradient w.r.t. output data.
        in_grad : list of NDArray, gradient w.r.t. input data. This is the output buffer.
        """
        x = in_data[0]
        dy = out_grad[0]
        dx = dy*(nd.abs(x) <= 1)
        self.assign(in_grad[0], req[0], dx)

@mx.operator.register("binarize")  # register with name "binarize"
class BinarizeProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(BinarizeProp, self).__init__(True)

    def list_arguments(self):
        #  this can be omitted if you only have 1 input.
        return ['data']

    def list_outputs(self):
        #  this can be omitted if you only have 1 output.
        return ['output']

    def infer_shape(self, in_shapes):
        """Calculate output shapes from input shapes. This can be
        omitted if all your inputs and outputs have the same shape.

        in_shapes : list of shape. Shape is described by a tuple of int.
        """
        data_shape = in_shapes[0]
        output_shape = data_shape
        # return 3 lists representing inputs shapes, outputs shapes, and aux data shapes.
        return (data_shape,), (output_shape,), ()

    def create_operator(self, ctx, in_shapes, in_dtypes):
        #  create and return the CustomOp class.
        return Binarize()

class BinarizeBlock(gluon.HybridBlock):
    def hybrid_forward(self, F, x):
        return F.Custom(x, name='binarize', op_type='binarize')

class BinaryConv(Conv2D):
    def hybrid_forward(self, F, x, weight, bias=None):
        bin_w = F.Custom(weight, name='binarize', op_type='binarize')
        return super(BinaryConv, self).hybrid_forward(F, x, weight=bin_w, bias=bias)

net = gluon.nn.Sequential()
net.add(BinaryConv(channels=20, kernel_size=5, activation='relu'))
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())

test = nd.random.normal(shape=(1, 3, 224, 224))
net(test)

Steps to reproduce

  1. Simply run above code

What have you tried to solve it?

  1. Tried to find issues with shape inference in the custom function, but everything seems fine. Using BinarizeBlock in a Sequential does not cause an issue; the error above appears only when the custom function is applied to the weight in BinaryConv (the working data-path case is sketched below). I don't see any obvious reason for this, since the infer_shape method of BinarizeProp appears to work. I'd love to get this figured out!
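
For contrast, a minimal sketch of the case that works, reusing BinarizeBlock and the imports from the example above (net2 is a hypothetical name): the custom op sits on the data path rather than the weights.

# The custom op is applied to the activations, not the weights, so Conv2D's
# own shape-inference path stays intact and deferred initialization succeeds.
net2 = gluon.nn.Sequential()
net2.add(BinarizeBlock())
net2.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
net2.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())
net2(nd.random.normal(shape=(1, 3, 224, 224)))  # runs without error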
Labels: Bug, Operator

Most helpful comment

Actually, this is not an MXNet bug, although the error message is not clear.

What happens is that the Conv2D block relies on the mx.sym.Convolution operator to infer the weight shape from the data. Applying a custom op to the weight blocks that shape-inference path.

You can solve this by specifying the in_channels argument for Conv2D.

We should improve the error message to report "Deferred initialization failed because xx's shape cannot be inferred".
@sxjscience
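
A minimal sketch of that fix, applied to the reproducible example above (in_channels=3 matches the channel dimension of the test input):

# Specifying in_channels makes the weight shape known at construction time,
# so nothing has to be inferred through the custom op on the weight.
net = gluon.nn.Sequential()
net.add(BinaryConv(channels=20, kernel_size=5, activation='relu',
                   in_channels=3))  # the test input has 3 channels
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=mx.cpu())
net(nd.random.normal(shape=(1, 3, 224, 224)))  # no longer raises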

All 7 comments

I think it's not supported. Need to double-check with @piiswrong.

@sandeep-krishnamurthy Please add the labels - Gluon, Python.

We will wait for @sxjscience or @piiswrong to confirm if it is a bug in the current implementation or a new feature that needs to be added.

This is a bug. We should look into it

Actually, this is not an MXNet bug, although the error message is not clear.

What happens is that the Conv2D block relies on the mx.sym.Convolution operator to infer the weight shape from the data. Applying a custom op to the weight blocks that shape-inference path.

You can solve this by specifying the in_channels argument for Conv2D.

We should improve the error message to report "Deferred initialization failed because xx's shape cannot be inferred".
@sxjscience

@szha Can this be closed?

Thanks a lot, I solved the problem by specifying the in_channels argument for Conv2D, but I'm still hoping for a more efficient solution.

Is the problem fixed? @jwfromm
When I print in_shapes inside infer_shape, one dimension of the shape always prints as zero, e.g. [[64L, 0L, 3L, 3L]], [[4096L, 0L]].
Why?
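
As the UserWarning at the top of the traceback notes, a 0 in a shape means an unknown dimension during deferred initialization. A minimal sketch (shape values illustrative) of what the custom op's infer_shape sees in that case:

# During deferred initialization MXNet passes shapes with 0 for dimensions it
# has not yet resolved, such as the in_channels of a weight. An infer_shape
# that simply echoes its input, like the one in the example above, propagates
# the unknown 0 instead of resolving it.
def infer_shape(self, in_shapes):
    data_shape = in_shapes[0]  # e.g. [64, 0, 3, 3]; the 0 is an unknown dim
    return (data_shape,), (data_shape,), ()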
