Hello, I wrote a network that takes a list as its inputs. It works fine if I hybridize it on CPU, or if I run it on GPU without hybridizing.
But once I try to hybridize it on GPU, it fails with something like Check failed: it != node2index_.end() && it->first == e.node.get():. I have tried setting MXNET_ENGINE_TYPE to NaiveEngine, but it does not give me any useful information.
(Paste the complete error message. Please also include the stack trace by setting the environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)
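For reference, a minimal sketch of setting these debugging switches; both environment variables are the ones mentioned above and must be set before mxnet is imported:
import os
# Print a deeper C++ stack trace when an MXNetError is raised.
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"] = "10"
# Run operators synchronously so errors surface at the offending call.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"
import mxnet as mx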
libluajit.so
Traceback (most recent call last):
File "/data2/kohill/jye_sanka/mx-detection/models/backbones/hrnet/cls_hrnet_mx_seg_fault.py", line 76, in <module>
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 682, in __call__
out = self.forward(*args)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1244, in forward
return self._call_cached_op(x, *args)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1028, in _call_cached_op
out = self._cached_op(*cargs)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 154, in __call__
ctypes.byref(out_stypes)))
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
[bt] (9) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOpEx+0x3e) [0x7f067c064b3e]
[bt] (8) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOp+0x601) [0x7f067c064571]
[bt] (7) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x16b) [0x7f067b80d21b]
[bt] (6) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::GetCachedOpState(mxnet::Context const&)+0x179) [0x7f067b809899]
[bt] (5) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::CachedOpState::CachedOpState(mxnet::Context const&, nnvm::Graph const&, nnvm::Graph const&, bool)+0x1c6f) [0x7f067b808e6f]
[bt] (4) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::exec::FusePointwiseBackward(nnvm::Graph&&)+0xca) [0x7f067c0d90ba]
[bt] (3) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(nnvm::Graph::indexed_graph() const+0x30) [0x7f0683705480]
[bt] (2) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(nnvm::IndexedGraph::IndexedGraph(nnvm::Graph const&)+0xaf8) [0x7f0683704918]
[bt] (1) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xf4e4598) [0x7f0683703598]
[bt] (0) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2723218) [0x7f0676942218]
File "src/core/graph.cc", line 101
MXNetError: Check failed: it != node2index_.end() && it->first == e.node.get():
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
import mxnet as mx
import mxnet.gluon as gluon
class nn(object):
@staticmethod
def Sequential(*args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
# return BilinearResize2D(scale_factor=scale_factor)
return mx.gluon.nn.HybridLambda(lambda F, x: F.contrib.BilinearResize2D(x, scale_width=scale_factor,
scale_height=scale_factor, name="fwd"))
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=64, channels=32, kernel_size=3, padding=1),
nn.Upsample(scale_factor=2, mode="nearest")
)
self.fff1 = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=32, channels=64, kernel_size=3, padding=1, strides=2),
mx.gluon.nn.BatchNorm(axis=1, momentum=.9, in_channels=32)
)
def hybrid_forward(self, F, x, *args, **kwargs):
y0 = self.relu(x[0] + self.fff(x[1]))
y1 = self.relu(self.fff1(x[0]) + x[1])
return [y0, y1]
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionNet, self).__init__()
self.stage2 = self._make_stage()
def _make_stage(self):
modules = []
for i in range(2):
modules.append(
HighResolutionModule()
)
return nn.Sequential(*modules)
def hybrid_forward(self, F, x_list):
y_list = self.stage2(x_list)
return y_list
def get_cls_net():
model = HighResolutionNet()
return model
if __name__ == '__main__':
import easydict
ctx = mx.gpu()
args = easydict.EasyDict()
model = get_cls_net()
model.initialize()
model.collect_params().reset_ctx(ctx)
model.hybridize()
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])
Just run the above script; everything works if ctx is set to mx.cpu().
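For comparison, the same forward pass on CPU (a hedged sketch, assuming the script above has been run) completes without the check failure:
ctx = mx.cpu()
model = get_cls_net()
model.initialize(ctx=ctx)
model.hybridize()
# On CPU the hybridized forward pass with a list of NDArrays succeeds.
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])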
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python
# paste outputs here
----------Python Info----------
Version : 3.6.5
Compiler : GCC 7.2.0
Build : ('default', 'Apr 29 2018 16:14:56')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 20.2.2
Directory : /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
None
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
libuuid.so.1
libluajit.so
Version : 1.7.0
Directory : /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet
Commit Hash : 64f737cdd59fe88d2c5b479f25d011c5156b6a8a
Library : ['/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
No runtime build feature info available
----------System Info----------
Platform : Linux-4.13.0-36-generic-x86_64-with-debian-buster-sid
system : Linux
node : a76c618855c0
release : 4.13.0-36-generic
version : #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 2494.534
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4989.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti retpoline intel_ppin spec_ctrl tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0307 sec, LOAD: 3.8286 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 6.2298 sec, LOAD: 1.5923 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>, DNS finished in 0.396883487701416 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.7994 sec, LOAD: 10.9164 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0293 sec, LOAD: 2.1483 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.19745945930480957 sec.
This is a bug.
A temporary workaround is to pass multiple NDArrays rather than a list of NDArrays.
For example:
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
import mxnet as mx
print(mx.__version__)
import mxnet.gluon as gluon
class nn(object):
@staticmethod
def Sequential(*args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
# return BilinearResize2D(scale_factor=scale_factor)
return mx.gluon.nn.HybridLambda(lambda F, x: F.contrib.BilinearResize2D(x, scale_width=scale_factor,
scale_height=scale_factor, name="fwd"))
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=64, channels=32, kernel_size=3, padding=1),
nn.Upsample(scale_factor=2, mode="nearest")
)
self.fff1 = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=32, channels=64, kernel_size=3, padding=1, strides=2),
mx.gluon.nn.BatchNorm(axis=1, momentum=.9, in_channels=32)
)
def hybrid_forward(self, F, *x, **kwargs):
y0 = self.relu(x[0] + self.fff(x[1]))
y1 = self.relu(self.fff1(x[0]) + x[1])
return [y0, y1]
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionNet, self).__init__()
self.stage2 = self._make_stage()
def _make_stage(self):
modules = []
for i in range(2):
modules.append(
HighResolutionModule()
)
return nn.Sequential(*modules)
def hybrid_forward(self, F, *x_list):
y_list = self.stage2(*x_list)
return y_list
def get_cls_net():
model = HighResolutionNet()
return model
if __name__ == '__main__':
import easydict
ctx = mx.cpu()
args = easydict.EasyDict()
model = get_cls_net()
model.initialize()
model.reset_ctx(ctx)
model.hybridize()
y_hat = model(mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx))
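The key change from the failing script is the call signature: the NDArrays are passed as separate positional arguments instead of a single Python list. For clarity, a hedged side-by-side of the two call styles using the same tensors as above:
nd_a = mx.nd.random.randn(1, 32, 56, 56, ctx=ctx)
nd_b = mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)
# Failing pattern on GPU (hybridized, fusion enabled): list input
#   y_hat = model([nd_a, nd_b])
# Workaround pattern: separate positional arguments
y_hat = model(nd_a, nd_b)
mx.nd.waitall()  # force execution so any backend error would surface here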
Here is a simpler reproduction case:
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
# os.environ["MXNET_USE_FUSION"]="0"
import mxnet as mx
import mxnet.gluon as gluon
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = mx.gluon.nn.Conv2D(in_channels=64, channels=64, kernel_size=3, padding=1)
self.fff1 = mx.gluon.nn.Conv2D(in_channels=64, channels=64, kernel_size=3, padding=1, strides=1)
def hybrid_forward(self, F, x0, x1):
x = [x0, x1]
print(x)
y0 = (x[0] + self.fff(x[1])).relu()
y1 = (self.fff1(x[0]) + x[1]).relu()
return y0 + y1
if __name__ == '__main__':
ctx = mx.gpu()
model = HighResolutionModule()
model.initialize()
model.collect_params().reset_ctx(ctx)
model.hybridize()
y_hat = model(mx.nd.random.randn(1, 64, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 56, 56, ctx=ctx))
print(y_hat.shape)
The inputs here are passed as separate NDArrays (a tuple) instead of a list, so this bug is not caused by using a list as inputs. If the relu calls are removed from the code above, the program exits with a segmentation fault. Furthermore, if the environment variable MXNET_USE_FUSION is set to 0, the program exits normally. As far as I know, MXNET_USE_FUSION fuses relu into the previous layer through an in-place operation.
I think this bug is caused by the fusion process.
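For anyone hitting this before a fixed build is available, a minimal hedged sketch of the environment-variable workaround described above (it only sidesteps the fusion pass; it does not fix the underlying bug):
import os
# Disable the pointwise fusion pass; must be set before mxnet is imported.
# With fusion off, the reproduction scripts above run to completion on GPU.
os.environ["MXNET_USE_FUSION"] = "0"
import mxnet as mx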
@MoisesHer
Hi @ptrendx ,
I reproduced the op-fusion bug in MXNet 1.7 and MXNet 2.0 (20200926).
Could you please help check it?
Thank you!
This looks like a bug in the fusion graph pass - I am already working on overhauling it in #19269, so I will investigate and fix it there. I did a quick check and can reproduce it with the new version of the code too; I will dig into it.
@kohillyang Could you test whether PR #19269 solves your issue?
@ptrendx It seems that your PR is based on mxnet-2.0, but my code is based on mxnet-1.7.0. Is there any tutorial on how to migrate my code to mxnet-2.0?
OK, I think I'm facing the problem that @wkcn mentioned. The behavior of HybridSequential is a little strange when its input is a list of Symbols.
@ptrendx I have to say that the issue is solved for the small case, but it still exists in my code. I'm trying to reproduce it with another simple case.
Just to confirm - are you still seeing the node2index error after applying the PR? Or does it still fail, but with a different error?
@ptrendx I just compiled your sources from https://github.com/ptrendx/mxnet/tree/pr_faster_pointwise_pass. The error disappeared for the reproduction case above on this page, but the same error still occurs in my code. The following is a reproduction case; it is not the smallest one, but it can be run directly. I'm trying to find a smaller one that reproduces the problem.
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao ([email protected])
# Modified by Ke Sun ([email protected])
# ------------------------------------------------------------------------------
"""
This file and several configs are borrowed from https://github.com/HRNet/HRNet-Image-Classification at commit
https://github.com/HRNet/HRNet-Image-Classification/commit/8f158719e821836e21e6cba99a3241a12a13bc41.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import logging
import functools
import numpy as np
import yaml
import mxnet as mx
import mxnet.gluon as gluon
BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)
def conv3x3(in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
return gluon.nn.Conv2D(in_channels=in_planes, channels=out_planes, strides=stride,
kernel_size=3, padding=1, use_bias=False)
class NoneHybridBlock(mx.gluon.nn.HybridBlock):
def hybrid_forward(self, F, x, *args, **kwargs):
raise Exception("unimplemented.")
class HybridSequential(mx.gluon.nn.HybridBlock):
"""Stacks HybridBlocks sequentially.
Example::
net = nn.HybridSequential()
net.add(nn.Dense(10, activation='relu'))
net.add(nn.Dense(20))
net.hybridize()
"""
def __init__(self):
super(HybridSequential, self).__init__()
self._layers = []
def add(self, *blocks):
"""Adds block on top of the stack."""
for block in blocks:
self._layers.append(block)
self.register_child(block)
def hybrid_forward(self, F, x):
for block in self._children.values():
x = block()(x)
return x
def __repr__(self):
s = '{name}(\n{modstr}\n)'
modstr = '\n'.join([' ({key}): {block}'.format(key=key,
block=_indent(block().__repr__(), 2))
for key, block in self._children.items()])
return s.format(name=self.__class__.__name__, modstr=modstr)
def __getitem__(self, key):
layers = list(self._children.values())[key]
if isinstance(layers, list):
net = type(self)()
net.add(*(l() for l in layers))
return net
else:
return layers()
def __len__(self):
return len(self._children)
class nn(object):
@staticmethod
def BatchNorm2d(in_planes, momentum):
return gluon.nn.BatchNorm(in_channels=in_planes, momentum=momentum)
@staticmethod
def ReLU(inplace):
return gluon.nn.Activation(activation="relu")
@staticmethod
def Conv2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, dilation=1, groups=1,
bias=True, padding_mode='zeros'):
assert padding_mode == "zeros"
return gluon.nn.Conv2D(channels=out_channels, in_channels=in_channels, kernel_size=kernel_size,
strides=stride, padding=padding, dilation=dilation, groups=groups, use_bias=bias)
@staticmethod
def Sequential(*args):
bl = HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def ModuleList(args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
class _BilinearResize2D(gluon.nn.HybridBlock):
def hybrid_forward(self, F, x, *args, **kwargs):
x = F.contrib.BilinearResize2D(x, mode="size",
scale_height=scale_factor,
scale_width=scale_factor)
return x
return _BilinearResize2D()
@staticmethod
def Linear(in_features, out_features, bias=True):
return gluon.nn.Dense(units=out_features, in_units=in_features, use_bias=bias)
class BasicBlock(mx.gluon.HybridBlock):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.downsample = downsample
self.stride = stride
def hybrid_forward(self, F, x, *args, **kwargs):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out = out + residual
out = self.relu(out)
return out
class Bottleneck(mx.gluon.HybridBlock):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
bias=False)
self.bn3 = nn.BatchNorm2d(planes * self.expansion,
momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def hybrid_forward(self, F, x, *args, **kwargs):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out = out + residual
out = self.relu(out)
return out
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
num_channels, fuse_method, multi_scale_output=True):
super(HighResolutionModule, self).__init__()
self._check_branches(
num_branches, blocks, num_blocks, num_inchannels, num_channels)
self.num_inchannels = num_inchannels
self.fuse_method = fuse_method
self.num_branches = num_branches
self.multi_scale_output = multi_scale_output
self.branches = self._make_branches(
num_branches, blocks, num_blocks, num_channels)
self.fuse_layers = self._make_fuse_layers()
self.relu = nn.ReLU(False)
def _check_branches(self, num_branches, blocks, num_blocks,
num_inchannels, num_channels):
if num_branches != len(num_blocks):
error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
num_branches, len(num_blocks))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_channels):
error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
num_branches, len(num_channels))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_inchannels):
error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
num_branches, len(num_inchannels))
logger.error(error_msg)
raise ValueError(error_msg)
def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
stride=1):
downsample = None
if stride != 1 or \
self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.num_inchannels[branch_index],
num_channels[branch_index] * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(num_channels[branch_index] * block.expansion,
momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(self.num_inchannels[branch_index],
num_channels[branch_index], stride, downsample))
self.num_inchannels[branch_index] = \
num_channels[branch_index] * block.expansion
for i in range(1, num_blocks[branch_index]):
layers.append(block(self.num_inchannels[branch_index],
num_channels[branch_index]))
return nn.Sequential(*layers)
def _make_branches(self, num_branches, block, num_blocks, num_channels):
branches = []
for i in range(num_branches):
branches.append(
self._make_one_branch(i, block, num_blocks, num_channels))
return nn.ModuleList(branches)
def _make_fuse_layers(self):
if self.num_branches == 1:
return NoneHybridBlock()
num_branches = self.num_branches
num_inchannels = self.num_inchannels
fuse_layers = []
for i in range(num_branches if self.multi_scale_output else 1):
fuse_layer = []
for j in range(num_branches):
if j > i:
fuse_layer.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_inchannels[i],
1,
1,
0,
bias=False),
nn.BatchNorm2d(num_inchannels[i],
momentum=BN_MOMENTUM),
nn.Upsample(scale_factor=2 ** (j - i), mode='nearest')))
elif j == i:
fuse_layer.append(NoneHybridBlock())
else:
conv3x3s = []
for k in range(i - j):
if k == i - j - 1:
num_outchannels_conv3x3 = num_inchannels[i]
conv3x3s.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_outchannels_conv3x3,
3, 2, 1, bias=False),
nn.BatchNorm2d(num_outchannels_conv3x3,
momentum=BN_MOMENTUM)))
else:
num_outchannels_conv3x3 = num_inchannels[j]
conv3x3s.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_outchannels_conv3x3,
3, 2, 1, bias=False),
nn.BatchNorm2d(num_outchannels_conv3x3,
momentum=BN_MOMENTUM),
nn.ReLU(False)))
fuse_layer.append(nn.Sequential(*conv3x3s))
fuse_layers.append(nn.ModuleList(fuse_layer))
return nn.ModuleList(fuse_layers)
def get_num_inchannels(self):
return self.num_inchannels
def hybrid_forward(self, F, x):
if self.num_branches == 1:
return [self.branches[0](x[0])]
for i in range(self.num_branches):
x[i] = self.branches[i](x[i])
x_fuse = []
for i in range(len(self.fuse_layers)):
y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
for j in range(1, self.num_branches):
if i == j:
y = y + x[j]
else:
y = y + self.fuse_layers[i][j](x[j])
x_fuse.append(self.relu(y))
return x_fuse
blocks_dict = {
'BASIC': BasicBlock,
'BOTTLENECK': Bottleneck
}
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self, cfg, **kwargs):
super(HighResolutionNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,
bias=False)
self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,
bias=False)
self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.stage1_cfg = cfg['MODEL']['EXTRA']['STAGE1']
num_channels = self.stage1_cfg['NUM_CHANNELS'][0]
block = blocks_dict[self.stage1_cfg['BLOCK']]
num_blocks = self.stage1_cfg['NUM_BLOCKS'][0]
self.layer1 = self._make_layer(block, 64, num_channels, num_blocks)
stage1_out_channel = block.expansion * num_channels
self.stage2_cfg = cfg['MODEL']['EXTRA']['STAGE2']
num_channels = self.stage2_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage2_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition1 = self._make_transition_layer(
[stage1_out_channel], num_channels)
self.stage2, pre_stage_channels = self._make_stage(
self.stage2_cfg, num_channels)
self.stage3_cfg = cfg['MODEL']['EXTRA']['STAGE3']
num_channels = self.stage3_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage3_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition2 = self._make_transition_layer(
pre_stage_channels, num_channels)
self.stage3, pre_stage_channels = self._make_stage(
self.stage3_cfg, num_channels)
self.stage4_cfg = cfg['MODEL']['EXTRA']['STAGE4']
num_channels = self.stage4_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage4_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition3 = self._make_transition_layer(
pre_stage_channels, num_channels)
self.stage4, pre_stage_channels = self._make_stage(
self.stage4_cfg, num_channels, multi_scale_output=True)
# Classification Head
self.incre_modules, self.downsamp_modules, \
self.final_layer = self._make_head(pre_stage_channels)
self.classifier = nn.Linear(2048, 1000)
def _make_head(self, pre_stage_channels):
head_block = Bottleneck
head_channels = [32, 64, 128, 256]
# Increasing the #channels on each resolution
# from C, 2C, 4C, 8C to 128, 256, 512, 1024
incre_modules = []
for i, channels in enumerate(pre_stage_channels):
incre_module = self._make_layer(head_block,
channels,
head_channels[i],
1,
stride=1)
incre_modules.append(incre_module)
incre_modules = nn.ModuleList(incre_modules)
# downsampling modules
downsamp_modules = []
for i in range(len(pre_stage_channels) - 1):
in_channels = head_channels[i] * head_block.expansion
out_channels = head_channels[i + 1] * head_block.expansion
downsamp_module = nn.Sequential(
nn.Conv2d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=2,
padding=1),
nn.BatchNorm2d(out_channels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)
)
downsamp_modules.append(downsamp_module)
downsamp_modules = nn.ModuleList(downsamp_modules)
final_layer = nn.Sequential(
nn.Conv2d(
in_channels=head_channels[3] * head_block.expansion,
out_channels=2048,
kernel_size=1,
stride=1,
padding=0
),
nn.BatchNorm2d(2048, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)
)
return incre_modules, downsamp_modules, final_layer
def _make_transition_layer(
self, num_channels_pre_layer, num_channels_cur_layer):
num_branches_cur = len(num_channels_cur_layer)
num_branches_pre = len(num_channels_pre_layer)
transition_layers = []
for i in range(num_branches_cur):
if i < num_branches_pre:
if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
transition_layers.append(nn.Sequential(
nn.Conv2d(num_channels_pre_layer[i],
num_channels_cur_layer[i],
3,
1,
1,
bias=False),
nn.BatchNorm2d(
num_channels_cur_layer[i], momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)))
else:
transition_layers.append(NoneHybridBlock())
else:
conv3x3s = []
for j in range(i + 1 - num_branches_pre):
inchannels = num_channels_pre_layer[-1]
outchannels = num_channels_cur_layer[i] \
if j == i - num_branches_pre else inchannels
conv3x3s.append(nn.Sequential(
nn.Conv2d(
inchannels, outchannels, 3, 2, 1, bias=False),
nn.BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)))
transition_layers.append(nn.Sequential(*conv3x3s))
return nn.ModuleList(transition_layers)
def _make_layer(self, block, inplanes, planes, blocks, stride=1):
downsample = None
if stride != 1 or inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(inplanes, planes, stride, downsample))
inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(inplanes, planes))
return nn.Sequential(*layers)
def _make_stage(self, layer_config, num_inchannels,
multi_scale_output=True):
num_modules = layer_config['NUM_MODULES']
num_branches = layer_config['NUM_BRANCHES']
num_blocks = layer_config['NUM_BLOCKS']
num_channels = layer_config['NUM_CHANNELS']
block = blocks_dict[layer_config['BLOCK']]
fuse_method = layer_config['FUSE_METHOD']
modules = []
for i in range(num_modules):
# multi_scale_output is only used last module
if not multi_scale_output and i == num_modules - 1:
reset_multi_scale_output = False
else:
reset_multi_scale_output = True
modules.append(
HighResolutionModule(num_branches,
block,
num_blocks,
num_inchannels,
num_channels,
fuse_method,
reset_multi_scale_output)
)
num_inchannels = modules[-1].get_num_inchannels()
return nn.Sequential(*modules), num_inchannels
def hybrid_forward(self, F, x, *args, **kwargs):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.bn2(x)
x = self.relu(x)
x = self.layer1(x)
x_list = []
for i in range(self.stage2_cfg['NUM_BRANCHES']):
if self.transition1[i] is not None and not isinstance(self.transition1[i], NoneHybridBlock):
x_list.append(self.transition1[i](x))
else:
x_list.append(x)
y_list = self.stage2(x_list)
x_list = []
for i in range(self.stage3_cfg['NUM_BRANCHES']):
if self.transition2[i] is not None and not isinstance(self.transition2[i], NoneHybridBlock):
x_list.append(self.transition2[i](y_list[-1]))
else:
x_list.append(y_list[i])
y_list = self.stage3(x_list)
x_list = []
for i in range(self.stage4_cfg['NUM_BRANCHES']):
if self.transition3[i] is not None and not isinstance(self.transition3[i], NoneHybridBlock):
x_list.append(self.transition3[i](y_list[-1]))
else:
x_list.append(y_list[i])
y_list = self.stage4(x_list)
# Classification Head
y = self.incre_modules[0](y_list[0])
for i in range(len(self.downsamp_modules)):
y = self.incre_modules[i + 1](y_list[i + 1]) + \
self.downsamp_modules[i](y)
y = self.final_layer(y)
# if torch._C._get_tracing_state():
# y = y.flatten(start_dim=2).mean(dim=2)
# else:
# y = F.avg_pool2d(y, kernel_size=y.size()
# [2:]).view(y.size(0), -1)
y = y.reshape((0, 0, -1)).mean(axis=2)
y = self.classifier(y)
return y
cfg_yaml ="""
GPUS: (0,1,2,3)
LOG_DIR: 'log/'
DATA_DIR: ''
OUTPUT_DIR: 'output/'
WORKERS: 4
PRINT_FREQ: 1000
MODEL:
NAME: cls_hrnet
IMAGE_SIZE:
- 224
- 224
EXTRA:
WITH_HEAD: true
STAGE1:
NUM_MODULES: 1
NUM_RANCHES: 1
BLOCK: BOTTLENECK
NUM_BLOCKS:
- 2
NUM_CHANNELS:
- 64
FUSE_METHOD: SUM
STAGE2:
NUM_MODULES: 1
NUM_BRANCHES: 2
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
NUM_CHANNELS:
- 18
- 36
FUSE_METHOD: SUM
STAGE3:
NUM_MODULES: 3
NUM_BRANCHES: 3
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
- 2
NUM_CHANNELS:
- 18
- 36
- 72
FUSE_METHOD: SUM
STAGE4:
NUM_MODULES: 2
NUM_BRANCHES: 4
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
- 2
- 2
NUM_CHANNELS:
- 18
- 36
- 72
- 144
FUSE_METHOD: SUM
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
DATASET:
DATASET: 'imagenet'
DATA_FORMAT: 'zip'
ROOT: 'data/imagenet/'
TEST_SET: 'val'
TRAIN_SET: 'train'
TEST:
BATCH_SIZE_PER_GPU: 32
MODEL_FILE: ''
TRAIN:
BATCH_SIZE_PER_GPU: 32
BEGIN_EPOCH: 0
END_EPOCH: 100
RESUME: true
LR_FACTOR: 0.1
LR_STEP:
- 30
- 60
- 90
OPTIMIZER: sgd
LR: 0.05
WD: 0.0001
MOMENTUM: 0.9
NESTEROV: true
SHUFFLE: true
DEBUG:
DEBUG: false
"""
def get_cls_net(**kwargs):
config=yaml.load(cfg_yaml)
print(config)
model = HighResolutionNet(config, **kwargs)
# model.init_weights()
return model
if __name__ == '__main__':
ctx = mx.gpu(0)
model = get_cls_net()
model.initialize()
model.hybridize()
model.reset_ctx(ctx)
data = mx.nd.zeros(shape=(1, 3, 512, 512), ctx=ctx)
y_hat = model(data)
And the following is the error message:
/data2/kohill/jye_sanka/anaconda3/bin/python /data2/kohill/jye_sanka/mx-detection/cls_hrnet_mx.py
None
/ssddata/data/data/data/data/mxnet/python/mxnet/../../build/libmxnet.so
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
libuuid.so.1
{'GPUS': '(0,1,2,3)', 'LOG_DIR': 'log/', 'DATA_DIR': '', 'OUTPUT_DIR': 'output/', 'WORKERS': 4, 'PRINT_FREQ': 1000, 'MODEL': {'NAME': 'cls_hrnet', 'IMAGE_SIZE': [224, 224], 'EXTRA': {'WITH_HEAD': True, 'STAGE1': {'NUM_MODULES': 1, 'NUM_RANCHES': 1, 'BLOCK': 'BOTTLENECK', 'NUM_BLOCKS': [2], 'NUM_CHANNELS': [64], 'FUSE_METHOD': 'SUM'}, 'STAGE2': {'NUM_MODULES': 1, 'NUM_BRANCHES': 2, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2], 'NUM_CHANNELS': [18, 36], 'FUSE_METHOD': 'SUM'}, 'STAGE3': {'NUM_MODULES': 3, 'NUM_BRANCHES': 3, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2, 2], 'NUM_CHANNELS': [18, 36, 72], 'FUSE_METHOD': 'SUM'}, 'STAGE4': {'NUM_MODULES': 2, 'NUM_BRANCHES': 4, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2, 2, 2], 'NUM_CHANNELS': [18, 36, 72, 144], 'FUSE_METHOD': 'SUM'}}}, 'CUDNN': {'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}, 'DATASET': {'DATASET': 'imagenet', 'DATA_FORMAT': 'zip', 'ROOT': 'data/imagenet/', 'TEST_SET': 'val', 'TRAIN_SET': 'train'}, 'TEST': {'BATCH_SIZE_PER_GPU': 32, 'MODEL_FILE': ''}, 'TRAIN': {'BATCH_SIZE_PER_GPU': 32, 'BEGIN_EPOCH': 0, 'END_EPOCH': 100, 'RESUME': True, 'LR_FACTOR': 0.1, 'LR_STEP': [30, 60, 90], 'OPTIMIZER': 'sgd', 'LR': 0.05, 'WD': 0.0001, 'MOMENTUM': 0.9, 'NESTEROV': True, 'SHUFFLE': True}, 'DEBUG': {'DEBUG': False}}
[02:15:53] /ssddata/data/data/data/data/mxnet/src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
[02:15:56] /ssddata/data/data/data/data/mxnet/src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for GPU
Traceback (most recent call last):
File "/data2/kohill/jye_sanka/mx-detection/cls_hrnet_mx.py", line 704, in <module>
y_hat = model(data)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1407, in __call__
return super().__call__(x, *args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 711, in __call__
out = self.forward(*args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1453, in forward
return self._call_cached_op(x, *args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1131, in _call_cached_op
out = self._cached_op(*cargs)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/_ctypes/ndarray.py", line 179, in __call__
ctypes.byref(out_stypes)))
File "/ssddata/data/data/data/data/mxnet/python/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "/ssddata/data/data/data/data/mxnet/3rdparty/tvm/nnvm/src/core/graph.cc", line 101
MXNetError: Check failed: it != node2index_.end() && it->first == e.node.get():
Process finished with exit code 1
As before, the problem disappears if MXNET_USE_FUSION is set to 0 or if the model is run on CPU.
Thank you, I will investigate it on Monday.
I can repro it. Not sure yet if it is the same root cause as before and the fix is just buggy, or if this is a different issue.
Hi @kohillyang - it turned out to be the same root cause as the small case - I just had a small bug in my fix. I pushed the fix to the PR, so please retest. Also, by the way - I do intend to cherry-pick that PR to the 1.x branch of MXNet.
LGTM. My code now works without problems. Thanks a million. And I'm very glad to hear that this will be fixed in the 1.x branch of MXNet.
Please feel free to close this issue.