Hello, I wrote a network that takes a list as its inputs. It works fine if I hybridize it on CPU, or if I run it on GPU without hybridizing.
But once I try to hybridize it on GPU, it fails with something like Check failed: it != node2index_.end() && it->first == e.node.get():. I have tried setting MXNET_ENGINE_TYPE to NaiveEngine, but it does not give me any useful information.
(Paste the complete error message. Please also include the stack trace by setting the environment variable DMLC_LOG_STACK_TRACE_DEPTH=10 before running your script.)
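For reference, a minimal sketch of setting these debugging switches; both environment variables are the ones mentioned above and must be set before mxnet is imported:
import os
# Print a deeper C++ stack trace when an MXNetError is raised.
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"] = "10"
# Run operators synchronously so errors surface at the offending call.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"
import mxnet as mx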
libluajit.so
Traceback (most recent call last):
File "/data2/kohill/jye_sanka/mx-detection/models/backbones/hrnet/cls_hrnet_mx_seg_fault.py", line 76, in <module>
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 682, in __call__
out = self.forward(*args)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1244, in forward
return self._call_cached_op(x, *args)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/gluon/block.py", line 1028, in _call_cached_op
out = self._cached_op(*cargs)
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py", line 154, in __call__
ctypes.byref(out_stypes)))
File "/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
[bt] (9) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOpEx+0x3e) [0x7f067c064b3e]
[bt] (8) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXInvokeCachedOp+0x601) [0x7f067c064571]
[bt] (7) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x16b) [0x7f067b80d21b]
[bt] (6) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::GetCachedOpState(mxnet::Context const&)+0x179) [0x7f067b809899]
[bt] (5) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::CachedOp::CachedOpState::CachedOpState(mxnet::Context const&, nnvm::Graph const&, nnvm::Graph const&, bool)+0x1c6f) [0x7f067b808e6f]
[bt] (4) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::exec::FusePointwiseBackward(nnvm::Graph&&)+0xca) [0x7f067c0d90ba]
[bt] (3) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(nnvm::Graph::indexed_graph() const+0x30) [0x7f0683705480]
[bt] (2) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(nnvm::IndexedGraph::IndexedGraph(nnvm::Graph const&)+0xaf8) [0x7f0683704918]
[bt] (1) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xf4e4598) [0x7f0683703598]
[bt] (0) /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2723218) [0x7f0676942218]
File "src/core/graph.cc", line 101
MXNetError: Check failed: it != node2index_.end() && it->first == e.node.get():
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
import mxnet as mx
import mxnet.gluon as gluon
class nn(object):
@staticmethod
def Sequential(*args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
# return BilinearResize2D(scale_factor=scale_factor)
return mx.gluon.nn.HybridLambda(lambda F, x: F.contrib.BilinearResize2D(x, scale_width=scale_factor,
scale_height=scale_factor, name="fwd"))
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=64, channels=32, kernel_size=3, padding=1),
nn.Upsample(scale_factor=2, mode="nearest")
)
self.fff1 = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=32, channels=64, kernel_size=3, padding=1, strides=2),
mx.gluon.nn.BatchNorm(axis=1, momentum=.9, in_channels=32)
)
def hybrid_forward(self, F, x, *args, **kwargs):
y0 = self.relu(x[0] + self.fff(x[1]))
y1 = self.relu(self.fff1(x[0]) + x[1])
return [y0, y1]
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionNet, self).__init__()
self.stage2 = self._make_stage()
def _make_stage(self):
modules = []
for i in range(2):
modules.append(
HighResolutionModule()
)
return nn.Sequential(*modules)
def hybrid_forward(self, F, x_list):
y_list = self.stage2(x_list)
return y_list
def get_cls_net():
model = HighResolutionNet()
return model
if __name__ == '__main__':
import easydict
ctx = mx.gpu()
args = easydict.EasyDict()
model = get_cls_net()
model.initialize()
model.collect_params().reset_ctx(ctx)
model.hybridize()
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])
Just run the above script; everything works if ctx is set to mx.cpu().
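For comparison, the same forward pass on CPU (a hedged sketch, assuming the script above has been run) completes without the check failure:
ctx = mx.cpu()
model = get_cls_net()
model.initialize(ctx=ctx)
model.hybridize()
# On CPU the hybridized forward pass with a list of NDArrays succeeds.
y_hat = model([mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)])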
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python
# paste outputs here
----------Python Info----------
Version : 3.6.5
Compiler : GCC 7.2.0
Build : ('default', 'Apr 29 2018 16:14:56')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 20.2.2
Directory : /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
None
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
libuuid.so.1
libluajit.so
Version : 1.7.0
Directory : /data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet
Commit Hash : 64f737cdd59fe88d2c5b479f25d011c5156b6a8a
Library : ['/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so']
Build features:
No runtime build feature info available
----------System Info----------
Platform : Linux-4.13.0-36-generic-x86_64-with-debian-buster-sid
system : Linux
node : a76c618855c0
release : 4.13.0-36-generic
version : #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 2494.534
CPU max MHz: 3300.0000
CPU min MHz: 1200.0000
BogoMIPS: 4989.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti retpoline intel_ppin spec_ctrl tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0307 sec, LOAD: 3.8286 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 6.2298 sec, LOAD: 1.5923 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833)>, DNS finished in 0.396883487701416 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.7994 sec, LOAD: 10.9164 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0293 sec, LOAD: 2.1483 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.19745945930480957 sec.
This is a bug.
A temporary workaround is to pass multiple NDArrays rather than a list of NDArrays.
For example:
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
import mxnet as mx
print(mx.__version__)
import mxnet.gluon as gluon
class nn(object):
@staticmethod
def Sequential(*args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
# return BilinearResize2D(scale_factor=scale_factor)
return mx.gluon.nn.HybridLambda(lambda F, x: F.contrib.BilinearResize2D(x, scale_width=scale_factor,
scale_height=scale_factor, name="fwd"))
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=64, channels=32, kernel_size=3, padding=1),
nn.Upsample(scale_factor=2, mode="nearest")
)
self.fff1 = nn.Sequential(
mx.gluon.nn.Conv2D(in_channels=32, channels=64, kernel_size=3, padding=1, strides=2),
mx.gluon.nn.BatchNorm(axis=1, momentum=.9, in_channels=32)
)
def hybrid_forward(self, F, *x, **kwargs):
y0 = self.relu(x[0] + self.fff(x[1]))
y1 = self.relu(self.fff1(x[0]) + x[1])
return [y0, y1]
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionNet, self).__init__()
self.stage2 = self._make_stage()
def _make_stage(self):
modules = []
for i in range(2):
modules.append(
HighResolutionModule()
)
return nn.Sequential(*modules)
def hybrid_forward(self, F, *x_list):
y_list = self.stage2(*x_list)
return y_list
def get_cls_net():
model = HighResolutionNet()
return model
if __name__ == '__main__':
import easydict
ctx = mx.cpu()
args = easydict.EasyDict()
model = get_cls_net()
model.initialize()
model.reset_ctx(ctx)
model.hybridize()
y_hat = model(mx.nd.random.randn(1, 32, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 28, 28, ctx=ctx))
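The key change from the failing script is the call signature: the NDArrays are passed as separate positional arguments instead of a single Python list. For clarity, a hedged side-by-side of the two call styles using the same tensors as above:
nd_a = mx.nd.random.randn(1, 32, 56, 56, ctx=ctx)
nd_b = mx.nd.random.randn(1, 64, 28, 28, ctx=ctx)
# Failing pattern on GPU (hybridized, fusion enabled): list input
#   y_hat = model([nd_a, nd_b])
# Workaround pattern: separate positional arguments
y_hat = model(nd_a, nd_b)
mx.nd.waitall()  # force execution so any backend error would surface here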
Here is a simpler reproduction case:
import os
os.environ["DMLC_LOG_STACK_TRACE_DEPTH"]="10"
# os.environ["MXNET_USE_FUSION"]="0"
import mxnet as mx
import mxnet.gluon as gluon
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self):
super(HighResolutionModule, self).__init__()
self.relu = mx.gluon.nn.Activation("relu")
self.fff = mx.gluon.nn.Conv2D(in_channels=64, channels=64, kernel_size=3, padding=1)
self.fff1 = mx.gluon.nn.Conv2D(in_channels=64, channels=64, kernel_size=3, padding=1, strides=1)
def hybrid_forward(self, F, x0, x1):
x = [x0, x1]
print(x)
y0 = (x[0] + self.fff(x[1])).relu()
y1 = (self.fff1(x[0]) + x[1]).relu()
return y0 + y1
if __name__ == '__main__':
ctx = mx.gpu()
model = HighResolutionModule()
model.initialize()
model.collect_params().reset_ctx(ctx)
model.hybridize()
y_hat = model(mx.nd.random.randn(1, 64, 56, 56, ctx=ctx), mx.nd.random.randn(1, 64, 56, 56, ctx=ctx))
print(y_hat.shape)
The inputs here are passed as separate NDArrays (a tuple) instead of a list, so this bug is not caused by using a list as inputs. If the relu calls are removed from the code above, the program exits with a segmentation fault. Furthermore, if the environment variable MXNET_USE_FUSION is set to 0, the program exits normally. As far as I know, MXNET_USE_FUSION fuses relu into the previous layer through an in-place operation.
I think this bug is caused by the fusion process.
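For anyone hitting this before a fixed build is available, a minimal hedged sketch of the environment-variable workaround described above (it only sidesteps the fusion pass; it does not fix the underlying bug):
import os
# Disable the pointwise fusion pass; must be set before mxnet is imported.
# With fusion off, the reproduction scripts above run to completion on GPU.
os.environ["MXNET_USE_FUSION"] = "0"
import mxnet as mx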
@MoisesHer
Hi @ptrendx ,
I reproduced the op-fusion bug in MXNet 1.7 and MXNet 2.0 (20200926).
Could you please help check it?
Thank you!
This looks like a bug in the fusion graph pass - I am already working on overhauling it in #19269, so I will investigate and fix it there. I did a quick check and can reproduce it with the new version of the code too; I will dig into it.
@kohillyang Could you test whether PR #19269 solves your issue?
@ptrendx It seems that your PR is based on mxnet-2.0, but my code is based on mxnet-1.7.0. Is there any tutorial on how to migrate my code to mxnet-2.0?
OK, I think I'm facing the problem that @wkcn mentioned. The behavior of HybridSequential is a little strange when its input is a list of Symbols.
@ptrendx I have to say that the issue is solved for the small case, but it still exists in my code. I'm trying to reproduce it with another simple case.
Just to confirm - are you still seeing the node2index error after applying the PR? Or does it still fail, but with a different error?
@ptrendx I just compiled your sources from https://github.com/ptrendx/mxnet/tree/pr_faster_pointwise_pass. The error disappeared for the reproduction case above on this page, but the same error still occurs in my code. The following is a reproduction case; it is not the smallest one, but it can be run directly. I'm trying to find a smaller one that reproduces the problem.
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao ([email protected])
# Modified by Ke Sun ([email protected])
# ------------------------------------------------------------------------------
"""
This file and several configs are borrowed from https://github.com/HRNet/HRNet-Image-Classification at commit
https://github.com/HRNet/HRNet-Image-Classification/commit/8f158719e821836e21e6cba99a3241a12a13bc41.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import logging
import functools
import numpy as np
import yaml
import mxnet as mx
import mxnet.gluon as gluon
BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)
def conv3x3(in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
return gluon.nn.Conv2D(in_channels=in_planes, channels=out_planes, strides=stride,
kernel_size=3, padding=1, use_bias=False)
class NoneHybridBlock(mx.gluon.nn.HybridBlock):
def hybrid_forward(self, F, x, *args, **kwargs):
raise Exception("unimplemented.")
class HybridSequential(mx.gluon.nn.HybridBlock):
"""Stacks HybridBlocks sequentially.
Example::
net = nn.HybridSequential()
net.add(nn.Dense(10, activation='relu'))
net.add(nn.Dense(20))
net.hybridize()
"""
def __init__(self):
super(HybridSequential, self).__init__()
self._layers = []
def add(self, *blocks):
"""Adds block on top of the stack."""
for block in blocks:
self._layers.append(block)
self.register_child(block)
def hybrid_forward(self, F, x):
for block in self._children.values():
x = block()(x)
return x
def __repr__(self):
s = '{name}(\n{modstr}\n)'
modstr = '\n'.join([' ({key}): {block}'.format(key=key,
block=_indent(block().__repr__(), 2))
for key, block in self._children.items()])
return s.format(name=self.__class__.__name__, modstr=modstr)
def __getitem__(self, key):
layers = list(self._children.values())[key]
if isinstance(layers, list):
net = type(self)()
net.add(*(l() for l in layers))
return net
else:
return layers()
def __len__(self):
return len(self._children)
class nn(object):
@staticmethod
def BatchNorm2d(in_planes, momentum):
return gluon.nn.BatchNorm(in_channels=in_planes, momentum=momentum)
@staticmethod
def ReLU(inplace):
return gluon.nn.Activation(activation="relu")
@staticmethod
def Conv2d(in_channels, out_channels, kernel_size, stride=1,
padding=0, dilation=1, groups=1,
bias=True, padding_mode='zeros'):
assert padding_mode == "zeros"
return gluon.nn.Conv2D(channels=out_channels, in_channels=in_channels, kernel_size=kernel_size,
strides=stride, padding=padding, dilation=dilation, groups=groups, use_bias=bias)
@staticmethod
def Sequential(*args):
bl = HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def ModuleList(args):
bl = gluon.nn.HybridSequential()
for a in args:
bl.add(a)
return bl
@staticmethod
def Upsample(scale_factor, mode):
class _BilinearResize2D(gluon.nn.HybridBlock):
def hybrid_forward(self, F, x, *args, **kwargs):
x = F.contrib.BilinearResize2D(x, mode="size",
scale_height=scale_factor,
scale_width=scale_factor)
return x
return _BilinearResize2D()
@staticmethod
def Linear(in_features, out_features, bias=True):
return gluon.nn.Dense(units=out_features, in_units=in_features, use_bias=bias)
class BasicBlock(mx.gluon.HybridBlock):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.downsample = downsample
self.stride = stride
def hybrid_forward(self, F, x, *args, **kwargs):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out = out + residual
out = self.relu(out)
return out
class Bottleneck(mx.gluon.HybridBlock):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
bias=False)
self.bn3 = nn.BatchNorm2d(planes * self.expansion,
momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def hybrid_forward(self, F, x, *args, **kwargs):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out = out + residual
out = self.relu(out)
return out
class HighResolutionModule(gluon.nn.HybridBlock):
def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
num_channels, fuse_method, multi_scale_output=True):
super(HighResolutionModule, self).__init__()
self._check_branches(
num_branches, blocks, num_blocks, num_inchannels, num_channels)
self.num_inchannels = num_inchannels
self.fuse_method = fuse_method
self.num_branches = num_branches
self.multi_scale_output = multi_scale_output
self.branches = self._make_branches(
num_branches, blocks, num_blocks, num_channels)
self.fuse_layers = self._make_fuse_layers()
self.relu = nn.ReLU(False)
def _check_branches(self, num_branches, blocks, num_blocks,
num_inchannels, num_channels):
if num_branches != len(num_blocks):
error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
num_branches, len(num_blocks))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_channels):
error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
num_branches, len(num_channels))
logger.error(error_msg)
raise ValueError(error_msg)
if num_branches != len(num_inchannels):
error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
num_branches, len(num_inchannels))
logger.error(error_msg)
raise ValueError(error_msg)
def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
stride=1):
downsample = None
if stride != 1 or \
self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.num_inchannels[branch_index],
num_channels[branch_index] * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(num_channels[branch_index] * block.expansion,
momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(self.num_inchannels[branch_index],
num_channels[branch_index], stride, downsample))
self.num_inchannels[branch_index] = \
num_channels[branch_index] * block.expansion
for i in range(1, num_blocks[branch_index]):
layers.append(block(self.num_inchannels[branch_index],
num_channels[branch_index]))
return nn.Sequential(*layers)
def _make_branches(self, num_branches, block, num_blocks, num_channels):
branches = []
for i in range(num_branches):
branches.append(
self._make_one_branch(i, block, num_blocks, num_channels))
return nn.ModuleList(branches)
def _make_fuse_layers(self):
if self.num_branches == 1:
return NoneHybridBlock()
num_branches = self.num_branches
num_inchannels = self.num_inchannels
fuse_layers = []
for i in range(num_branches if self.multi_scale_output else 1):
fuse_layer = []
for j in range(num_branches):
if j > i:
fuse_layer.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_inchannels[i],
1,
1,
0,
bias=False),
nn.BatchNorm2d(num_inchannels[i],
momentum=BN_MOMENTUM),
nn.Upsample(scale_factor=2 ** (j - i), mode='nearest')))
elif j == i:
fuse_layer.append(NoneHybridBlock())
else:
conv3x3s = []
for k in range(i - j):
if k == i - j - 1:
num_outchannels_conv3x3 = num_inchannels[i]
conv3x3s.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_outchannels_conv3x3,
3, 2, 1, bias=False),
nn.BatchNorm2d(num_outchannels_conv3x3,
momentum=BN_MOMENTUM)))
else:
num_outchannels_conv3x3 = num_inchannels[j]
conv3x3s.append(nn.Sequential(
nn.Conv2d(num_inchannels[j],
num_outchannels_conv3x3,
3, 2, 1, bias=False),
nn.BatchNorm2d(num_outchannels_conv3x3,
momentum=BN_MOMENTUM),
nn.ReLU(False)))
fuse_layer.append(nn.Sequential(*conv3x3s))
fuse_layers.append(nn.ModuleList(fuse_layer))
return nn.ModuleList(fuse_layers)
def get_num_inchannels(self):
return self.num_inchannels
def hybrid_forward(self, F, x):
if self.num_branches == 1:
return [self.branches[0](x[0])]
for i in range(self.num_branches):
x[i] = self.branches[i](x[i])
x_fuse = []
for i in range(len(self.fuse_layers)):
y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
for j in range(1, self.num_branches):
if i == j:
y = y + x[j]
else:
y = y + self.fuse_layers[i][j](x[j])
x_fuse.append(self.relu(y))
return x_fuse
blocks_dict = {
'BASIC': BasicBlock,
'BOTTLENECK': Bottleneck
}
class HighResolutionNet(gluon.nn.HybridBlock):
def __init__(self, cfg, **kwargs):
super(HighResolutionNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,
bias=False)
self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,
bias=False)
self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)
self.stage1_cfg = cfg['MODEL']['EXTRA']['STAGE1']
num_channels = self.stage1_cfg['NUM_CHANNELS'][0]
block = blocks_dict[self.stage1_cfg['BLOCK']]
num_blocks = self.stage1_cfg['NUM_BLOCKS'][0]
self.layer1 = self._make_layer(block, 64, num_channels, num_blocks)
stage1_out_channel = block.expansion * num_channels
self.stage2_cfg = cfg['MODEL']['EXTRA']['STAGE2']
num_channels = self.stage2_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage2_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition1 = self._make_transition_layer(
[stage1_out_channel], num_channels)
self.stage2, pre_stage_channels = self._make_stage(
self.stage2_cfg, num_channels)
self.stage3_cfg = cfg['MODEL']['EXTRA']['STAGE3']
num_channels = self.stage3_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage3_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition2 = self._make_transition_layer(
pre_stage_channels, num_channels)
self.stage3, pre_stage_channels = self._make_stage(
self.stage3_cfg, num_channels)
self.stage4_cfg = cfg['MODEL']['EXTRA']['STAGE4']
num_channels = self.stage4_cfg['NUM_CHANNELS']
block = blocks_dict[self.stage4_cfg['BLOCK']]
num_channels = [
num_channels[i] * block.expansion for i in range(len(num_channels))]
self.transition3 = self._make_transition_layer(
pre_stage_channels, num_channels)
self.stage4, pre_stage_channels = self._make_stage(
self.stage4_cfg, num_channels, multi_scale_output=True)
# Classification Head
self.incre_modules, self.downsamp_modules, \
self.final_layer = self._make_head(pre_stage_channels)
self.classifier = nn.Linear(2048, 1000)
def _make_head(self, pre_stage_channels):
head_block = Bottleneck
head_channels = [32, 64, 128, 256]
# Increasing the #channels on each resolution
# from C, 2C, 4C, 8C to 128, 256, 512, 1024
incre_modules = []
for i, channels in enumerate(pre_stage_channels):
incre_module = self._make_layer(head_block,
channels,
head_channels[i],
1,
stride=1)
incre_modules.append(incre_module)
incre_modules = nn.ModuleList(incre_modules)
# downsampling modules
downsamp_modules = []
for i in range(len(pre_stage_channels) - 1):
in_channels = head_channels[i] * head_block.expansion
out_channels = head_channels[i + 1] * head_block.expansion
downsamp_module = nn.Sequential(
nn.Conv2d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
stride=2,
padding=1),
nn.BatchNorm2d(out_channels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)
)
downsamp_modules.append(downsamp_module)
downsamp_modules = nn.ModuleList(downsamp_modules)
final_layer = nn.Sequential(
nn.Conv2d(
in_channels=head_channels[3] * head_block.expansion,
out_channels=2048,
kernel_size=1,
stride=1,
padding=0
),
nn.BatchNorm2d(2048, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)
)
return incre_modules, downsamp_modules, final_layer
def _make_transition_layer(
self, num_channels_pre_layer, num_channels_cur_layer):
num_branches_cur = len(num_channels_cur_layer)
num_branches_pre = len(num_channels_pre_layer)
transition_layers = []
for i in range(num_branches_cur):
if i < num_branches_pre:
if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
transition_layers.append(nn.Sequential(
nn.Conv2d(num_channels_pre_layer[i],
num_channels_cur_layer[i],
3,
1,
1,
bias=False),
nn.BatchNorm2d(
num_channels_cur_layer[i], momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)))
else:
transition_layers.append(NoneHybridBlock())
else:
conv3x3s = []
for j in range(i + 1 - num_branches_pre):
inchannels = num_channels_pre_layer[-1]
outchannels = num_channels_cur_layer[i] \
if j == i - num_branches_pre else inchannels
conv3x3s.append(nn.Sequential(
nn.Conv2d(
inchannels, outchannels, 3, 2, 1, bias=False),
nn.BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
nn.ReLU(inplace=True)))
transition_layers.append(nn.Sequential(*conv3x3s))
return nn.ModuleList(transition_layers)
def _make_layer(self, block, inplanes, planes, blocks, stride=1):
downsample = None
if stride != 1 or inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
)
layers = []
layers.append(block(inplanes, planes, stride, downsample))
inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(inplanes, planes))
return nn.Sequential(*layers)
def _make_stage(self, layer_config, num_inchannels,
multi_scale_output=True):
num_modules = layer_config['NUM_MODULES']
num_branches = layer_config['NUM_BRANCHES']
num_blocks = layer_config['NUM_BLOCKS']
num_channels = layer_config['NUM_CHANNELS']
block = blocks_dict[layer_config['BLOCK']]
fuse_method = layer_config['FUSE_METHOD']
modules = []
for i in range(num_modules):
# multi_scale_output is only used last module
if not multi_scale_output and i == num_modules - 1:
reset_multi_scale_output = False
else:
reset_multi_scale_output = True
modules.append(
HighResolutionModule(num_branches,
block,
num_blocks,
num_inchannels,
num_channels,
fuse_method,
reset_multi_scale_output)
)
num_inchannels = modules[-1].get_num_inchannels()
return nn.Sequential(*modules), num_inchannels
def hybrid_forward(self, F, x, *args, **kwargs):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.conv2(x)
x = self.bn2(x)
x = self.relu(x)
x = self.layer1(x)
x_list = []
for i in range(self.stage2_cfg['NUM_BRANCHES']):
if self.transition1[i] is not None and not isinstance(self.transition1[i], NoneHybridBlock):
x_list.append(self.transition1[i](x))
else:
x_list.append(x)
y_list = self.stage2(x_list)
x_list = []
for i in range(self.stage3_cfg['NUM_BRANCHES']):
if self.transition2[i] is not None and not isinstance(self.transition2[i], NoneHybridBlock):
x_list.append(self.transition2[i](y_list[-1]))
else:
x_list.append(y_list[i])
y_list = self.stage3(x_list)
x_list = []
for i in range(self.stage4_cfg['NUM_BRANCHES']):
if self.transition3[i] is not None and not isinstance(self.transition3[i], NoneHybridBlock):
x_list.append(self.transition3[i](y_list[-1]))
else:
x_list.append(y_list[i])
y_list = self.stage4(x_list)
# Classification Head
y = self.incre_modules[0](y_list[0])
for i in range(len(self.downsamp_modules)):
y = self.incre_modules[i + 1](y_list[i + 1]) + \
self.downsamp_modules[i](y)
y = self.final_layer(y)
# if torch._C._get_tracing_state():
# y = y.flatten(start_dim=2).mean(dim=2)
# else:
# y = F.avg_pool2d(y, kernel_size=y.size()
# [2:]).view(y.size(0), -1)
y = y.reshape((0, 0, -1)).mean(axis=2)
y = self.classifier(y)
return y
cfg_yaml ="""
GPUS: (0,1,2,3)
LOG_DIR: 'log/'
DATA_DIR: ''
OUTPUT_DIR: 'output/'
WORKERS: 4
PRINT_FREQ: 1000
MODEL:
NAME: cls_hrnet
IMAGE_SIZE:
- 224
- 224
EXTRA:
WITH_HEAD: true
STAGE1:
NUM_MODULES: 1
NUM_RANCHES: 1
BLOCK: BOTTLENECK
NUM_BLOCKS:
- 2
NUM_CHANNELS:
- 64
FUSE_METHOD: SUM
STAGE2:
NUM_MODULES: 1
NUM_BRANCHES: 2
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
NUM_CHANNELS:
- 18
- 36
FUSE_METHOD: SUM
STAGE3:
NUM_MODULES: 3
NUM_BRANCHES: 3
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
- 2
NUM_CHANNELS:
- 18
- 36
- 72
FUSE_METHOD: SUM
STAGE4:
NUM_MODULES: 2
NUM_BRANCHES: 4
BLOCK: BASIC
NUM_BLOCKS:
- 2
- 2
- 2
- 2
NUM_CHANNELS:
- 18
- 36
- 72
- 144
FUSE_METHOD: SUM
CUDNN:
BENCHMARK: true
DETERMINISTIC: false
ENABLED: true
DATASET:
DATASET: 'imagenet'
DATA_FORMAT: 'zip'
ROOT: 'data/imagenet/'
TEST_SET: 'val'
TRAIN_SET: 'train'
TEST:
BATCH_SIZE_PER_GPU: 32
MODEL_FILE: ''
TRAIN:
BATCH_SIZE_PER_GPU: 32
BEGIN_EPOCH: 0
END_EPOCH: 100
RESUME: true
LR_FACTOR: 0.1
LR_STEP:
- 30
- 60
- 90
OPTIMIZER: sgd
LR: 0.05
WD: 0.0001
MOMENTUM: 0.9
NESTEROV: true
SHUFFLE: true
DEBUG:
DEBUG: false
"""
def get_cls_net(**kwargs):
config=yaml.load(cfg_yaml)
print(config)
model = HighResolutionNet(config, **kwargs)
# model.init_weights()
return model
if __name__ == '__main__':
ctx = mx.gpu(0)
model = get_cls_net()
model.initialize()
model.hybridize()
model.reset_ctx(ctx)
data = mx.nd.zeros(shape=(1, 3, 512, 512), ctx=ctx)
y_hat = model(data)
And the following is the error message:
/data2/kohill/jye_sanka/anaconda3/bin/python /data2/kohill/jye_sanka/mx-detection/cls_hrnet_mx.py
None
/ssddata/data/data/data/data/mxnet/python/mxnet/../../build/libmxnet.so
/data2/kohill/jye_sanka/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
libuuid.so.1
{'GPUS': '(0,1,2,3)', 'LOG_DIR': 'log/', 'DATA_DIR': '', 'OUTPUT_DIR': 'output/', 'WORKERS': 4, 'PRINT_FREQ': 1000, 'MODEL': {'NAME': 'cls_hrnet', 'IMAGE_SIZE': [224, 224], 'EXTRA': {'WITH_HEAD': True, 'STAGE1': {'NUM_MODULES': 1, 'NUM_RANCHES': 1, 'BLOCK': 'BOTTLENECK', 'NUM_BLOCKS': [2], 'NUM_CHANNELS': [64], 'FUSE_METHOD': 'SUM'}, 'STAGE2': {'NUM_MODULES': 1, 'NUM_BRANCHES': 2, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2], 'NUM_CHANNELS': [18, 36], 'FUSE_METHOD': 'SUM'}, 'STAGE3': {'NUM_MODULES': 3, 'NUM_BRANCHES': 3, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2, 2], 'NUM_CHANNELS': [18, 36, 72], 'FUSE_METHOD': 'SUM'}, 'STAGE4': {'NUM_MODULES': 2, 'NUM_BRANCHES': 4, 'BLOCK': 'BASIC', 'NUM_BLOCKS': [2, 2, 2, 2], 'NUM_CHANNELS': [18, 36, 72, 144], 'FUSE_METHOD': 'SUM'}}}, 'CUDNN': {'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}, 'DATASET': {'DATASET': 'imagenet', 'DATA_FORMAT': 'zip', 'ROOT': 'data/imagenet/', 'TEST_SET': 'val', 'TRAIN_SET': 'train'}, 'TEST': {'BATCH_SIZE_PER_GPU': 32, 'MODEL_FILE': ''}, 'TRAIN': {'BATCH_SIZE_PER_GPU': 32, 'BEGIN_EPOCH': 0, 'END_EPOCH': 100, 'RESUME': True, 'LR_FACTOR': 0.1, 'LR_STEP': [30, 60, 90], 'OPTIMIZER': 'sgd', 'LR': 0.05, 'WD': 0.0001, 'MOMENTUM': 0.9, 'NESTEROV': True, 'SHUFFLE': True}, 'DEBUG': {'DEBUG': False}}
[02:15:53] /ssddata/data/data/data/data/mxnet/src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for CPU
[02:15:56] /ssddata/data/data/data/data/mxnet/src/storage/storage.cc:199: Using Pooled (Naive) StorageManager for GPU
Traceback (most recent call last):
File "/data2/kohill/jye_sanka/mx-detection/cls_hrnet_mx.py", line 704, in <module>
y_hat = model(data)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1407, in __call__
return super().__call__(x, *args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 711, in __call__
out = self.forward(*args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1453, in forward
return self._call_cached_op(x, *args)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/gluon/block.py", line 1131, in _call_cached_op
out = self._cached_op(*cargs)
File "/ssddata/data/data/data/data/mxnet/python/mxnet/_ctypes/ndarray.py", line 179, in __call__
ctypes.byref(out_stypes)))
File "/ssddata/data/data/data/data/mxnet/python/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "/ssddata/data/data/data/data/mxnet/3rdparty/tvm/nnvm/src/core/graph.cc", line 101
MXNetError: Check failed: it != node2index_.end() && it->first == e.node.get():
Process finished with exit code 1
As before, the problem disappears if MXNET_USE_FUSION is set to 0 or if the model is run on CPU.
Thank you, I will investigate it on Monday.
I can repro it. Not sure yet if it is the same root cause as before and the fix is just buggy, or if this is a different issue.
Hi @kohillyang - it turned out to be the same root cause as the small case - I just had a small bug in my fix. I pushed the fix to the PR, so please retest. Also, by the way - I do intend to cherry-pick that PR to the 1.x branch of MXNet.
LGTM. My code now works without problems. Thanks a million. And I'm very glad to hear that this will be fixed in the 1.x branch of MXNet.
Please feel free to close this issue.