I encounter the exception "Exception: unknown storage type: -1" when I use my focal loss custom operator.
The shape of out_data[0] is (batch_size, 2, anchor_num), and the shape of in_data[1] is (batch_size, anchor_num).
import mxnet as mx
import numpy as np


class FocalLossOperator(mx.operator.CustomOp):
    def __init__(self, gamma, alpha):
        super(FocalLossOperator, self).__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, is_train, req, in_data, out_data, aux):
        # softmax over the class axis (axis=1), with max-subtraction for numerical stability
        y = mx.nd.exp(in_data[0] - mx.nd.max_axis(in_data[0], axis=1).reshape((in_data[0].shape[0], 1, -1)))
        y /= mx.nd.sum(y, axis=1).reshape((in_data[0].shape[0], 1, -1))
        self.assign(out_data[0], req[0], y)

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # flatten predictions to (batch_size * anchor_num, 2) and labels to (batch_size * anchor_num,)
        y_numpy = out_data[0].asnumpy().transpose((0, 2, 1))
        label_numpy = in_data[1].asnumpy()
        y_numpy = y_numpy.reshape((-1, 2))
        label_numpy = label_numpy.reshape((-1))
        # anchors labelled -1 are ignored: remap them to class 0 and zero their gradient later
        indices = np.where(label_numpy == -1)[0]
        label_numpy[indices] = 0
        self.pro_truth = mx.nd.array(y_numpy[np.arange(y_numpy.shape[0]), label_numpy.astype(np.int)])
        print(len(indices))
        # i != j
        pro_truth = (self.pro_truth + 1e-14).reshape((self.pro_truth.shape[0], 1))
        grad = self.alpha * mx.nd.power(1 - pro_truth, self.gamma - 1) * \
            (self.gamma * (-1 * pro_truth * mx.nd.array(y_numpy)) * mx.nd.log(pro_truth) +
             mx.nd.array(y_numpy) * (1 - pro_truth))
        # i == j
        pro_truth = self.pro_truth + 1e-14
        grad_numpy = grad.asnumpy()
        grad_numpy[np.arange(y_numpy.shape[0]), label_numpy.astype(np.int)] = (
            self.alpha * mx.nd.power(1 - pro_truth, self.gamma) * (
                self.gamma * pro_truth * mx.nd.log(pro_truth) + pro_truth - 1)).asnumpy()
        grad_numpy /= label_numpy.shape[0]
        grad_numpy[indices, :] = 0
        grad = mx.nd.array(grad_numpy)
        grad = grad.reshape(out_data[0].shape[0], -1, out_data[0].shape[1]).transpose((0, 2, 1))
        self.assign(in_grad[0], req[0], grad)


@mx.operator.register('FocalLoss')
class FocalLossProp(mx.operator.CustomOpProp):
    def __init__(self, gamma, alpha):
        super(FocalLossProp, self).__init__(need_top_grad=False)
        self.gamma = float(gamma)
        self.alpha = float(alpha)

    def list_arguments(self):
        return ['data', 'labels']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        labels_shape = in_shape[1]
        out_shape = data_shape
        return [data_shape, labels_shape], [out_shape], []

    def create_operator(self, ctx, shapes, dtypes):
        return FocalLossOperator(self.gamma, self.alpha)
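(For reference, the backward pass above implements the gradient of the focal loss FL = -alpha * (1 - p_t)^gamma * log(p_t) with respect to the softmax inputs x, where p = softmax(x) along the class axis and t is the ground-truth class; this derivation matches the two branches in the code:

dFL/dx_t = alpha * (1 - p_t)^gamma * (gamma * p_t * log(p_t) + p_t - 1)                              # the "i == j" branch
dFL/dx_j = alpha * (1 - p_t)^(gamma - 1) * (p_j * (1 - p_t) - gamma * p_t * p_j * log(p_t)),  j != t  # the "i != j" branch

The gradient is then divided by the total number of anchor entries and zeroed for anchors labelled -1.)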
Error in CustomOp.backward: Traceback (most recent call last):
File "/home/anaconda2/lib/python2.7/site-packages/mxnet/operator.py", line 1020, in backward_entry
stype=stype))
File "/home/anaconda2/lib/python2.7/site-packages/mxnet/ndarray/sparse.py", line 1187, in _ndarray_cls
raise Exception("unknown storage type: %s"%stype)
Exception: unknown storage type: -1
terminate called after throwing an instance of 'dmlc::Error'
what(): [12:17:03] src/operator/custom/custom.cc:418: Check failed: reinterpret_cast
Stack trace returned 8 entries:
[bt] (0) /home/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x40b29a) [0x7feccd0c829a]
[bt] (1) /home/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x40b8b1) [0x7feccd0c88b1]
[bt] (2) /home/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x6c6239) [0x7feccd383239]
[bt] (3) /home/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x6e1020) [0x7feccd39e020]
[bt] (4) /home/anaconda2/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x6c7078) [0x7feccd384078]
[bt] (5) /home/anaconda2/bin/../lib/libstdc++.so.6(+0xafc5c) [0x7fed70a50c5c]
[bt] (6) /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7fed78a076ba]
[bt] (7) /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fed7802d41d]
I have this problem too. It may be a compatibility problem.
I think it's a serious bug, as most Python custom operators encounter this error.
So how can I use this custom operator? I really need focal loss for my experiment.
I can use other custom operators; they don't have this problem. @chinakook
You can define the storage type yourself by overriding the parent class CustomOpProp, maybe as ['default'].
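A minimal sketch of one way to do that, using the infer_storage_type / infer_storage_type_backward hooks of mx.operator.CustomOpProp (assumption: this reuses the FocalLossProp class defined above; 'FocalLossDense' is a hypothetical registration name chosen for this example):

import mxnet as mx

# Force dense ('default') storage for every input, output and gradient so the
# custom-op backward entry never receives an undefined storage type (-1).
@mx.operator.register('FocalLossDense')
class FocalLossDenseProp(FocalLossProp):
    def infer_storage_type(self, in_stype):
        # inputs: [data, labels] -> outputs: [output], no aux states
        return ['default', 'default'], ['default'], []

    def infer_storage_type_backward(self, ograd_stype, in_stype, out_stype,
                                    igrad_stype, aux_stype):
        # out_grad, in_data, out_data, in_grad and aux are all dense
        return ['default'], ['default', 'default'], ['default'], \
               ['default', 'default'], []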
I could not reproduce this exception
import mxnet as mx
import numpy as np


class FocalLossOperator(mx.operator.CustomOp):
    def __init__(self, gamma, alpha):
        super(FocalLossOperator, self).__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, is_train, req, in_data, out_data, aux):
        y = mx.nd.exp(in_data[0] - mx.nd.max_axis(in_data[0], axis=1).reshape((in_data[0].shape[0], 1, -1)))
        y /= mx.nd.sum(y, axis=1).reshape((in_data[0].shape[0], 1, -1))
        self.assign(out_data[0], req[0], y)

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        y_numpy = out_data[0].asnumpy().transpose((0, 2, 1))
        label_numpy = in_data[1].asnumpy()
        y_numpy = y_numpy.reshape((-1, 2))
        label_numpy = label_numpy.reshape((-1))
        indices = np.where(label_numpy == -1)[0]
        label_numpy[indices] = 0
        self.pro_truth = mx.nd.array(y_numpy[np.arange(y_numpy.shape[0]), label_numpy.astype(np.int)])
        # i != j
        pro_truth = (self.pro_truth + 1e-14).reshape((self.pro_truth.shape[0], 1))
        grad = self.alpha * mx.nd.power(1 - pro_truth, self.gamma - 1) * \
            (self.gamma * (-1 * pro_truth * mx.nd.array(y_numpy)) * mx.nd.log(pro_truth) +
             mx.nd.array(y_numpy) * (1 - pro_truth))
        # i == j
        pro_truth = self.pro_truth + 1e-14
        grad_numpy = grad.asnumpy()
        grad_numpy[np.arange(y_numpy.shape[0]), label_numpy.astype(np.int)] = (
            self.alpha * mx.nd.power(1 - pro_truth, self.gamma) * (
                self.gamma * pro_truth * mx.nd.log(pro_truth) + pro_truth - 1)).asnumpy()
        grad_numpy /= label_numpy.shape[0]
        grad_numpy[indices, :] = 0
        grad = mx.nd.array(grad_numpy)
        grad = grad.reshape(out_data[0].shape[0], -1, out_data[0].shape[1]).transpose((0, 2, 1))
        self.assign(in_grad[0], req[0], grad)


@mx.operator.register('FocalLoss')
class FocalLossProp(mx.operator.CustomOpProp):
    def __init__(self, gamma, alpha):
        super(FocalLossProp, self).__init__(need_top_grad=False)
        self.gamma = float(gamma)
        self.alpha = float(alpha)

    def list_arguments(self):
        return ['data', 'labels']

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        labels_shape = in_shape[1]
        out_shape = data_shape
        return [data_shape, labels_shape], [out_shape], []

    def create_operator(self, ctx, shapes, dtypes):
        return FocalLossOperator(self.gamma, self.alpha)


class FocalLossGluon(mx.gluon.nn.HybridBlock):
    def hybrid_forward(self, F, x, label):
        return F.Custom(x, label, gamma=1, alpha=1, op_type='FocalLoss')


if __name__ == '__main__':
    batch_size = 3
    num_anchor = 4
    x = mx.nd.zeros((batch_size, 2, num_anchor))
    label = mx.nd.zeros((batch_size, num_anchor))
    x.attach_grad()

    # imperative NDArray call
    with mx.autograd.record():
        y = mx.nd.Custom(x, label, gamma=1, alpha=1, op_type='FocalLoss')
    y.backward()
    print(y)
    print(x.grad)

    # hybridized Gluon block
    block = FocalLossGluon()
    block.hybridize()
    for _ in range(2):
        with mx.autograd.record():
            y = block(x, label)
        y.backward()
        print(y)
        print(x.grad)
@jiashu-zhu Please paste your model here.
I just use this FocalLoss to replace SoftmaxOutput in RetinaFace. @chinakook
Does this FocalLoss work in your code? @wkcn
@jiashu-zhu Yes, it works in my code.
All FPN ops in this repo get this error. It may be a bug in the custom op mechanism.
Thanks a lot. Do you have any idea how to make it work? I think I can try your suggestions. @chinakook
Use an older MXNet version.
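For example, pinning an older version with pip (assuming a CPU pip install; the GPU package name, e.g. mxnet-cu90, depends on your CUDA version):

pip install mxnet==1.1.0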
Many thanks, I will try it. @chinakook
Could you please tell me which SoftmaxOutput is replaced with FocalLoss?
A minimal reproducible example would help.
I replaced the SoftmaxOutput in line 403 of rcnn/symbol/symbol_common, and I use ResNet-152 as my backbone, which you can download from the RetinaFace homepage; other settings remain at their defaults. @wkcn
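For context, a rough sketch of what such a replacement can look like (hypothetical symbol names and hyperparameters; the actual arguments at that line of rcnn/symbol/symbol_common may differ):

import mxnet as mx

# hypothetical stand-ins for the classification symbols used at that point
cls_score = mx.symbol.Variable('cls_score')   # shape (batch_size, 2, anchor_num)
cls_label = mx.symbol.Variable('cls_label')   # shape (batch_size, anchor_num)

# before: cls_prob = mx.symbol.SoftmaxOutput(data=cls_score, label=cls_label, ...)
# after: route the same inputs through the registered custom op instead
cls_prob = mx.symbol.Custom(cls_score, cls_label, gamma=2.0, alpha=0.25,
                            op_type='FocalLoss', name='cls_prob')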
I got the same problem. I tried MXNet versions 1.5.0, 1.4.1, 1.3.1, 1.2.1, and 1.1.0, and only 1.1.0 works for me.
I need a minimal reproducible example to check the bug, since I am busy and have little time for it.
Does it work in the MNIST classification example? https://github.com/apache/incubator-mxnet/blob/master/example/image-classification/symbols/lenet.py
Thanks for reporting this @jiashu-zhu @cccorn @chinakook. Thanks a lot @wkcn for offering to help. It is possible that this issue was introduced with the sparse tensor support for custom ops. Have you tried commenting out the declare_backward_dependency in CustomOpProp https://github.com/dingjiansw101/RoITransformer_DOTA/blob/master/fpn/operator_py/fpn_psroi_rotatedpooling.py#L128 to see if that fixes the issue? Sorry, I am a little pressed for time right now and won't be able to dig into the issue currently. Can you try this workaround for now?
I met the same problem in Deformable ConvNets when I set ENABLE_OHEM: false :(
https://github.com/msracver/Deformable-ConvNets/blob/master/experiments/fpn/cfgs/resnet_v1_101_coco_trainval_fpn_dcn_end2end_ohem.yaml#L82
I tried to address the problem, and I found that some NDArrays are not initialized.
https://github.com/apache/incubator-mxnet/blob/master/src/c_api/c_api.cc#L588
I commented out the declare_backward_dependency, but it did not work.
A temporary solution: comment out all need_top_grad=False and declare_backward_dependency.
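To make that concrete, a minimal sketch of FocalLossProp with both pieces removed (this only mirrors the workaround above; it is not an upstream fix):

@mx.operator.register('FocalLoss')
class FocalLossProp(mx.operator.CustomOpProp):
    def __init__(self, gamma, alpha):
        # workaround: do not pass need_top_grad=False; keep the default
        super(FocalLossProp, self).__init__()
        self.gamma = float(gamma)
        self.alpha = float(alpha)

    # workaround: no declare_backward_dependency override here either,
    # so the default dependency declaration is used.
    # list_arguments, list_outputs, infer_shape and create_operator
    # stay the same as in the original definition above.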
Thanks for your kind help! @wkcn
Using an older version (like MXNet v1.1.0) is also a temporary workaround.