Incubator-mxnet: two consecutive backward cause error Check failed: type_ != nullptr The any container is empty requested

Created on 6 Feb 2018 · 4 Comments · Source: apache/incubator-mxnet

I have to train two networks together, so here is my implementation. Training them like this fails with the error shown below. It works when either one of the two backward loops is commented out.

The part that causes the error is as follows:

            for loss in t_xentropy_losses:
                loss.backward()

            for loss in s_xentropy_losses:
                loss.backward()

            t_trainer.step(batch_size)
            s_trainer.step(batch_size)

Error message:

Traceback (most recent call last):
  File "dml.py", line 151, in <module>
    xentropy_loss_op=xentropy_loss_op, kl_loss_op=kl_loss_op, batch_size=batch_size, epochs=20, ctx=ctx)
  File "dml.py", line 73, in train
    loss.backward()
  File "/home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/ndarray/ndarray.py", line 2002, in backward
    ctypes.c_void_p(0)))
  File "/home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/base.py", line 146, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:05:31] /home/wangshuailong/lib/incubator-mxnet/dmlc-core/include/dmlc/./any.h:286: Check failed: type_ != nullptr The any container is empty requested=N5mxnet10Imperative6AGInfoE

Stack trace returned 10 entries:
[bt] (0) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc10StackTraceEv+0x42) [0x7f7303f13012]
[bt] (1) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x18) [0x7f7303f135f8]
[bt] (2) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZNK4dmlc3any10check_typeIN5mxnet10Imperative6AGInfoEEEvv+0x174) [0x7f730614b134]
[bt] (3) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10Imperative8BackwardERKSt6vectorIPNS_7NDArrayESaIS3_EES7_S7_bbb+0x368f) [0x7f730614230f]
[bt] (4) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(MXAutogradBackwardEx+0x527) [0x7f73064859f7]
[bt] (5) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f73a1294dac]
[bt] (6) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f73a12946d5]
[bt] (7) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f73a14a7c8b]
[bt] (8) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f73a14a1a85]
[bt] (9) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f73a890e9a3]

I guess something changes during the first backward pass, but I'm not sure, and I'm not good at C++.

Any suggestions?

wshuail

Most helpful comment

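I got the same error and stumbled on this issue. Another, and maybe cleaner, way that solved the error for me was to do
autograd.backward(list_of_loss_arrays)
In your case it should be possible to do the following (the two lists are concatenated, because as far as I can tell autograd.backward expects a flat list of NDArrays rather than a list of lists)
autograd.backward(t_xentropy_losses + s_xentropy_losses)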

Just commenting to help others who come across this. If anyone knows why

for loss in list_of_loss_arrays:
    loss.backward()

doesn't work, let me know. It's weird, because I got that code snippet from the official MXNet training script for training image classification networks from scratch.

adrianloy on 25 Jan 2019

👍6

All 4 comments

I solved this by writing two separate parts, each with its own record scope, like this:

with autograd.record():
    loss = XXX
loss.backward()

with autograd.record():
    loss = XXX
loss.backward()

Closing this issue for now. :)

wshuail on 6 Feb 2018
