I need to train two networks together, and my implementation is below. I can't train them this way because it raises the error shown below. It works when either one of the two is commented out.
The part causing the error is as follows:
for loss in t_xentropy_losses:
    loss.backward()
for loss in s_xentropy_losses:
    loss.backward()
t_trainer.step(batch_size)
s_trainer.step(batch_size)
Error message:
Traceback (most recent call last):
File "dml.py", line 151, in <module>
xentropy_loss_op=xentropy_loss_op, kl_loss_op=kl_loss_op, batch_size=batch_size, epochs=20, ctx=ctx)
File "dml.py", line 73, in train
loss.backward()
File "/home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/ndarray/ndarray.py", line 2002, in backward
ctypes.c_void_p(0)))
File "/home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/base.py", line 146, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [20:05:31] /home/wangshuailong/lib/incubator-mxnet/dmlc-core/include/dmlc/./any.h:286: Check failed: type_ != nullptr The any container is empty requested=N5mxnet10Imperative6AGInfoE
Stack trace returned 10 entries:
[bt] (0) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc10StackTraceEv+0x42) [0x7f7303f13012]
[bt] (1) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN4dmlc15LogMessageFatalD1Ev+0x18) [0x7f7303f135f8]
[bt] (2) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZNK4dmlc3any10check_typeIN5mxnet10Imperative6AGInfoEEEvv+0x174) [0x7f730614b134]
[bt] (3) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet10Imperative8BackwardERKSt6vectorIPNS_7NDArrayESaIS3_EES7_S7_bbb+0x368f) [0x7f730614230f]
[bt] (4) /home/wangshuailong/lib/mxnet_1.0.1/python/mxnet/../../lib/libmxnet.so(MXAutogradBackwardEx+0x527) [0x7f73064859f7]
[bt] (5) /lib64/libffi.so.6(ffi_call_unix64+0x4c) [0x7f73a1294dac]
[bt] (6) /lib64/libffi.so.6(ffi_call+0x1f5) [0x7f73a12946d5]
[bt] (7) /usr/lib64/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x30b) [0x7f73a14a7c8b]
[bt] (8) /usr/lib64/python2.7/lib-dynload/_ctypes.so(+0xaa85) [0x7f73a14a1a85]
[bt] (9) /lib64/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f73a890e9a3]
I guess something was changed by the first backward call, but I'm not sure, and I'm not good at C++.
Any suggestions?
I solved this by writing two separate parts, like this:
with autograd.record():
    loss = XXX
loss.backward()
with autograd.record():
    loss = XXX
loss.backward()
Closing this issue for now. :)
I got the same error and stumbled on this issue. Another, maybe cleaner, way that solved the error for me was to do
autograd.backward(list_of_loss_arrays)
In your case it should be possible to do
autograd.backward([t_xentropy_losses, s_xentropy_losses])
Just commenting to help others who come across this. If anyone knows why
for loss in list_of_loss_arrays:
    loss.backward()
doesn't work, let me know. It's weird because I got that code snippet from the official MXNet training script for training image classification networks from scratch.
autograd.backward([t_xentropy_losses, s_xentropy_losses])
This worked, thanks. But I still don't know what is causing .backward() to fail.
Can confirm,
for loss in losses:
    loss.backward()
fails for me with a similar error, while
autograd.backward(losses)
works like a charm.