Incubator-mxnet: Bug in backprop for the transpose of a transpose

Created on 27 Sep 2017 · 3 Comments · Source: apache/incubator-mxnet

Environment info

Operating System: Mac OSX 10.12.4 (Sierra)
Python
MXNet version: 0.11.1
Python version and distribution: conda python 3

Error Message:

The following function produces an error on backprop:

import mxnet as mx

def ttake( x, i ):
    """ Take from axis 1 instead of 0.
    """
    return mx.sym.flatten( mx.sym.transpose( mx.sym.take( mx.sym.transpose(x) , i ) ) )
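For readers unfamiliar with the intent, here is a dependency-free sketch of what `ttake` is meant to compute for a 2-D input: it selects columns rather than rows. The helper names (`transpose_2d`, `take_rows`, `ttake_reference`) are hypothetical and exist only for illustration; they are not part of MXNet. For a 2-D result the final `flatten` leaves the shape unchanged, so it is omitted here.

```python
def transpose_2d(rows):
    """Transpose a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


def take_rows(rows, indices):
    """Select whole rows by index (what take does along axis 0)."""
    return [rows[k] for k in indices]


def ttake_reference(x, i):
    """Hypothetical plain-Python stand-in for ttake on 2-D data."""
    # transpose -> take rows -> transpose back == take columns
    return transpose_2d(take_rows(transpose_2d(x), i))


x = [[0, 1, 2],
     [3, 4, 5]]
print(ttake_reference(x, [2, 0]))  # [[2, 0], [5, 3]]
```

NumPy users get the same selection from `np.take(x, i, axis=1)`.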

[17:53:42] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [17:53:42] src/operator/tensor/./matrix_op-inl.h:281: Check failed: req[0] == kWriteTo (3 vs. 1) Transpose does not support inplace

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x00000001077c51f8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x000000010849c4d7 _ZN5mxnet2op9TransposeIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKNSt3__16vectorINS_5TBlobENSB_9allocatorISD_EEEERKNSC_INS_9OpReqTypeENSE_ISJ_EEEESI_ + 567
[bt] (2) 2   libmxnet.so                         0x00000001085c5db7 _ZN5mxnet4exec16FComputeExecutor3RunENS_10RunContextEb + 103
[bt] (3) 3   libmxnet.so                         0x00000001085e6025 _ZNSt3__110__function6__funcIZN5mxnet4exec13GraphExecutor18CreateCachedSegOprEmmE3$_6NS_9allocatorIS5_EEFvNS2_10RunContextENS2_6engine18CallbackOnCompleteEEEclEOS8_OSA_ + 117
[bt] (4) 4   libmxnet.so                         0x00000001085bc032 _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 386
[bt] (5) 5   libmxnet.so                         0x00000001085befc1 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 65
[bt] (6) 6   libmxnet.so                         0x00000001085bc728 _ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6_ + 104
[bt] (7) 7   libsystem_pthread.dylib             0x00007fff8ccfa9af _pthread_body + 180
[bt] (8) 8   libsystem_pthread.dylib             0x00007fff8ccfa8fb _pthread_body + 0
[bt] (9) 9   libsystem_pthread.dylib             0x00007fff8ccfa101 thread_start + 13

[17:53:42] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [17:53:42] src/engine/./threaded_engine.h:347: [17:53:42] src/operator/tensor/./matrix_op-inl.h:281: Check failed: req[0] == kWriteTo (3 vs. 1) Transpose does not support inplace

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x00000001077c51f8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x000000010849c4d7 _ZN5mxnet2op9TransposeIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKNSt3__16vectorINS_5TBlobENSB_9allocatorISD_EEEERKNSC_INS_9OpReqTypeENSE_ISJ_EEEESI_ + 567
[bt] (2) 2   libmxnet.so                         0x00000001085c5db7 _ZN5mxnet4exec16FComputeExecutor3RunENS_10RunContextEb + 103
[bt] (3) 3   libmxnet.so                         0x00000001085e6025 _ZNSt3__110__function6__funcIZN5mxnet4exec13GraphExecutor18CreateCachedSegOprEmmE3$_6NS_9allocatorIS5_EEFvNS2_10RunContextENS2_6engine18CallbackOnCompleteEEEclEOS8_OSA_ + 117
[bt] (4) 4   libmxnet.so                         0x00000001085bc032 _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 386
[bt] (5) 5   libmxnet.so                         0x00000001085befc1 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 65
[bt] (6) 6   libmxnet.so                         0x00000001085bc728 _ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6_ + 104
[bt] (7) 7   libsystem_pthread.dylib             0x00007fff8ccfa9af _pthread_body + 180
[bt] (8) 8   libsystem_pthread.dylib             0x00007fff8ccfa8fb _pthread_body + 0
[bt] (9) 9   libsystem_pthread.dylib             0x00007fff8ccfa101 thread_start + 13

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) 0   libmxnet.so                         0x00000001077c51f8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x00000001085bc30b _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 1115
[bt] (2) 2   libmxnet.so                         0x00000001085befc1 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 65
[bt] (3) 3   libmxnet.so                         0x00000001085bc728 _ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6_ + 104
[bt] (4) 4   libsystem_pthread.dylib             0x00007fff8ccfa9af _pthread_body + 180
[bt] (5) 5   libsystem_pthread.dylib             0x00007fff8ccfa8fb _pthread_body + 0
[bt] (6) 6   libsystem_pthread.dylib             0x00007fff8ccfa101 thread_start + 13

libc++abi.dylib: terminating with uncaught exception of type dmlc::Error: [17:53:42] src/engine/./threaded_engine.h:347: [17:53:42] src/operator/tensor/./matrix_op-inl.h:281: Check failed: req[0] == kWriteTo (3 vs. 1) Transpose does not support inplace

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x00000001077c51f8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x000000010849c4d7 _ZN5mxnet2op9TransposeIN7mshadow3cpuEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKNSt3__16vectorINS_5TBlobENSB_9allocatorISD_EEEERKNSC_INS_9OpReqTypeENSE_ISJ_EEEESI_ + 567
[bt] (2) 2   libmxnet.so                         0x00000001085c5db7 _ZN5mxnet4exec16FComputeExecutor3RunENS_10RunContextEb + 103
[bt] (3) 3   libmxnet.so                         0x00000001085e6025 _ZNSt3__110__function6__funcIZN5mxnet4exec13GraphExecutor18CreateCachedSegOprEmmE3$_6NS_9allocatorIS5_EEFvNS2_10RunContextENS2_6engine18CallbackOnCompleteEEEclEOS8_OSA_ + 117
[bt] (4) 4   libmxnet.so                         0x00000001085bc032 _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 386
[bt] (5) 5   libmxnet.so                         0x00000001085befc1 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 65
[bt] (6) 6   libmxnet.so                         0x00000001085bc728 _ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6_ + 104
[bt] (7) 7   libsystem_pthread.dylib             0x00007fff8ccfa9af _pthread_body + 180
[bt] (8) 8   libsystem_pthread.dylib             0x00007fff8ccfa8fb _pthread_body + 0
[bt] (9) 9   libsystem_pthread.dylib             0x00007fff8ccfa101 thread_start + 13

An fatal error occurred in asynchronous engine operation. If you do not know what caused this error, you can try set environment variable MXNET_ENGINE_TYPE to NaiveEngine and run with debugger (i.e. gdb). This will force all operations to be synchronous and backtrace will give you the series of calls that lead to this error. Remember to set MXNET_ENGINE_TYPE back to empty after debugging.

Stack trace returned 7 entries:
[bt] (0) 0   libmxnet.so                         0x00000001077c51f8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x00000001085bc30b _ZN5mxnet6engine14ThreadedEngine15ExecuteOprBlockENS_10RunContextEPNS0_8OprBlockE + 1115
[bt] (2) 2   libmxnet.so                         0x00000001085befc1 _ZNSt3__110__function6__funcIZZN5mxnet6engine23ThreadedEnginePerDevice13PushToExecuteEPNS3_8OprBlockEbENKUlvE_clEvEUlvE_NS_9allocatorIS8_EEFvvEEclEv + 65
[bt] (3) 3   libmxnet.so                         0x00000001085bc728 _ZNSt3__114__thread_proxyINS_5tupleIJNS_8functionIFvvEEEEEEEEPvS6_ + 104
[bt] (4) 4   libsystem_pthread.dylib             0x00007fff8ccfa9af _pthread_body + 180
[bt] (5) 5   libsystem_pthread.dylib             0x00007fff8ccfa8fb _pthread_body + 0
[bt] (6) 6   libsystem_pthread.dylib             0x00007fff8ccfa101 thread_start + 13

Abort trap: 6
Bug
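
The log's suggestion to switch to the synchronous engine can be followed from Python as well as from the shell. This is a minimal sketch, assuming the variable has to be set before `mxnet` is first imported in the process. The import is left commented out so the snippet runs without MXNet installed.

```python
import os

# Ask MXNet for the synchronous engine, as the error message suggests.
# Assumption: this must happen before the first `import mxnet`.
os.environ["MXNET_ENGINE_TYPE"] = "NaiveEngine"

# import mxnet as mx   # uncomment where MXNet is installed
# ... rerun the failing forward/backward pass here ...

# After debugging, remove the override to restore the default engine.
os.environ.pop("MXNET_ENGINE_TYPE", None)
```

With the naive engine the failing operator runs on the calling thread, so the backtrace should point at the call that triggered it.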

All 3 comments

Current hack (made compatible with 0.10.0):


import mxnet as mx

def ttake( x, i ):
    """ Take from axis 1 instead of 0.
    """
    # swapaxes stands in for the first transpose, which triggered the error
    a = mx.sym.swapaxes(x, dim1=0, dim2=1)
    return mx.sym.flatten( mx.sym.transpose( mx.sym.take( a , i ) ) )

@dmadeka I tried to debug this issue but couldn't reproduce it with the latest master branch code. My script is below. Could you please confirm whether the problem still exists? From what you described, this looks more like the graph executor assigning the wrong req to the input of the transpose operator.

import mxnet as mx


def ttake(x, i):
    """Take from axis 1 instead of 0."""
    return mx.sym.flatten(mx.sym.transpose(mx.sym.take(mx.sym.transpose(x), i)))


shape = (4, 5)
data = mx.sym.Variable('data', shape=shape)
indices = mx.sym.Variable('indices', shape=(1,))
sym = ttake(data, indices)
# Request a gradient for the data only; none is needed for the indices.
exe = sym.simple_bind(ctx=mx.cpu(), grad_req={'data': 'write', 'indices': 'null'})
# Fill the inputs: data is 0..19 reshaped to (4, 5); the index selects column 0.
exe.arg_dict[data.name][:] = mx.nd.arange(shape[0] * shape[1]).reshape(shape)
exe.arg_dict[indices.name][:] = mx.nd.array([0], dtype='int32')
print(exe.arg_dict[data.name])
# Run forward, then use the output itself as the head gradient for backward.
output = exe.forward()[0]
exe.backward(output)
print(exe.grad_arrays[0].asnumpy())
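
As a cross-check on what the script should print when it succeeds, the gradient can be worked out by hand: the backward pass of a column selection scatters the head gradient back into the selected columns and leaves every other entry at zero. The sketch below does this in plain Python for the same 4x5 input and index `[0]`. `expected_grad` is a hypothetical helper, and the result is derived from the math rather than copied from an MXNet run.

```python
def expected_grad(shape, indices, head_grad):
    """Scatter-add head_grad (rows x len(indices)) into a zero matrix."""
    rows, cols = shape
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for j, c in enumerate(indices):
            grad[r][c] += head_grad[r][j]
    return grad


shape = (4, 5)
# Same values as mx.nd.arange(20).reshape((4, 5)) in the script above.
data = [[float(r * shape[1] + c) for c in range(shape[1])]
        for r in range(shape[0])]
indices = [0]

# Forward pass: pick the selected columns.
output = [[row[c] for c in indices] for row in data]

# Backward pass with the output used as the head gradient.
grad = expected_grad(shape, indices, output)
for row in grad:
    print(row)
```

Only column 0 of the gradient is non-zero, holding 0, 5, 10 and 15, which are the values that were selected in the forward pass.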

The issue is not reproducible with MXNet 1.3.1 and Python 3.6.5 on macOS. I tried to replicate it with @reminisce's script.

@sandeep-krishnamurthy Please close this issue, as it is not reproducible on the latest code.

@dmadeka Please feel free to reopen if this issue is seen again.
