First of all, thank you very much for providing such a powerful tool.
I've been trying to combine NCE loss with a bucketing LSTM in Python for a few days. The network now builds successfully and runs without complaint, and the loss decreases as batches are fed in (very slowly), but the PPL measured on the resulting model is huge. So I want to debug the model step by step, printing the weights and each layer's forward/backward tensors, to find where the problem is.
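(For the weight part, a minimal sketch, assuming a bound and initialized mx.mod.Module named `mod`; the output file name is arbitrary:)

import numpy as np

# Dump all learned parameters of the module to a single .npz file.
arg_params, aux_params = mod.get_params()
np.savez('weights.npz', **{k: v.asnumpy() for k, v in arg_params.items()})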
What I have done so far:
After that, I have all the weights and each layer's forward/backward tensors saved in files.
But while analyzing those files, I found that some tensors behave as defined in the network symbol while others do not. What I have found is:
So, has the underlying graph executor optimized the symbol graph and changed the behavior of some layers?
Thanks for any advice.
Below is my environment:
Author: ziheng <[email protected]>
Date: Sun Jun 4 17:43:47 2017 -0700
A trick I use is to create a "debug" custom op. It simply copies its input to its output. With it, you can access all the internal outputs of a network.
@winstywang, thanks, could you give more detail about your method?
Something I used before. FYI.
import mxnet as mx

class Debug(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # Dump every element of the input tensor to a file for inspection.
        value = in_data[0].asnumpy()
        with open('s.txt', 'w') as f:
            for i in range(value.size):
                f.write('%1.6f\n' % value.flat[i])
        # Pass the input through to the output unchanged.
        self.assign(out_data[0], req[0], in_data[0])

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # Pass the output gradient straight back to the input.
        self.assign(in_grad[0], req[0], out_grad[0])
I think I have found the problem: monitor.tic() is called before the forward call, but the results are collected only after the backward, update, and update_metric calls, which may change the tensors that were computed during forward. I have changed it to collect results after each call, and the tensors now flow as defined.
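In loop form, this is roughly the pattern (a sketch only; `mod` is assumed to be a bound mx.mod.Module and `batch` a data batch):

mon = mx.mon.Monitor(interval=1, pattern='.*')  # watch every matching array
mod.install_monitor(mon)

mon.tic()
mod.forward(batch, is_train=True)
mon.toc_print()   # stats collected during forward, before later calls overwrite them

mon.tic()
mod.backward()
mon.toc_print()   # gradients right after backward

mod.update()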
But the backward gradients still confuse me; I'm trying to understand them...
@Godricly Can you print the name of the Debug symbol? And can you show me the DebugProp class's code? I think I have a problem with it.
I have tried to implement the debug op in C++, but it crashes on a KNL machine.
@BiranLi Just follow the custom op tutorial in Python.
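For reference, a minimal sketch of what such a property class could look like, following that tutorial; the registered name 'debug' and the pass-through shape logic are assumptions, not code posted in this thread:

@mx.operator.register('debug')
class DebugProp(mx.operator.CustomOpProp):
    def __init__(self):
        # need_top_grad=True so backward() receives the output gradient.
        super(DebugProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['data']           # one data input, no label

    def list_outputs(self):
        return ['output']

    def infer_shape(self, in_shape):
        # Pass-through op: output shape equals input shape, no aux states.
        return in_shape, [in_shape[0]], []

    def create_operator(self, ctx, shapes, dtypes):
        return Debug()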
@Godricly Is it OK to just drop label_shape?
@BiranLi Why do you need a label shape? This op acts like a shortcut connection in ResNet. You can write a simple test to check it.
@Godricly I thought of it as an all-pass layer. This layer does not need the label param. I will try it. Thanks.
It's the same thing. Forget about the dimension-match case.
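Spliced into a network, usage would look something like this (assuming the op was registered as 'debug' as in the sketch above; `net` is any existing symbol):

# Insert the pass-through debug op after any point of interest.
net = mx.sym.Custom(data=net, op_type='debug', name='debug0')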
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
@Godricly Thanks for your advice. But when I use the code, I get an error: Check failed: reinterpret_cast