Hi all,
I'm new to MXNet. Is there any example showing how to use cross entropy as the loss function?
Thx.
Usually you can use SoftmaxOutput instead, which is shown in most of the image classification examples. softmax_cross_entropy is used in https://github.com/dmlc/mxnet/blob/master/example/model-parallel-lstm/lstm.py#L107 , which is a somewhat more complicated example.
The difference is in the forward pass: SoftmaxOutput outputs the normalized probabilities, while softmax_cross_entropy outputs the value of the loss function as a scalar.
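For a quick side-by-side comparison, here is a minimal sketch using the NDArray API (this assumes a reasonably recent MXNet build where `softmax` and `softmax_cross_entropy` are exposed as NDArray operators):

```python
import mxnet as mx

logits = mx.nd.array([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])   # 2 samples, 3 classes
labels = mx.nd.array([0, 1])              # integer class indices

# SoftmaxOutput-style forward pass: normalized per-class probabilities
probs = mx.nd.softmax(logits)
print(probs)                              # shape (2, 3), each row sums to 1

# softmax_cross_entropy: the loss summed over the batch, as a scalar
loss = mx.nd.softmax_cross_entropy(logits, labels)
print(loss)                               # shape (1,)
```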
@tqchen In multi-label classification, the cross-entropy loss is very useful, so I'm also wondering whether there is a cross-entropy loss in MXNet.
@tqchen Maybe there is also a difference in the backward pass. If I'm right, SoftmaxOutput never computes the cross entropy itself, but uses 1 as the incoming gradient for all training samples in the backward pass. I figure this behavior is intended, since the Softmax implemented as a Python operator does the same. But is this supported theoretically, or only by empirical experiments?
@WuZifeng It's the gradient of the cross-entropy loss used in classification problems. Refer to http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/
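To make that concrete: for p = softmax(z) and a one-hot label y, the loss L = -sum_i y_i * log(p_i) has gradient dL/dz = p - y, which is exactly what SoftmaxOutput's backward pass computes. A quick numerical check (an illustrative sketch, not MXNet's implementation):

```python
import numpy as np

# Check that the gradient of softmax cross-entropy w.r.t. the logits z
# is softmax(z) - y, by comparing against central finite differences.
z = np.array([2.0, 1.0, 0.1])   # logits for one sample
y = np.array([1.0, 0.0, 0.0])   # one-hot label

def loss(z):
    p = np.exp(z - z.max())     # numerically stable softmax
    p /= p.sum()
    return -np.sum(y * np.log(p))

p = np.exp(z - z.max())
p /= p.sum()
analytic = p - y                # the claimed gradient

eps = 1e-6
numeric = np.array([(loss(z + eps * e) - loss(z - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric))  # True
```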
@sxjscience Thanks very much! It was my fault; I thought the gradient took a different form. Apologies for bothering you.
@tqchen I don't quite understand how to compute the loss value. I'm new to MXNet. I'm working on the distributed example, and everything is running. I want to compute the loss value at each epoch. How can I do that? I would appreciate some suggestions.
Sincerely
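For reference, one way to see a per-epoch loss is to pass a cross-entropy metric to Module.fit, which logs it at the end of every epoch. A minimal sketch with made-up toy data (mx.metric.CrossEntropy and NDArrayIter are standard Module-API pieces, but substitute your own symbol and data iterators):

```python
import logging
import numpy as np
import mxnet as mx

logging.getLogger().setLevel(logging.INFO)  # fit() logs the metric each epoch

# Toy data for illustration only; replace with your real network and iterators.
X = np.random.rand(100, 20).astype('float32')
y = np.random.randint(0, 3, size=100).astype('float32')
train_iter = mx.io.NDArrayIter(X, y, batch_size=10, label_name='softmax_label')

data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data, num_hidden=3)
net = mx.sym.SoftmaxOutput(fc, name='softmax')

mod = mx.mod.Module(symbol=net, context=mx.cpu())
mod.fit(train_iter,
        eval_metric=mx.metric.CrossEntropy(),  # prints Train-cross-entropy per epoch
        num_epoch=5)
```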