I'm confused about the design of output symbols in MXNet. Consider the SoftmaxOutput symbol as an example.
When the SoftmaxOutput symbol is created, it takes only the previous layer as input.
Then on the forward pass it simply applies softmax and outputs exp(x_i) / sum_j exp(x_j), which makes perfect sense.
But on the backward pass it computes the gradient with respect to the log loss, using both the previous layer and the label as input.
This is confusing because:
1) the forward/backward pass of a node in a computational graph should typically be based on the same computation.
2) It is not clear when/how the layer gets wired up to the labels. Does the model/module automatically pass the labels to the last layer of the computational graph? (See the sketch below for the setup I mean.)
3) With this design, the computational graph never actually outputs the loss. If I want to access the loss, how am I supposed to do it?
4) The MakeLoss symbol does not work this way. It causes the graph to output the actual loss.
Thank you for explaining the design.
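For reference, here is a minimal sketch of the setup I am describing (the layer sizes and names are illustrative, and the comments about the wiring are my guesses, not confirmed behavior):

```python
import mxnet as mx

data = mx.sym.Variable('data')
fc   = mx.sym.FullyConnected(data, num_hidden=10, name='fc')
out  = mx.sym.SoftmaxOutput(fc, name='softmax')  # only the previous layer is passed in

# Yet the symbol lists an extra 'softmax_label' argument that I never created:
print(out.list_arguments())
# ['data', 'fc_weight', 'fc_bias', 'softmax_label']

# Presumably Module binds the iterator's label to it by matching label_names,
# which defaults to ('softmax_label',) -- is that the intended wiring?
mod = mx.mod.Module(out, data_names=('data',), label_names=('softmax_label',))
```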
Loss layers will be refactored soon to look more like Keras losses.
On a slightly related note, it would be nice if the axis over which softmax is computed were a parameter of the Softmax symbols, rather than forcing a batch-major layout on the input data.
I'm actually also interested in accessing the output of the loss. I want to implement a weighted loss function (i.e. like in U-Net: http://lmb.informatik.uni-freiburg.de/resources/opensource/unet.en.html).
What would be the best way to achieve that?
My guess would be to have two outputs: one for the predictions (before the loss layer) and one for the loss itself. Could you please confirm whether I am correct?
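For what it's worth, here is a rough sketch of what I have in mind, assuming MakeLoss with a per-example weight and Group for the two outputs (the toy network and the variable names are purely illustrative, not a tested implementation):

```python
import mxnet as mx

data   = mx.sym.Variable('data')
label  = mx.sym.Variable('label')
weight = mx.sym.Variable('weight')            # per-example (or per-pixel) weight map

net  = mx.sym.FullyConnected(data, num_hidden=10)
prob = mx.sym.softmax(net)                    # plain softmax, no implicit loss

# weighted cross-entropy: pick the probability of the true class, take -log, scale
ce   = -mx.sym.log(mx.sym.pick(prob, label, axis=-1) + 1e-8)
loss = mx.sym.MakeLoss(weight * ce)

# two outputs: BlockGrad keeps gradients from flowing through the prediction head
out  = mx.sym.Group([mx.sym.BlockGrad(prob), loss])
```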
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
Is this loss layer refactor still in the works?
It's available in the Gluon interface: https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/loss.py
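For anyone landing here later, a small sketch of the Gluon-style usage (the model, shapes, and data are illustrative): the loss is an ordinary block whose forward call returns the per-sample loss values, so they can be read directly.

```python
import mxnet as mx
from mxnet import gluon, autograd

net     = gluon.nn.Dense(10)                  # toy model, illustrative only
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

x = mx.nd.random.uniform(shape=(4, 20))
y = mx.nd.array([1, 0, 3, 2])

with autograd.record():
    loss = loss_fn(net(x), y)                 # NDArray of per-sample losses
loss.backward()
print(loss.asnumpy())                         # the loss values are directly accessible
```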