Hello,
I am designing a network that, in the end, uses a weighted sum of several softmax outputs. This final sum should act as the loss layer, instead of a single softmax output. I wonder whether I can do this simply by multiplying N softmax layers by N weights and accumulating them into a final summation operator, for example in a loop, and then using that summation operator as the loss layer. Something like this in pseudocode:
summation = weight[0] * softmax[0]
for i in range(1, N):
    summation = summation + weight[i] * softmax[i]
In the end, "summation" should be the final loss term. I doubt whether this will work, since it seems that the SoftmaxOutput operator does not receive gradients from operators that come after it; it is designed to be the final layer. Is this right? What would be the best way to implement this with existing MXNet symbolic operators, without having to write my own operator?
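For concreteness, here is a rough sketch of what I have in mind, written against the symbolic API. I am assuming operators like mx.sym.softmax, mx.sym.pick, mx.sym.add_n and mx.sym.MakeLoss behave the way I expect, and all names and sizes below are made up:

import mxnet as mx

N = 3                      # number of softmax branches (made up)
weights = [0.5, 0.3, 0.2]  # loss weights (made up)

data = mx.sym.Variable('data')
label = mx.sym.Variable('label')

loss_terms = []
for i in range(N):
    # one classification head per branch; the hidden size is a placeholder
    fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc%d' % i)
    prob = mx.sym.softmax(fc, axis=1)
    # negative log-likelihood of the true class for this branch
    nll = -mx.sym.log(mx.sym.pick(prob, label, axis=1) + 1e-8)
    loss_terms.append(weights[i] * mx.sym.mean(nll))

# weighted sum of the branch losses, wrapped so it can act as the loss symbol
total_loss = mx.sym.MakeLoss(mx.sym.add_n(*loss_terms))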
Thanks in advance.
Use SoftmaxActivation.
I am getting a "Softmax Activation for internal layers is only supported on GPU with cuDNN. Use SoftmaxOutput for loss layer." error when I try to use it. I called bind on the network with a GPU context, but that did not help either.
I'm also stuck here. @piiswrong: Can you please provide a code sample for the weighted loss? Thanks.
You can achieve the same thing by grouping several SoftmaxOutput layers and using grad_scale to weight the losses.
softmax1 = mx.sym.SoftmaxOutput(..., grad_scale=0.67)
softmax2 = mx.sym.SoftmaxOutput(..., grad_scale=0.33)
net = mx.sym.Group([softmax1, softmax2])
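If it helps, here is a minimal self-contained sketch of that approach; the layer names, sizes and label variable names are placeholders, and I am writing against the current mx.sym namespace:

import mxnet as mx

data = mx.sym.Variable('data')
label1 = mx.sym.Variable('softmax1_label')
label2 = mx.sym.Variable('softmax2_label')

# shared body (placeholder size)
body = mx.sym.FullyConnected(data=data, num_hidden=128, name='body')
body = mx.sym.Activation(body, act_type='relu')

# two classification heads; grad_scale scales each head's gradient,
# which has the same effect as weighting its loss term
fc1 = mx.sym.FullyConnected(body, num_hidden=10, name='fc1')
softmax1 = mx.sym.SoftmaxOutput(data=fc1, label=label1, grad_scale=0.67, name='softmax1')

fc2 = mx.sym.FullyConnected(body, num_hidden=5, name='fc2')
softmax2 = mx.sym.SoftmaxOutput(data=fc2, label=label2, grad_scale=0.33, name='softmax2')

net = mx.sym.Group([softmax1, softmax2])

When training something like this with the Module API, you would typically pass both label names, e.g. mx.mod.Module(net, label_names=['softmax1_label', 'softmax2_label']).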
Yes, you can; the weight assignment can be done in the definition of the previous layers.