Hi all,
I found this really interesting article and I think it would be great if someone could implement this feature as an alternative to SoftmaxOutput!
http://arxiv.org/pdf/1306.0239.pdf
I am interested
As far as I understood, the article suggests using an SVM as an alternative to SoftmaxOutput, with the L2-SVM loss as the cost function. They claim the results are better in this setup.
I'll just start implementing this SVM neuron based on what's already done for SoftmaxOutput.
Wonderful!
I'm having a great deal of trouble figuring out where the SoftmaxOutput tests are.
@antinucleon, could you help on this one?
Do you mean this unittest? https://github.com/dmlc/mxnet/blob/master/tests/python/unittest/test_operator.py#L203
Basically, we can test the operator output against a NumPy implementation.
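In case it helps, here is a rough sketch of that testing style: run an operator through an executor and compare against a NumPy reference. Shapes and values are made up, and I'm assuming the plain `softmax` operator exists alongside SoftmaxOutput.

```python
import mxnet as mx
import numpy as np

def test_softmax_against_numpy():
    x = np.random.rand(4, 10).astype(np.float32)

    # MXNet side: bind the symbol and run a forward pass
    data = mx.sym.Variable('data')
    sym = mx.sym.softmax(data=data)  # assuming the plain softmax operator
    exe = sym.bind(mx.cpu(), {'data': mx.nd.array(x)})
    exe.forward(is_train=False)
    out = exe.outputs[0].asnumpy()

    # NumPy reference implementation
    e = np.exp(x - x.max(axis=1, keepdims=True))
    ref = e / e.sum(axis=1, keepdims=True)

    assert np.allclose(out, ref, atol=1e-5)
```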
Hello,
Is there any parameter other than the margin size and the regularization coefficient? I'm also letting the user decide between L1-SVM and L2-SVM, which makes three optional parameters besides data and labels.
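Something like this is what I have in mind for the call site (a hypothetical signature, the names are just my suggestions, not a final API):

```python
import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=10)
svm = mx.sym.SVMOutput(data=net, name='svm',
                       margin=1.0,                      # margin size
                       regularization_coefficient=1.0,  # weight on the hinge term
                       use_linear=False)                # False -> L2-SVM, True -> L1-SVM
```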
I made a toy test based on the gtc_tutorial MNIST example and had a bad time because NDArrayIter expects a "softmax_label" by default. I worked around it by calling mx.io._init_data and setting default_name="svm_label", but I'm sure I shouldn't be calling an underscore-prefixed function. It looks like a _bug_ ;)
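If I'm reading the API right (this is an assumption about the current NDArrayIter signature), the `label_name` argument should avoid the private call entirely:

```python
import mxnet as mx
import numpy as np

# Dummy MNIST-shaped data just to illustrate the iterator call
X = np.random.rand(100, 784).astype(np.float32)
y = np.random.randint(0, 10, (100,)).astype(np.float32)

# Rename the label key at construction time instead of patching _init_data
train_iter = mx.io.NDArrayIter(data=X, label=y, batch_size=10,
                               label_name='svm_label')
```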
Right now I'm getting poor results with my toy test, 7.01%; I believe that's because I should be training an SVM model. I got a bit confused about whether K in the derivative equations refers to the SVM weights or the node weights; after this experiment I'm pretty convinced they are really the SVM weights.
I may need to create two new tensor operators, L1_SVM and L2_SVM, just like Softmax did. For now I'll just release the CPU version, but I'll be glad to try the GPU one; I just don't have any experience with it yet.
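For reference, here is the rough NumPy sketch I'm checking the C++ code against. It follows my reading of the paper's hinge losses; the `margin` and `C` parameters and the ±1 one-hot encoding are my assumptions about how the operator will be parameterized.

```python
import numpy as np

def svm_loss_and_grad(scores, labels, margin=1.0, C=1.0, l2=True):
    """scores: (N, K) network outputs; labels: (N,) integer class ids."""
    N, K = scores.shape

    # +/-1 one-hot targets as in the paper
    t = -np.ones((N, K), dtype=scores.dtype)
    t[np.arange(N), labels.astype(int)] = 1.0

    slack = np.maximum(0.0, margin - t * scores)   # per-class hinge term
    if l2:
        loss = C * np.sum(slack ** 2) / N          # L2-SVM (squared hinge)
        grad = -2.0 * C * t * slack / N            # gradient w.r.t. scores
    else:
        loss = C * np.sum(slack) / N               # L1-SVM (hinge)
        grad = -C * t * (slack > 0) / N
    return loss, grad
```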
@tmatas @tqchen
Great news! I ran the MNIST toy test on both Softmax and SVM with the same configuration and got 97.93% for Softmax against the SVM's 98.15% with the L2-SVM cost function and 97.97% with L1-SVM. It looks like the paper was right!
I'll format my repo for a pull request, if you guys are OK with my parameter suggestions.
Will my toy test be of any use to you?
@jonasrla @tmatas It seems that we can combine a ReLU activation and MakeLoss to get the hinge loss: max(0, 1 - ty) can be written as relu(1 - ty), and we can put an mx.sym.MakeLoss after the activation.
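A minimal sketch of the idea, assuming the label is already one-hot encoded to ±1 on the Python side so that it matches the shape of the FullyConnected output:

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('svm_label')                      # +/-1 encoded targets

fc = mx.sym.FullyConnected(data=data, num_hidden=10)
margin = 1.0 - fc * label                                 # 1 - t*y, elementwise
hinge = mx.sym.Activation(data=margin, act_type='relu')   # max(0, 1 - t*y)
loss = mx.sym.MakeLoss(hinge)                             # treat it as the training loss
```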
@tmatas How is the GPU implementation going? Thanks.
Believe it or not, I'm back, and I finally have a proper environment with CUDA.
Just to be sure, right now it's about rewriting SVMOutput-inl.h, this time using MakeLoss and Relu, right?
Does that solve the problem for both CPU and GPU? If so, I'll test both cases just to be sure.
@sxjscience, I was considering your idea of basing my solution on MakeLoss and Relu, but with the API as is, it's unfeasible to build a solution using just operators. The sign varies depending on whether the class is the expected one or not, and since we don't have one_hot_encode implemented in C++, it's not possible to write it using only already-implemented operators. I'll keep trying pure CUDA solutions.
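One possible stopgap while one_hot_encode is missing on the C++ side would be to pre-encode the labels to a ±1 matrix in Python and feed that through the iterator instead of the raw class ids (the names here are just illustrative):

```python
import numpy as np
import mxnet as mx

def to_pm1_one_hot(labels, num_classes):
    # -1 everywhere, +1 at the true class, as the hinge loss expects
    out = -np.ones((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels.astype(int)] = 1.0
    return out

y_pm1 = to_pm1_one_hot(np.array([3, 1, 0]), num_classes=10)
train_iter = mx.io.NDArrayIter(data=np.random.rand(3, 784).astype(np.float32),
                               label=y_pm1, batch_size=1,
                               label_name='svm_label')
```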
The CPU and mshadow GPU versions are added now.