Hi all,
I found this really interesting article and I think it would be great if someone could implement this feature as an alternative to SoftmaxOutput!
http://arxiv.org/pdf/1306.0239.pdf
I am interested
As far as I understood, the article suggests using an SVM as an alternative to SoftmaxOutput, with the L2-SVM loss as the cost function. They claim the results are better in this setup.
I'll just start implementing this SVM neuron based on what's already done for SoftmaxOutput.
Wonderful!
I'm having a great deal of trouble figuring out where the SoftmaxOutput tests are.
@antinucleon, could you help on this one?
Do you mean this unittest? https://github.com/dmlc/mxnet/blob/master/tests/python/unittest/test_operator.py#L203
Basically, we can test the operator output against a NumPy implementation.
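In case it helps, here is a rough sketch of that testing style: run an operator through an executor and compare against a NumPy reference. Shapes and values are made up, and I'm assuming the plain `softmax` operator exists alongside SoftmaxOutput.

```python
import mxnet as mx
import numpy as np

def test_softmax_against_numpy():
    x = np.random.rand(4, 10).astype(np.float32)

    # MXNet side: bind the symbol and run a forward pass
    data = mx.sym.Variable('data')
    sym = mx.sym.softmax(data=data)  # assuming the plain softmax operator
    exe = sym.bind(mx.cpu(), {'data': mx.nd.array(x)})
    exe.forward(is_train=False)
    out = exe.outputs[0].asnumpy()

    # NumPy reference implementation
    e = np.exp(x - x.max(axis=1, keepdims=True))
    ref = e / e.sum(axis=1, keepdims=True)

    assert np.allclose(out, ref, atol=1e-5)
```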
Hello,
Is there any parameter other than the margin size and the regularization coefficient? I'm also letting the user decide between L1-SVM and L2-SVM, which makes three optional parameters besides data and labels.
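Something like this is what I have in mind for the call site (a hypothetical signature, the names are just my suggestions, not a final API):

```python
import mxnet as mx

data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=data, num_hidden=10)
svm = mx.sym.SVMOutput(data=net, name='svm',
                       margin=1.0,                      # margin size
                       regularization_coefficient=1.0,  # weight on the hinge term
                       use_linear=False)                # False -> L2-SVM, True -> L1-SVM
```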
I made a toy test based on the gtc_tutorial MNIST example and had a bad time because NDArrayIter expects a "softmax_label" by default. I worked around it by calling mx.io._init_data and setting default_name="svm_label", but I'm sure I shouldn't be calling an underscore-prefixed function. It looks like a _bug_ ;)
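If I'm reading the API right (this is an assumption about the current NDArrayIter signature), the `label_name` argument should avoid the private call entirely:

```python
import mxnet as mx
import numpy as np

# Dummy MNIST-shaped data just to illustrate the iterator call
X = np.random.rand(100, 784).astype(np.float32)
y = np.random.randint(0, 10, (100,)).astype(np.float32)

# Rename the label key at construction time instead of patching _init_data
train_iter = mx.io.NDArrayIter(data=X, label=y, batch_size=10,
                               label_name='svm_label')
```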
Right now I'm getting poor results with my toy test, 7.01%; I believe that's because I should be training an SVM model. I got a bit confused about whether K in the derivative equations refers to the SVM weights or the node weights; after this experiment I'm pretty convinced they are really the SVM weights.
I may need to create two new tensor operators, L1_SVM and L2_SVM, just like Softmax did. For now I'll just release the CPU version, but I'll be glad to try the GPU one; I just don't have any experience with it yet.
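For reference, here is the rough NumPy sketch I'm checking the C++ code against. It follows my reading of the paper's hinge losses; the `margin` and `C` parameters and the ±1 one-hot encoding are my assumptions about how the operator will be parameterized.

```python
import numpy as np

def svm_loss_and_grad(scores, labels, margin=1.0, C=1.0, l2=True):
    """scores: (N, K) network outputs; labels: (N,) integer class ids."""
    N, K = scores.shape

    # +/-1 one-hot targets as in the paper
    t = -np.ones((N, K), dtype=scores.dtype)
    t[np.arange(N), labels.astype(int)] = 1.0

    slack = np.maximum(0.0, margin - t * scores)   # per-class hinge term
    if l2:
        loss = C * np.sum(slack ** 2) / N          # L2-SVM (squared hinge)
        grad = -2.0 * C * t * slack / N            # gradient w.r.t. scores
    else:
        loss = C * np.sum(slack) / N               # L1-SVM (hinge)
        grad = -C * t * (slack > 0) / N
    return loss, grad
```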
@tmatas @tqchen
Great news! I ran the MNIST toy test on both Softmax and SVM with the same configuration and got 97.93% for Softmax against the SVM's 98.15% with the L2-SVM cost function and 97.97% with L1-SVM. It looks like the paper was right!
I'll format my repo for a pull request, if you guys are OK with my parameter suggestions.
Will my toy test be of any use to you?
@jonasrla @tmatas It seems that we can combine a ReLU activation and MakeLoss to get the hinge loss: max(0, 1 - ty) can be written as relu(1 - ty), and we can put an mx.sym.MakeLoss after the activation.
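A minimal sketch of the idea, assuming the label is already one-hot encoded to ±1 on the Python side so that it matches the shape of the FullyConnected output:

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('svm_label')                      # +/-1 encoded targets

fc = mx.sym.FullyConnected(data=data, num_hidden=10)
margin = 1.0 - fc * label                                 # 1 - t*y, elementwise
hinge = mx.sym.Activation(data=margin, act_type='relu')   # max(0, 1 - t*y)
loss = mx.sym.MakeLoss(hinge)                             # treat it as the training loss
```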
@tmatas How is the GPU implementation going? Thanks.
Believe it or not, I'm back, and I finally have a proper environment with CUDA.
Just to be sure, right now it's about rewriting SVMOutput-inl.h, this time using MakeLoss and Relu, right?
Does that solve the problem for both CPU and GPU? If so, I'll test both cases just to be sure.
@sxjscience, I was considering your idea of basing my solution on MakeLoss and Relu, but with the API as is, it's unfeasible to build a solution using just operators. The sign varies depending on whether the class is the expected one or not, and since we don't have one_hot_encode implemented in C++, it's not possible to write it using only already-implemented operators. I'll keep trying pure CUDA solutions.
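One possible stopgap while one_hot_encode is missing on the C++ side would be to pre-encode the labels to a ±1 matrix in Python and feed that through the iterator instead of the raw class ids (the names here are just illustrative):

```python
import numpy as np
import mxnet as mx

def to_pm1_one_hot(labels, num_classes):
    # -1 everywhere, +1 at the true class, as the hinge loss expects
    out = -np.ones((labels.shape[0], num_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels.astype(int)] = 1.0
    return out

y_pm1 = to_pm1_one_hot(np.array([3, 1, 0]), num_classes=10)
train_iter = mx.io.NDArrayIter(data=np.random.rand(3, 784).astype(np.float32),
                               label=y_pm1, batch_size=1,
                               label_name='svm_label')
```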
The CPU and mshadow GPU versions are added now.