Incubator-mxnet: Support SVM output layer

Created on 9 Apr 2016 · 12 Comments · Source: apache/incubator-mxnet

Hi all,
I found this really interesting paper, and I think it would be great if someone could implement this as an alternative to SoftmaxOutput!
http://arxiv.org/pdf/1306.0239.pdf

Labels: Call for Contribution, Feature request

All 12 comments

I am interested

As far as I understand, the paper suggests using an SVM as an alternative to SoftmaxOutput, with the L2-SVM as the cost function. They claim the results are better in this setup.
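
For reference, the two objective functions from the paper, as I read them (with targets t_n in {-1, +1} and C the regularization trade-off):

```latex
% L1-SVM: standard hinge loss
\min_{\mathbf{w}} \ \tfrac{1}{2}\,\mathbf{w}^{\top}\mathbf{w}
    + C \sum_{n} \max\!\left(1 - \mathbf{w}^{\top}\mathbf{x}_n\, t_n,\ 0\right)

% L2-SVM: squared hinge loss (the variant they focus on)
\min_{\mathbf{w}} \ \tfrac{1}{2}\,\mathbf{w}^{\top}\mathbf{w}
    + C \sum_{n} \max\!\left(1 - \mathbf{w}^{\top}\mathbf{x}_n\, t_n,\ 0\right)^{2}
```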

I'll start implementing this SVM output layer based on what's already done with SoftmaxOutput.

Wonderful!

I'm having a great deal of trouble figuring out where the SoftmaxOutput tests are.

@antinucleon, could you help with this one?

Do you mean this unittest? https://github.com/dmlc/mxnet/blob/master/tests/python/unittest/test_operator.py#L203

Basically, we can test the operator output against a NumPy implementation.
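
For example, a minimal sketch of that style of test (shapes and tolerances are just for illustration), comparing SoftmaxOutput's forward pass against a NumPy softmax:

```python
import numpy as np
import mxnet as mx

def np_softmax(x):
    # numerically stable reference softmax
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

data = mx.sym.Variable('data')
label = mx.sym.Variable('softmax_label')
sym = mx.sym.SoftmaxOutput(data=data, label=label, name='softmax')

x = np.random.uniform(-1, 1, (4, 10)).astype(np.float32)
y = np.random.randint(0, 10, size=(4,)).astype(np.float32)

exe = sym.simple_bind(ctx=mx.cpu(), data=x.shape, softmax_label=y.shape)
exe.arg_dict['data'][:] = x
exe.arg_dict['softmax_label'][:] = y
exe.forward(is_train=False)

assert np.allclose(exe.outputs[0].asnumpy(), np_softmax(x), atol=1e-5)
```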

Hello,

Are there any parameters other than the margin size and the regularization coefficient? I'm letting the user decide between L1-SVM and L2-SVM too, so that makes 3 optional parameters besides the data and labels.
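
So the symbol-level interface I have in mind would look roughly like this (the SVMOutput name and its parameters are my tentative proposal, nothing is implemented yet):

```python
import mxnet as mx

data = mx.sym.Variable('data')
label = mx.sym.Variable('svm_label')
net = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')

# Tentative interface: margin, regularization coefficient, and a switch
# between L1-SVM (linear hinge) and L2-SVM (squared hinge).
out = mx.sym.SVMOutput(data=net, label=label,
                       margin=1.0,
                       regularization_coefficient=1.0,
                       use_linear=False,   # False -> L2-SVM, True -> L1-SVM
                       name='svm')
```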

I made a toy test based on the gtc_tutorial MNIST example and had a hard time because NDArrayIter expects a label named "softmax_label" by default. I worked around it by calling mx.io._init_data with default_name="svm_label", but I'm sure I shouldn't be calling an underscored (private) function. It looks like a _bug_ ;)
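
By the way, a less hacky workaround that I think should also work is to pass the data and label as dicts, so NDArrayIter takes the label name from the key instead of the default (untested sketch with toy data):

```python
import numpy as np
import mxnet as mx

# toy data just for illustration
X = np.random.uniform(size=(100, 784)).astype(np.float32)
y = np.random.randint(0, 10, size=(100,)).astype(np.float32)

# Using dicts lets NDArrayIter pick up 'svm_label' as the label name
# instead of the default 'softmax_label'.
train_iter = mx.io.NDArrayIter(data={'data': X},
                               label={'svm_label': y},
                               batch_size=10)
```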

Right now I'm getting poor results with my toy test, 7.01%; I believe that's because I should be training an SVM model. I was a bit confused about whether the K in the derivative equations refers to the SVM weights or the node weights; after this experiment I'm pretty convinced they really are the SVM weights.

I may need to create two new tensor operators, L1_SVM and L2_SVM, just like Softmax does. For now I'll just release the CPU version, but I'll be glad to try the GPU one; I just don't have any experience with it yet.

@tmatas @tqchen

Great news! I ran the MNIST toy test with the same configuration for both Softmax and SVM, and got 97.93% for Softmax against 98.15% with the L2-SVM cost function and 97.97% with L1-SVM. It looks like the paper was right!

I'll format my repo for a pull request, if you guys are OK with my parameter suggestions.

Will my toy test be of any use to you?

@jonasrla @tmatas It seems we can combine a Relu activation and MakeLoss to get the hinge loss: max(0, 1 - t*y) can be written as relu(1 - t*y), and we can put an mx.sym.MakeLoss after the activation.
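
A rough sketch of the idea, assuming the label is already provided as a ±1 matrix of the same shape as the scores (producing that encoding is the part that still has to come from somewhere):

```python
import mxnet as mx

scores = mx.sym.Variable('scores')   # network outputs y, shape (batch, num_classes)
target = mx.sym.Variable('target')   # +1 for the true class, -1 elsewhere, same shape

margin = 1.0
# hinge loss max(0, margin - t*y) written with existing operators
hinge = mx.sym.Activation(data=margin - target * scores, act_type='relu')
loss = mx.sym.MakeLoss(hinge, name='hinge_loss')
```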

@tmatas How is the GPU implementation going? Thanks.

Believe it or not, I'm back, and I finally have a proper environment with CUDA.
Just to be sure: the task now is to rewrite SVMOutput-inl.h, this time using MakeLoss and Relu, right?
Does that solve the problem for both CPU and GPU? If so, I'll test both cases just to be sure.

@sxjscience, I was considering your idea of basing my solution on MakeLoss and Relu, but with the API as it is, it's unfeasible to build a solution using just existing operators. The sign varies depending on whether a class is the expected one or not, and since we don't have one_hot_encode implemented in C++, it's not possible to express this with already-implemented operators alone. I'll keep trying a pure CUDA solution.

CPU and mshadow GPU versions are added now.
