Scikit-learn: Relevance Vector Machine (RVM)

Created on 3 Jan 2013  ·  68 Comments  ·  Source: scikit-learn/scikit-learn

RVM is a Bayesian framework for obtaining sparse solutions to regression and classification tasks. It uses a model of identical form to the SVM (Support Vector Machine) and addresses the following disadvantages of the SVM (see the model sketch after this list):

- The number of basis functions in the SVM grows linearly with the size of the training set. In the RVM, we start with zero basis functions and incrementally update (add/delete) the set of basis functions until convergence.
- SVM predictions are not probabilistic, while the RVM's are.
- The SVM requires estimating the margin trade-off parameter 'C'; this is not the case in the RVM.
- The SVM kernel must be positive definite. In the RVM we can use any kernel.
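For reference, a sketch of the model form from Tipping (2001) that the points above describe:

```latex
% RVM: same functional form as the SVM, one weight per training point
y(\mathbf{x}; \mathbf{w}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + w_0
% with an individual Gaussian prior precision \alpha_n per weight:
p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{n=0}^{N} \mathcal{N}(w_n \mid 0, \alpha_n^{-1})
% maximising the marginal likelihood drives most \alpha_n \to \infty,
% pruning the corresponding basis functions and yielding sparsity
```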

It is already implemented in dlib (http://dlib.net/dlib/svm/rvm_abstract.h.html), and there is also a MATLAB implementation at http://www.vectoranomaly.com/downloads/downloads.htm. These codebases should serve as a guide.
I think it would be a good idea to add it to scikit-learn.

References:

1. Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.

2. Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1, 211-244.

New Feature

Most helpful comment

Hi @amueller and everybody! We saw this thread and decided to implement a sklearn-compatible version of the RVM (https://github.com/Mind-the-Pineapple/sklearn-rvm). We based a lot of what we did on the JamesRitchie implementation. It would be great if someone were willing to take a look at it; feedback and contributions (@themrzmaster) are welcome :)
We plan to maintain this repo, to implement the fast version in version 0.2, and we hope that maybe some day this code can be helpful for the scikit-learn main repository 😊 🍍

All 68 comments

I'd have to read up on it again, but in general I think RVMs would be a good addition.
dlib is Boost-licensed, which should be compatible. It might not be so easy to wrap because of the boost-heavy coding style, though.
Is the problem optimized using SMO? Would it be sensible for us to implement SMO?

Gotta grab my Bishop.

What is the relation between ARD and RVM? Is RVM just the "basis function" version of ARD?

Btw is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

OK, so we should use the sequential sparse learning algorithm from Bishop p. 352 and following, I guess?
Knock yourself out ;)

I wonder whether there is a similar method for ARD? That would be cool, as the current ARD implementation is quite slow :-/

No, the implementation of the RVM definitely does not use SMO. I think SMO is only used for SVM optimization.
Yes, we should use the sequential sparse learning algorithm in reference 1, page 7 (is that the one in Bishop p. 352?). This algorithm is "simple" enough that we can write it without using dlib. I was thinking about writing it in Python and then using Cython for optimization; that way we can take full advantage of the MATLAB implementation. What do you think?
Anyway, it should be possible to write it in C++, but for that we would need a good linear algebra library in C++, and I am not sure scikit-learn ships with one by default.
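To make the discussion concrete, here is a rough numpy sketch of that sequential (fast marginal likelihood) algorithm for regression. The fixed noise precision and the naive candidate-selection rule are simplifications of mine; see Tipping & Faul (2003) for the full update rules:

```python
import numpy as np

def fast_rvm_regression(Phi, t, beta=100.0, n_iter=1000, tol=1e-6):
    """Sketch of sequential sparse Bayesian learning (Tipping & Faul 2003;
    Bishop, PRML p. 352) for regression. The noise precision `beta` is kept
    fixed for brevity, although the paper re-estimates it as well."""
    N, M = Phi.shape
    alpha = np.full(M, np.inf)             # np.inf == basis function pruned
    phi_sq = np.sum(Phi ** 2, axis=0)      # ||phi_m||^2 for every column
    # start with the single basis function best aligned with the targets
    proj = (Phi.T @ t) ** 2 / phi_sq
    i = int(np.argmax(proj))
    alpha[i] = phi_sq[i] / (proj[i] - 1.0 / beta)
    for _ in range(n_iter):
        active = np.isfinite(alpha)
        Phi_a = Phi[:, active]
        # posterior over the currently active weights
        Sigma = np.linalg.inv(np.diag(alpha[active]) + beta * Phi_a.T @ Phi_a)
        mu = beta * Sigma @ (Phi_a.T @ t)
        # sparsity S_m and quality Q_m for all M candidate basis functions
        P = Phi.T @ Phi_a                                   # (M, n_active)
        S = beta * phi_sq - beta ** 2 * np.einsum("ij,jk,ik->i", P, Sigma, P)
        Q = beta * (Phi.T @ t) - beta ** 2 * P @ (Sigma @ (Phi_a.T @ t))
        s, q = S.copy(), Q.copy()
        s[active] = alpha[active] * S[active] / (alpha[active] - S[active])
        q[active] = alpha[active] * Q[active] / (alpha[active] - S[active])
        theta = q ** 2 - s                 # decides add / delete / re-estimate
        m = int(np.argmax(np.abs(theta)))  # naive pick; the paper is smarter
        old = alpha[m]
        if theta[m] > 0:
            alpha[m] = s[m] ** 2 / theta[m]    # add or re-estimate alpha_m
        elif active[m]:
            alpha[m] = np.inf                  # delete basis function m
        else:
            break                              # no beneficial action left
        if np.isfinite(old) and np.isfinite(alpha[m]) \
                and abs(np.log(alpha[m] / old)) < tol:
            break
    return np.isfinite(alpha), mu, Sigma       # relevance mask, posterior
```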

Bishop is "Pattern Recognition and Machine Learning".

The algorithm is probably not completely easy to get right. If you want to start on this, it will definitely be a bigger project. If you are interested and want to implement it anyway, go ahead.

Implementing it is also quite a bit different from getting it into scikit-learn. That also involves writing tests, documentation and a user guide - and pretty code.

For a first try you should definitely use just numpy. It uses BLAS internally and is therefore quite fast.
Speeding things up with Cython only makes sense if there is a lot of Python overhead. If all the time is spent in BLAS calls, using Cython doesn't gain much.

OK for Cython and numpy. I didn't know Bishop talks about the RVM.
On the ARD and RVM relationship: I don't know a lot about ARD, but in reference 2 the authors say that the RVM is based on ARD: "We term those training vectors associated with the remaining non-zero weights 'relevance' vectors, in deference to the principle of automatic relevance determination which motivates the presented approach" (page 3 (213), line 8).
Anyway, how does ARD work?

ARD is also explained in Bishop's book and in the user guide. It puts a diagonal Gaussian prior on the weights and tries to estimate the individual variances, which (as I understand it) is the same thing the RVM does. Is that correct?
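As a concrete illustration of that relationship, a minimal sketch: fitting scikit-learn's existing ARDRegression on a kernel matrix makes it behave like a (slow) relevance vector regression. The toy data and the pruning threshold here are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, size=(100, 1))
y = np.sinc(X.ravel()) + 0.1 * rng.randn(100)

# treat the kernel columns K(., x_n) as basis functions, as the RVM does
K = rbf_kernel(X, X, gamma=0.5)
ard = ARDRegression().fit(K, y)
relevant = np.abs(ard.coef_) > 1e-3   # surviving "relevance vectors"
print(relevant.sum(), "relevance vectors out of", len(X))
```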

I realize that the ref mentioned:

http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

is not the implementation we use. I think that @vmichel implemented the Bishop approach, while this paper proposes a fixed-point approach similar to coordinate descent. This code definitely needs some love...

Thanks @agramfort, I was wondering about that. I didn't go through the details, but since the paper was the only reference, I assumed that was what had been implemented...

I would very much appreciate it if you could add a comment there citing the chapter of Bishop that was used, and maybe saying we should implement the other paper instead.

(btw, one of the slowest things in the test suite right now is fitting ARD on the Boston dataset in the common tests)

see : https://github.com/scikit-learn/scikit-learn/pull/1530

> Btw is anyone ever bothered by the fact that the section Generalized Linear Models doesn't contain any generalized models?

It does: logistic regression.

> I wonder whether there is a similar method for ARD? That would be cool as the current ARD implementation is quite slow :-/

I believe that the most promising route to a fast ARD solver is to implement the strategy exposed in:
http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf

I put up a gist with some code I wrote a while ago.

If somebody wants to work on ARD, I thought it could be useful.

https://gist.github.com/4494613

WARNING: it's not well tested and I don't guarantee correctness, but it seems to work.

@amueller
I have analysed ARD in http://books.nips.cc/papers/files/nips20/NIPS2007_0976.pdf and I think the RVM and ARD optimize the same objective function. The difference lies in the method used to optimize this function. In the RVM, the authors noticed that most of the weights will be near zero, and they used this to derive a "fast" algorithm.

That sounds odd. If the objective is the same, you should be able to use the same methods for optimization, right?

Yes, for sure you could, but I guess the authors of the RVM used a different optimization strategy to get a faster and sparser algorithm.

@yedtoss I'm pretty sure there is some other difference. As I said before, it might be that RVMs work in a feature space or with a kernel or something. Otherwise you could just replace the ARD implementation, right? That is regression, though, and you want classification, right?

@agramfort do you know anything about the difference between ARD and RVM?

@amueller
Initially the RVM is a regression technique, but the authors presented a way to use it for classification. The RVM can use any kind of kernel.
What I mean is that the log likelihood of ARD (equation 2 of "A New View of Automatic Relevance Determination") and of the RVM (equation 7 of "Fast marginal likelihood maximisation for sparse Bayesian models") are identical.
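For reference, the shared objective both formulations maximise is the log marginal likelihood, written here in the standard form from those papers:

```latex
\log p(\mathbf{t} \mid \boldsymbol{\alpha}, \beta)
  = -\tfrac{1}{2}\left[ N \log 2\pi + \log\lvert\mathbf{C}\rvert
      + \mathbf{t}^{\top}\mathbf{C}^{-1}\mathbf{t} \right],
\qquad
\mathbf{C} = \beta^{-1}\mathbf{I}
      + \boldsymbol{\Phi}\,\operatorname{diag}(\boldsymbol{\alpha})^{-1}\boldsymbol{\Phi}^{\top}
```

ARD and the fast RVM then differ only in how they ascend this function (fixed-point updates versus the sequential add/delete/re-estimate scheme).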

I guess I'd have to read the papers to know what's going on...

sorry guys, I am not much of a Bayesian guy... I don't know the subtleties well...

RVMs are patented by Microsoft.

crazy.

@larsmans @amueller While there is a US patent on the RVM, the author recommends a GPLv2 MATLAB implementation on his web page, so I guess it is OK to implement it...
http://www.miketipping.com/sparsebayes.htm

Best,
Angelos

@kalxas Licensing and patents are quite orthogonal, and GPLv2 in particular didn't address software patents. The rights you have with such an implementation are the intersection of the rights granted by the GPL and those granted by the patent holder.

That said, I found out in the meantime that support vector machines are patented by AT&T, but that patent was apparently never enforced. If something similar can be shown for RVMs, I might change my mind about them.

@larsmans I wrote a pure numpy/Python port of the dlib implementation (awfully slow at the moment; I'll try to cythonize it). According to the header, dlib's implementation has been around since 2008 and they seem fine with it. Would you consider changing your mind about having the RVM in sklearn?

Let's hear @GaelVaroquaux's opinion on this. The dlib implementation doesn't prove anything as long as you can't show it's widely used without a patent license.

Are there any updates on this topic? I've been looking into RVMs lately and was wondering if there was any code around...

I don't think anyone has tried for a fast implementation, and we are still unsure about the legal status.

@jlopezpena Take a look at dlib; the code is pretty clear and it's header-only (templates). It should be fairly easy to build a C extension for use from numpy.

Hi everyone,

I recently translated Mike Tipping's freely available SparseBayes MATLAB program, which mainly implements the RVM, from MATLAB into Python. It can be found here: https://github.com/jhallock7/SparseBayes-Python . I contacted Mike Tipping, and he said that Microsoft's patent covers only the original slow algorithm, whereas the SparseBayes program uses the faster one found here: http://www.miketipping.com/papers/met-fastsbl.pdf . So it would be fine if some form of his program were folded into scikit-learn. I am relatively new to Python, so my translation can undoubtedly be improved or modified.

Thanks for wanting to contribute, and also thanks for checking the patent status.
There is another question, though, which is whether this algorithm is widely useful.

I haven't looked at the algorithm and its uses since I last read Bishop's book, which was a while ago.
I think an interesting example would show that it either gives better uncertainty estimates than calibrating an SVM, or that it is faster (calibrating an SVM and searching over C requires a lot of cross-validation).
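For concreteness, the calibrated-SVM baseline I have in mind would look roughly like this (a sketch with made-up toy data, not tuned):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# the search over C that the RVM claims to avoid
svm = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100]})
# Platt-style probability calibration on top, adding more cross-validation
clf = CalibratedClassifierCV(svm, cv=3).fit(X, y)
proba = clf.predict_proba(X)   # probabilistic outputs, like the RVM's
```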

@amueller The RVM (mainly relevance vector regression, RVR) is pretty useful in neuroimaging data analysis. Lots of papers use this method rather than SVR for prediction. It would be perfect if this method could be added to the scikit-learn toolbox.

@amueller I implemented a slow version of the RVM which can use either EM or a fixed-point algorithm to fit the model (mainly for learning/academic purposes), and the major difference between the RVM and SVR that I noted in a couple of examples is sparsity, i.e. the number of 'support' vectors used in prediction. In many cases the RVM produces results comparable with SVR while using only a fraction of the support vectors that SVR uses
(here is one simple example that is also used in Tipping 2001).

@amueller (Adding to my previous comment) And obviously a small number of support vectors implies very fast prediction.
Another advantage of the RVM is its probabilistic treatment: with the RVM, for every data point in the test set you get not only a point estimate but also a predictive distribution.
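(For reference, the predictive distribution in question, from Tipping (2001): with posterior weight mean and covariance mu and Sigma,

```latex
p(t_* \mid \mathbf{t})
  = \mathcal{N}\!\left( t_* \;\middle|\; \boldsymbol{\mu}^{\top}\boldsymbol{\phi}(\mathbf{x}_*),\;
      \sigma^2 + \boldsymbol{\phi}(\mathbf{x}_*)^{\top}\boldsymbol{\Sigma}\,\boldsymbol{\phi}(\mathbf{x}_*) \right)
```

i.e. the noise variance plus a model-uncertainty term.)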

Does RVR provide probability distributions, too?

Sounds like RVR and RVM are reasonable candidates for inclusion. I'm not sure about the state-of-the-art algorithms, though. Is it http://www.miketipping.com/papers/met-fastsbl.pdf ? That seems pretty old. The SparseBayes coding style is... interesting, and I think it would serve better as a reference than as a basis for the sklearn implementation.

Yes, RVR provides predictive distributions; however, in some cases the variance of the predictive distribution can be smaller for data points outside the domain of the training set.
To the best of my knowledge, the paper you mentioned is the latest version of the RVM; it also corresponds to the MATLAB implementation on Tipping's website (version 2).

I also found an interesting comparison of RVM and SVM speed in Kevin Murphy's book:
"RVM is also fastest to train. This is despite the fact that RVM code is in Matlab and the SVM code is in C" (Chapter 14, p. 490). However, it seems they made comparisons on small datasets only.

> @amueller RVM (mainly relevance vector regression (RVR)) is pretty useful in neuroimaging data analysis.

I am not convinced (and I do neuroimaging). I haven't seen a good empirical comparison.

I see the way to go is to have RVMs in a separate package with the scikit-learn API, and to encourage good empirical work to show their usefulness. If they prove useful, merge them into scikit-learn.

Some neuroimaging studies have used relevance vector regression (RVR) and made comparisons between RVR and SVR.
To list a few:
http://www.sciencedirect.com/science/article/pii/S1053811910000108
http://www.sciencedirect.com/science/article/pii/S1053811910012644
http://www.sciencedirect.com/science/article/pii/S1053811910003459
http://www.nature.com/npp/journal/v39/n3/abs/npp2013251a.html

Also, RVR is implemented in a pattern recognition toolbox for neuroimaging data:
http://www.mlnl.cs.ucl.ac.uk/pronto/

I hope RVR can be incorporated into scikit-learn.

Best wishes

Zaixu


> I see the way to go is to have RVMs in a separate package, with scikit-learn API, and encourage good empirical work to show their usefulness. If they are useful, merge them in scikit-learn.

+1

I mean, there is https://github.com/AmazaspShumik/Bayesian-Regression-Methods/blob/master/Relevance%20Vector%20Machine%20%26%20ARD/rvm.py which looks relatively compatible; it needs set_params and get_params, or to inherit from BaseEstimator.
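Roughly this kind of wrapper, as a sketch (the class name and the gamma parameter are made up; fit/predict would delegate to whichever RVM code gets wrapped):

```python
from sklearn.base import BaseEstimator, RegressorMixin

class RVRWrapper(BaseEstimator, RegressorMixin):
    """Skeleton of the sklearn API contract: hyperparameters stored verbatim
    in __init__, so get_params/set_params come for free from BaseEstimator
    and the estimator works with GridSearchCV, pipelines, etc."""
    def __init__(self, gamma=1.0):
        self.gamma = gamma
    def fit(self, X, y):
        # delegate to the third-party RVM fitting code here
        return self
    def predict(self, X):
        # delegate to the third-party RVM prediction code here
        raise NotImplementedError
```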

And there is https://github.com/jhallock7/SparseBayes-Python which could be wrapped.

@ZaixuCui why do you want it to be in scikit-learn when there is a ready-to-use implementation out there?

I tend to agree with @GaelVaroquaux and @mblondel. If no one has published on these algorithms in nearly ten years, people don't seem to be very interested. [oh, the standard algorithm is even from 2003. but then again, libsvm is from 2005]

Because I use scikit-learn to do SVR and elastic-net.
So, if there were an RVR implementation, I would not need to use MATLAB when doing machine learning analysis.

Thanks very much.
Wishing you all the best,

Zaixu


> Because I use scikit-learn to do SVR and elastic-net. So, if there were an RVR implementation, I would not need to use MATLAB when doing machine learning analysis.

You can use the Python code doing RVR that we pointed to in this discussion.

OK, thanks


Couldn't we implement this as a very lightweight class based on our new Gaussian process implementation? As far as I understand, RVR is just the name given to a GP with a special kind of kernel.

Though this would require only minimal effort, basing the implementation of RVR on the GP one may not be the most appropriate thing to do? CC: @jmetzen

@amueller @GaelVaroquaux @ZaixuCui @yedtoss @jlopezpena

Hi, I implemented a fast version of the Relevance Vector Machine with the scikit-learn API, so if anybody wants to use it, feel free to do so.

Code: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/sklearn_bayes/rvm/fast_rvm.py

Examples: https://github.com/AmazaspShumik/sklearn_bayes/blob/master/ipython_notebooks_tutorials/rvm_ard/rvm_demo.ipynb

There are four classes implemented in the code:
- RegressionARD
- ClassificationARD
- RVC
- RVR

So RegressionARD and ClassificationARD may be useful as well.
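A quick usage sketch: the import path is guessed from the repo layout above and may differ, and the constructor argument is assumed to be sklearn-like; since the classes follow the scikit-learn API, fit/predict should work as usual:

```python
import numpy as np
# import path assumed from the sklearn_bayes repo layout; may differ
from sklearn_bayes.rvm.fast_rvm import RVR

rng = np.random.RandomState(0)
X = rng.uniform(-5, 5, (200, 1))
y = np.sinc(X.ravel()) + 0.1 * rng.randn(200)

rvr = RVR(kernel="rbf").fit(X, y)   # constructor args assumed sklearn-like
y_hat = rvr.predict(X)
```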

@AmazaspShumik thank you so much for your implementation. Great work :+1:

@AmazaspShumik

Thank you very much for your efforts.
I will definitely try this package.

Wish you all the best.

Zaixu

Has anyone had trouble getting @AmazaspShumik's predict_proba method to work?

Does anyone here have a library for the RVM in PHP? I don't understand the RVM; can someone explain it to me?

> RVMs are patented by Microsoft.

The patent will expire soon. Anticipated expiration: 2019-09-04.

Some of the links to @AmazaspShumik's implementation pointed to in this discussion are broken, so here they are again for those interested, along with some other implementations:

- https://github.com/AmazaspShumik/sklearn_bayes - RVM plus some other algorithm implementations
- https://github.com/JamesRitchie/scikit-rvm - simple implementations using scipy
- https://github.com/siavashserver/neonrvm - C implementation with Python binding

Also, here is a collection of relevant papers:
http://www.miketipping.com/sparsebayes.htm

There is a C++ implementation of the RVM in the supplementary material of this paper:
https://pubs.acs.org/doi/abs/10.1021/ci200128w
Precisely here:
https://pubs.acs.org/doi/suppl/10.1021/ci200128w/suppl_file/ci200128w_si_001.zip

The Microsoft patent has expired. Could we possibly add it to sklearn?

It easily clears the standard requirements, so I don't see why not. Having some good/convincing examples might be interesting. scikit-rvm and sklearn_bayes seem unmaintained but might still be useful.
It mostly needs a champion now who actually wants to put in the work.

In Murphy's book he claims the RVM's performance is really similar to the SVM's, but with the advantage of being a truly probabilistic method, so it gives calibrated probabilities as answers. Here https://github.com/probml/pmtk3/blob/master/docs/tutorial/html/tutKernelClassif.html he compares the methods on a small dataset.

Can you provide a link to the rendered version?
Also, it's not surprising that it's on a small dataset, given that the method is likely to have scalability issues.

@amueller http://htmlpreview.github.io/?https://github.com/probml/pmtk3/blob/master/docs/tutorial/html/tutKernelClassif.html

I will try to work on the implementation. Any help would be highly appreciated.

IIRC, one advantage of the RVM over the SVM is that you can find the optimal C parameter without doing an optimization pass.
I wonder if the original author would be willing to contribute his reference implementation.
Well, it is in MATLAB, and Mike Tipping is not even on GitHub...

Hi @amueller and everybody! We saw this thread and decided to implement a sklearn-compatible version of the RVM (https://github.com/Mind-the-Pineapple/sklearn-rvm). We based a lot of what we did on the JamesRitchie implementation. It would be great if someone were willing to take a look at it; feedback and contributions (@themrzmaster) are welcome :)
We plan to maintain this repo, to implement the fast version in version 0.2, and we hope that maybe some day this code can be helpful for the scikit-learn main repository 😊 🍍

Hi @PedroFerreiradaCosta
I have tested this scikit-learn api and it seems it is still so slow (seems not responding yet). Do you think the reason could be it is implemented on Windows? Below is what I used:
EMRVC(kernel='rbf', gamma='scale', n_iter_posterior=10, max_iter=500, compute_score=True , verbose=True ) Thanks for your answer @themrzmaster @PedroFerreiradaCosta

Hi @mustuner! Thank you for trying out our API!
The RVM does have a higher computational complexity than, for example, the SVM (O(M^3)), which can make it slower when there is a large number of basis functions. Either way, there are a number of things you can do to speed up the process: you can precompute the kernel matrix and feed it to our algorithm instead of X (set kernel='precomputed'), or you can decrease the alpha_threshold (set by default to 1e5). Keep in mind that this second option might lower the accuracy of the model. A sketch of the precomputed-kernel route follows below.
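For the first option, the precomputed-kernel route looks roughly like this (a sketch with made-up toy data; the import path is assumed for the sklearn-rvm package and may differ by version):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn_rvm import EMRVC   # import path assumed for sklearn-rvm

rng = np.random.RandomState(0)
X_train = rng.randn(100, 2)
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.randn(20, 2)

# kernel computed once, outside the estimator
K_train = rbf_kernel(X_train, X_train, gamma=0.1)
clf = EMRVC(kernel="precomputed").fit(K_train, y_train)

# at predict time, pass the kernel between test and training points
K_test = rbf_kernel(X_test, X_train, gamma=0.1)
y_pred = clf.predict(K_test)
```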
Hope this helps! And feel free to open an issue so we can help further.

Best,
Pedro

Thank you, @PedroFerreiradaCosta! Let me try that.
