Machinelearning: How to use LinearSvm?

Created on 19 Nov 2018 · 3 Comments · Source: dotnet/machinelearning

Sorry if this is not the place for questions.

Currently I'm using FastTree for binary classification, but I would like to give SVM a try and compare metrics.

All the docs mention LinearSvm, but I can't find a code example anywhere.

mlContext.BinaryClassification.Trainers does not expose any public SVM trainers. There is a LinearSvm class and a LinearSvm.TrainLinearSvm static method, but they seem to be intended for different things.

What am I missing?

Version: 0.7

API bug

All 3 comments

Hi @maxt3r,

This is a great place for a question -- thanks for reaching out!

I have two answers for you: the current status of the API, and how to use LinearSVM in the meantime.

First, we have LinearSVM in the ML.NET codebase, but we do not yet have samples or the API extensions to place it in mlContext.BinaryClassification.Trainers. This is being worked through in issue #1318. I'll link this to that issue, and mark it as a bug.

In the meantime, you can use direct instantiation to get access to LinearSVM:

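// Configure the trainer; NumIterations is the only option set here.
var arguments = new LinearSvm.Arguments()
{
    NumIterations = 20
};
// Instantiate the trainer directly, since it is not yet exposed on mlContext.
var linearSvm = new LinearSvm(mlContext, arguments);
// Fit on the training IDataView, then score the test IDataView.
var svmTransformer = linearSvm.Fit(trainSet);
var scoredTest = svmTransformer.Transform(testSet);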

This will give you an ITransformer, here called svmTransformer, that you can use to operate on IDataView objects.

One note about our LinearSVM implementation: it is an implementation of PEGASOS, which is an online algorithm for training SVMs. See this paper for more information.
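For readers who don't have the paper to hand, here is a rough sketch of the basic Pegasos update as it is usually stated in the literature. The symbols are assumptions for illustration and do not come from the ML.NET source: w is the weight vector, λ is the regularization strength, η_t is the step size at step t, and (x_i, y_i) are training examples with labels y_i in {-1, +1}. The ML.NET trainer may differ in its details, and how NumIterations maps onto these steps is not stated in this thread.

```latex
% Objective: L2-regularized hinge loss over m training examples (x_i, y_i)
\min_{w} \; \frac{\lambda}{2}\,\lVert w \rVert^2
  + \frac{1}{m} \sum_{i=1}^{m} \max\bigl(0,\; 1 - y_i \langle w, x_i \rangle\bigr)

% Step t: draw one example (x_{i_t}, y_{i_t}) at random, set \eta_t = 1/(\lambda t), then
w_{t+1} =
\begin{cases}
(1 - \eta_t \lambda)\, w_t + \eta_t\, y_{i_t} x_{i_t} & \text{if } y_{i_t} \langle w_t, x_{i_t} \rangle < 1 \\
(1 - \eta_t \lambda)\, w_t & \text{otherwise}
\end{cases}
```

Each step touches a single example, which is what makes the method "online": the weights shrink slightly on every step, and move toward the example only when it falls inside the margin.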


Instance of Issue #1318

Thanks a lot for the quick response. Feel free to close the issue.

Hi @maxt3r,

I'll close the issue, but one more comment: if you want a good linear baseline to judge other solutions against, I'd suggest using SDCA rather than LinearSVM. You can reference it as mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent. In general**, it's the fastest and best "out-of-the-box" linear solver we have.

** Of course, actual performance will depend on the dataset, but this is the "best bet" solution.
