Machinelearning: Mismatch in Label expectations for Multiclass Learners

Created on 19 Feb 2019 · 7 Comments · Source: dotnet/machinelearning

When training a multiclass classifier, the label must be converted to KeyType. Some learners, such as SDCA, do the conversion from R4 automatically, while others, like LogisticRegression, do not. This behavior is confusing from an end-user perspective, as one cannot simply interchange learners without modifying the input pipeline.

I suggest adding auto-label-conversion for all multiclass learners in ML.NET.
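
To illustrate the conversion the issue refers to, here is a minimal Python sketch of a value-to-key mapping. It is not ML.NET code: the function names `fit_key_mapping` and `to_keys` are hypothetical, and reserving 0 for unseen values is an assumption of the sketch. In ML.NET itself this step is normally expressed as a transform in the pipeline (for example `MapValueToKey`), not written by hand.

```python
from typing import Dict, Hashable, Iterable, List


def fit_key_mapping(labels: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign each distinct label a 1-based key, in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    for value in labels:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


def to_keys(labels: Iterable[Hashable], mapping: Dict[Hashable, int]) -> List[int]:
    """Convert raw labels to keys; 0 marks a value not seen while fitting (sketch assumption)."""
    return [mapping.get(value, 0) for value in labels]


# Float labels, as a learner that only understands keys would need them converted.
raw_labels = [3.0, 7.0, 3.0, 5.0]
mapping = fit_key_mapping(raw_labels)
keys = to_keys(raw_labels, mapping)

print(mapping)  # {3.0: 1, 7.0: 2, 5.0: 3}
print(keys)     # [1, 2, 1, 3]
```

The point of the sketch is only that the raw float values carry no class structure by themselves; the key representation records which of a known, finite set of classes each row belongs to.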

bug usability

rogancarr

Most helpful comment

SDCA in fact does not convert from float automatically. It was instead written years ago, at a time when the only type of label supported was float. Now that we have a more specific type we should shift to using it. It does not convert at all; it just supports both types.

If you prefer it to be consistent, it should be consistent in the other direction, that is, SDCA should stop doing its internal magic. That is, @Ivanidzo4ka's interpretation is correct.

We do not do "auto-conversion" for anything in ML.NET because (1) conversion is the job of transforms and (2) adding the smarts "anywhere" requires that we add them "everywhere," which is a maintainability nightmare. (In fact, it has proven to be practically impossible.)

We have an internal style guide that treats this subject at some length. Well-intentioned but utterly misguided people often think, "hey I know, I'll be super helpful and just support a bunch of types," but all it leads to is an inconsistent, broken-feeling API. It is not precisely the same situation, since that deals with IDataView implementations, but the same philosophy and the same dangers of inconsistency (which is the subject of your issue!!) arise just the same.

TomFinley on 19 Feb 2019

👍2
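
The separation argued for in the comment above (conversion belongs to a transform, and every trainer accepts only one label type) can be sketched in Python as follows. This is a toy model under stated assumptions, not ML.NET code: `Key`, `require_key_labels`, `map_value_to_key`, `MajorityTrainer`, `FirstSeenTrainer`, and `run_pipeline` are all hypothetical names invented for the illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for a key-typed label: a class index plus the class count."""
    index: int
    count: int


def require_key_labels(labels: Sequence[object]) -> None:
    """Strict check shared by every trainer: anything that is not a Key is rejected."""
    for label in labels:
        if not isinstance(label, Key):
            raise TypeError(f"expected key-typed label, got {type(label).__name__}")


def map_value_to_key(values: Sequence[float]) -> List[Key]:
    """The explicit conversion step, kept outside the trainers (hypothetical helper)."""
    distinct = list(dict.fromkeys(values))  # distinct values in first-seen order
    return [Key(distinct.index(v), len(distinct)) for v in values]


class MajorityTrainer:
    """Toy trainer: 'learns' the most frequent class index."""

    def fit(self, labels: Sequence[Key]) -> int:
        require_key_labels(labels)
        return Counter(label.index for label in labels).most_common(1)[0][0]


class FirstSeenTrainer:
    """Toy trainer: 'learns' the class index of the first row."""

    def fit(self, labels: Sequence[Key]) -> int:
        require_key_labels(labels)
        return labels[0].index


def run_pipeline(raw_labels: Sequence[float], trainer) -> int:
    """Same pipeline for every trainer: convert once, then train."""
    return trainer.fit(map_value_to_key(raw_labels))


raw = [2.0, 9.0, 9.0]
print(run_pipeline(raw, MajorityTrainer()))   # 1
print(run_pipeline(raw, FirstSeenTrainer()))  # 0
```

Because neither toy trainer accepts raw floats, swapping one for the other never requires touching the input pipeline. If one of them quietly accepted floats as well, a pipeline written against it would break as soon as the other was substituted, which is the inconsistency this issue reports.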

All 7 comments

I could be wrong, but it looks like our recent decisions about "auto-working-magically" workflows in the API go in the opposite direction. We no longer have auto-caching or auto-normalization.
Maybe we should remove the auto-conversion from R4 in SDCA as well.

Ivanidzo4ka on 19 Feb 2019

👍1

I have a list of things I want to do to all learners: https://github.com/dotnet/machinelearning/issues/2613. If during your discovery you find something that doesn't fit into your list, can you extend it?

Ivanidzo4ka on 19 Feb 2019

If you prefer it to be consistent, it should be consistent in the other direction, that is, SDCA should stop doing its internal magic. That is, @Ivanidzo4ka's interpretation is correct.

TomFinley on 19 Feb 2019

👍2

I like having consistency. Let's consider this a bug on SDCA for supporting floats instead of only KeyTypes as multiclass labels.

rogancarr on 20 Feb 2019

Just to make it clear, are we fixing this now or post v1?

Ivanidzo4ka on 27 Feb 2019

We should fix this now, as it breaks a key training scenario: that any trainer for a task can be substituted in a training pipeline.

rogancarr on 27 Feb 2019

👍1

Correct, we must fix now.

TomFinley on 27 Feb 2019