Machinelearning: Mismatch in Label expectations for Multiclass Learners

Created on 19 Feb 2019 · 7 Comments · Source: dotnet/machinelearning

When training a multiclass classifier, the label must be converted to KeyType. Some learners, such as SDCA, do the conversion from R4 automatically, while others, like LogisticRegression, do not. This behavior is confusing from an end-user perspective, as one cannot simply interchange learners without modifying the input pipeline.

I suggest adding auto-label-conversion for all multiclass learners in ML.NET.

bug usability

All 7 comments

I may be wrong, but it looks like our decision-making regarding "automagical" API workflows has been going in the opposite direction: we no longer have auto-caching or auto-normalization.
Maybe we should remove auto-conversion to R4 from SDCA as well.

I have a list of things I want to do to all learners: https://github.com/dotnet/machinelearning/issues/2613. If you find something during your discovery that doesn't fit into your list, can you extend it?

SDCA in fact does not convert from float automatically. It was instead written at a time years ago when the only type of label supported was float. Now that we have a more specific type we should shift to using it. It does not convert at all, it just supports both types.

If you prefer it to be consistent, it should be consistent in the other direction, that is, SDCA should stop doing its internal magic. That is, @Ivanidzo4ka's interpretation is correct.

We do not do "auto-conversion" for anything in ML.NET because (1) conversion is the job of transforms and (2) adding the smarts "anywhere" requires that we add them "everywhere," which is a maintainability nightmare. (In fact, it has proven to be practically impossible.)

We have an internal style guide that treats this subject at some length. Well-intentioned but utterly misguided people often think, "hey I know, I'll be super helpful and just support a bunch of types," but all it leads to is an inconsistent, broken-feeling API. It is not precisely the same situation, since that deals with IDataView implementations, but the same philosophy and the same dangers of inconsistency (which is the subject of your issue!!) arise just the same.

I like having consistency. Let's consider this a bug on SDCA for supporting floats instead of only KeyTypes as multiclass labels.
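
For readers landing here, the position above (labels must be KeyType, and conversion is the job of a transform) can be sketched roughly as follows. This is a minimal illustration, not code from the thread: it assumes the ML.NET 1.x API names (`MapValueToKey`, `SdcaMaximumEntropy`, `LbfgsMaximumEntropy`), which differ from the trainer names in use when this issue was filed, and the `Row` class and toy data are hypothetical.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumption: ML.NET 1.x API surface; names were different in early 2019 previews.
var mlContext = new MLContext(seed: 0);

// Hypothetical toy data: float (R4) labels with two features each.
var rows = new[]
{
    new Row { Label = 0f, Features = new[] { 0.1f, 0.2f } },
    new Row { Label = 1f, Features = new[] { 0.9f, 0.1f } },
    new Row { Label = 2f, Features = new[] { 0.2f, 0.9f } },
    new Row { Label = 0f, Features = new[] { 0.2f, 0.1f } },
    new Row { Label = 1f, Features = new[] { 0.8f, 0.2f } },
    new Row { Label = 2f, Features = new[] { 0.1f, 0.8f } },
};
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// The conversion is an explicit transform: float label -> KeyType label.
var toKey = mlContext.Transforms.Conversion.MapValueToKey("Label");

// With the label already a key, either trainer can be appended
// without touching the rest of the pipeline.
var sdcaPipeline = toKey.Append(
    mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());
var lbfgsPipeline = toKey.Append(
    mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy());

var sdcaModel = sdcaPipeline.Fit(data);
var lbfgsModel = lbfgsPipeline.Fit(data);

// Hypothetical input schema used only for this sketch.
public class Row
{
    public float Label { get; set; }

    [VectorType(2)]
    public float[] Features { get; set; }
}
```

Keeping the conversion in the pipeline rather than inside each trainer is what makes the two trainers interchangeable here.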

Just to make it clear, are we fixing this now or post v1?

We should fix this now, as it breaks a key training scenario: that any trainer for a task can be substituted in a training pipeline.

Correct, we must fix now.
