Machinelearning: MissingValueHandlingTransformer -- do you ever plan to make it an estimator?

Created on 1 Feb 2019 · 6 Comments · Source: dotnet/machinelearning

I see MissingValueIndicatorEstimator and MissingValueReplacingEstimator have been created. Do you currently plan to make an estimator for MissingValueHandlingTransformer?

API P2

All 6 comments

Can I ask why you're asking? From what I remember, it's just a meta-transformer that combines the value replacing and value indicator transforms, with some extra transforms sprinkled in.

Definitely -- +1 that it's just a shorthand. I find myself duplicating the logic inside MissingValueHandlingTransformer in my own code, and was just curious whether there were any plans to make this transform class into an estimator.

In general, I would recommend that users add both NA imputing and NA indicating, as this tends to win on accuracy.
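To make the "impute plus indicate" combination concrete, here is a minimal sketch in plain Python. It is not ML.NET code and does not reproduce what MissingValueHandlingTransformer does internally; the function names (`indicate_missing`, `replace_missing`, `handle_missing`) are hypothetical, and replacing with the column mean is an assumption chosen only for illustration.

```python
import math
from statistics import mean


def indicate_missing(values):
    """Return True where a value is missing (NaN), False otherwise."""
    return [math.isnan(v) for v in values]


def replace_missing(values, replacement=None):
    """Replace NaNs with `replacement`.

    Assumption: when no replacement is given, use the mean of the
    non-missing values (0.0 if every value is missing).
    """
    present = [v for v in values if not math.isnan(v)]
    if replacement is None:
        replacement = mean(present) if present else 0.0
    return [replacement if math.isnan(v) else v for v in values]


def handle_missing(values):
    """Hypothetical combined step: pair each imputed value with its indicator."""
    imputed = replace_missing(values)
    flags = indicate_missing(values)
    return [(x, 1.0 if flag else 0.0) for x, flag in zip(imputed, flags)]


column = [1.0, float("nan"), 3.0]
features = handle_missing(column)
print(features)  # [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0)]
```

Each input value becomes two features: the imputed value and a flag recording that the original was missing, so a downstream learner can still tell imputed rows apart from genuine ones.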

We should expose the simpler form, which does all the steps for the user. This reduces the chance of the user setting it up incorrectly and encourages them to use the better option right away, without the pushback of having to set up a multi-step solution.

We did the conversion to estimators/transformers about two releases ago; I doubt we have time for that now. Post v1.0, maybe. AFAIK we didn't like the idea of having transforms that are just combinations of other transforms.

Having meta-transforms is quite valuable for the user, mainly for the simplicity and the encouragement to do the right thing, per the arguments above.

Right now it's a public class, but we are in the process of sweeping everything under the rug, and it will become internal in 0.11. I understand it's a nice thing to have, but at the same time, its functionality is accessible via already existing functionality. And by making it public, we would owe documentation, samples, and so on. Considering conversion time, time to make a sample, and time to add proper documentation, it's at least three days of work, which we don't have right now.

Post v1.0, we can discuss it. If all of the above magically appears in the form of a PR, I would review it, but I don't think it's wise to spend time on it right now.
