I am trying to use PermutationFeatureImportance (PFI) with F#, but the F# type system will not resolve ITransformer to ISingleFeaturePredictionTransformer, which is what PFI requires.
I believe this is because IPredictorProducing (and related interfaces) are marked "internal".
F# supports explicit interfaces, and maybe that is the reason for this issue.
Here is a snippet of code that shows what I am trying to do
(I am using the latest bits - v1.2.0 at the time of this post):
```F#
let mutable schema = null
let mdl = ctx.Model.Load(@"F:\fwaris\data\t\analysis\model_cv_LightGbmBinary.bin", &schema)
let mdlt = mdl :?> TransformerChain<ITransformer>
let m1 = mdlt.LastTransformer // debugger shows it is Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>
let scored = mdl.Transform(trainView)
scored.Preview()
ctx.BinaryClassification.PermutationFeatureImportance(m1 :?> _, scored)
```
@dsyme
A workaround for now is to use the C# helper given below, but really, if an interface (IPredictorProducing) is going to be exposed via another public interface, it should not be marked internal.
```C#
public static class MLHelper<T> where T : class
{
    public static System.Collections.Immutable.ImmutableArray<BinaryClassificationMetricsStatistics> PFI_BinaryClassification
    (
        MLContext ctx,
        ITransformer model,
        IDataView data,
        string labelColumnName = "Label",
        bool useFeatureWeightFilter = false,
        int? numberOfExamplesToUse = null,
        int permutationCount = 1
    )
    {
        var m = ctx.BinaryClassification.PermutationFeatureImportance(
            model as ISingleFeaturePredictionTransformer<T>,
            data,
            labelColumnName: labelColumnName,
            useFeatureWeightFilter: useFeatureWeightFilter,
            numberOfExamplesToUse: numberOfExamplesToUse,
            permutationCount: permutationCount
        );
        return m;
    }
}
```
@fwaris - I just ran into this issue as well. I don't understand how your workaround works. What T is getting passed into MLHelper<T>?
@codemzs - this is the same issue as we were discussing today. I don't think it is possible to use PermutationFeatureImportance once a model is saved to disk.
This is an issue because if you use AutoML, it always saves the model to disk in order to save on memory.
The problem is this code:
https://github.com/dotnet/machinelearning/blob/bb00e07b30e9626b3578ff1934b86dad0d1d1ce9/src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs#L595-L601
Whenever you load a prediction transformer from a model stream, it always creates an instance of a new BinaryPredictionTransformer<IPredictorProducing<float>>. This object cannot be cast to the ISingleFeaturePredictionTransformer<TModel> that is necessary for calling PermutationFeatureImportance, because the T in this case (IPredictorProducing<float>) is internal.
We need to change the above code to save off the right type into the model and create an instance of BinaryPredictionTransformer<TModel>, where TModel is the type that was originally used when training the pipeline before saving to disk - for example, BinaryPredictionTransformer<CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>> when using LightGbm.
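To illustrate the failure in C# (a minimal sketch; the calibrated LightGbm model type is just one example of what TModel may have been at training time, and the model path is a placeholder):
```C#
using Microsoft.ML;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext();
var loadedModel = mlContext.Model.Load("model.zip", out var inputSchema);

// After loading, the last transformer in the chain is a
// BinaryPredictionTransformer<IPredictorProducing<float>> - a type user code
// cannot even name, because IPredictorProducing<float> is internal.
var last = ((TransformerChain<ITransformer>)loadedModel).LastTransformer;

// Casting to the concrete type that was used at training time therefore fails:
var predictor = last as BinaryPredictionTransformer<
    CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>>;
// predictor == null, so there is nothing valid to pass to PermutationFeatureImportance.
```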
/cc @Dmitry-A @justinormont
@eerhardt, it seems you can punt on the type resolution in F# by using an underscore; i.e. the following trick seems to work (I tested again just to make sure):
```F#
let metrics = MLHelper<_>.PFI_BinaryClassification(mlctx, model, labelColumnName="Label")
```
The 'model' variable is of the concrete type (from debugger):
Microsoft.ML.Data.BinaryPredictionTransformer
However, I agree with you that this area requires re-work to make it easier to use.
Hi. So PRs #4262 and #4306 fixed the problem Eric pointed out in his comment in this thread.
So please let us know if this has been fixed for you. In particular, those PRs were only tested for ML.NET on C#, so I would appreciate feedback from the F# side. I will rename and tag this issue as F#-specific, since that was your original problem.
This is not fixed yet.
There are 2 ways to save the model.
1. As a pipeline + estimator
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/save-load-machine-learning-models-ml-net
```C#
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"));
var estimator = context.MulticlassClassification.Trainers.LightGbm();
var model = pipeline.Append(estimator).Fit(dataView);
context.Model.Save(model, dataView.Schema, "C:/model.zip");
```
2. As an estimator, without pipeline
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#train-the-model
```C#
var estimator = context.MulticlassClassification.Trainers.LightGbm();
// 'pipeline' is the data preparation pipeline from option 1
var transformedData = pipeline.Fit(dataView).Transform(dataView);
var model = estimator.Fit(transformedData);
context.Model.Save(model, dataView.Schema, "C:/model.zip");
```
Then, loading from disk:
```C#
var model = context.Model.Load("C:/model.zip", out var schema);
var engine = context.Model.CreatePredictionEngine<InputModel, OutputModel>(model);
```
1. As a pipeline + estimator - the model contains only the pipeline transformers, including MapValueToKey and Concatenate; there is no way to get the actual trainer and use it for PFI. The LastTransformer property returns the Concatenate transformer, but PFI requires an estimator, e.g. LightGbm or Regression.
2. As an estimator without a pipeline - now I see the LightGbm trainer in the TransformerChain, but CreatePredictionEngine raises an exception saying the "Features" column is not defined, because in this case the model was saved as a pure estimator, without the pipeline that produces that column.
Correction.
It's possible to extract the trainer from the model, although the code is not that great.
LastTransformer still returns a pipeline transformer instead of the actual trainer.
ML.NET 1.5.0-preview in NuGet.
```C#
var model = context.Model.Load("C:/model.zip", out var schema);
var trainer = (model as IEnumerable<ITransformer>)
    .SelectMany(o => (o as IEnumerable<ITransformer>)
        .OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>())
    .FirstOrDefault();
var importance = context
    .MulticlassClassification
    .PermutationFeatureImportance(trainer, pipeline.Fit(dataView).Transform(dataView), permutationCount: 3);
```
How are you finding the 1.5.0-preview NuGet? We literally just released it today.
@codemzs - via the "Show pre-release" checkbox in the NuGet package manager.
Ha! Of course. I meant: how has your experience been with it so far? Does it fix any of your issues?
The only thing I needed was to run PFI using a model loaded from a file. As long as that works, I'm happy.
Hi, @artemiusgreat. So I am not sure: is your problem solved or not?
I believe it should be possible to access the LastTransformer directly from the model you saved to disk in the "1. As a pipeline + estimator" case by simply using:
```C#
var predictor = (loadedModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<OneVersusAllModelParameters>;
```
I am not sure why you would need the .SelectMany(...) approach you mentioned.
> but PFI requires an estimator, e.g. LightGbm or Regression

PFI doesn't require an estimator, but a prediction transformer. So, in your example, the LightGbm trainer is also an estimator, and once it is trained (with .Fit()) it returns a prediction transformer of type MulticlassPredictionTransformer<OneVersusAllModelParameters>. You should pass this prediction transformer to PFI, not the trainer or estimator:
```C#
pfi = ML.MulticlassClassification.PermutationFeatureImportance(predictor, data);
```
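For reference, here is a minimal end-to-end sketch of that flow (it reuses the context and dataView variables from the earlier snippets; the column names and data-prep steps are placeholders from your example, not something required by the API):
```C#
// Data preparation (placeholder columns from the example above).
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"));
var transformedData = pipeline.Fit(dataView).Transform(dataView);

// Fitting the LightGbm estimator returns a prediction transformer,
// which is exactly what PermutationFeatureImportance expects.
MulticlassPredictionTransformer<OneVersusAllModelParameters> predictor =
    context.MulticlassClassification.Trainers.LightGbm().Fit(transformedData);

var pfi = context.MulticlassClassification.PermutationFeatureImportance(
    predictor, transformedData, permutationCount: 3);
```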
If you are still facing problems, please share with us the complete code and dataset you're using, so that I can take a closer look. Thanks.
@antoniovs1029 Sorry, I missed your comment. Yes, it was fixed. Thanks.
So I've just tested the original scenario of this issue, on F#, and now it works... so it was indeed fixed by PRs #4262 and #4306.
Also, confirming that it works.
See this issue comment for some tricks that help when working with AutoML outputs:
https://github.com/dotnet/docs/issues/19006#issuecomment-647075298
Note: The fix works in a compiled F# project but not in F# interactive (fsi) because the current fsi is bound to older libraries. I expect that it will work in the new preview version of fsi but I have not tested that yet.