I am trying to use PermutationFeatureImportance (PFI) with F#, but the F# type system will not resolve ITransformer to ISingleFeaturePredictionTransformer, which is what PFI requires.
I believe this is because IPredictorProducing (and related interfaces) are marked "internal".
F# supports explicit interfaces, and maybe that is the reason for this issue.
Here is a snippet of code that shows what I am trying to do
(I am using the latest bits - v1.2.0 at the time of this post):
```F#
let mutable schema = null
let mdl = ctx.Model.Load(@"F:\fwaris\data\t\analysis\model_cv_LightGbmBinary.bin", &schema)
let mdlt = mdl :?> TransformerChain<ITransformer>
let m1 = mdlt.LastTransformer // debugger shows it is Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.IPredictorProducing<float>>
let scored = mdl.Transform(trainView)
scored.Preview()
ctx.BinaryClassification.PermutationFeatureImportance(m1 :?> _, scored)
```
@dsyme
A workaround for now is to use the C# helper given below, but really, if an interface (IPredictorProducing) is going to be exposed via another public interface, it should not be marked internal.
```C#
public static class MLHelper<T> where T : class
{
    public static System.Collections.Immutable.ImmutableArray<BinaryClassificationMetricsStatistics> PFI_BinaryClassification
    (
        MLContext ctx,
        ITransformer model,
        IDataView data,
        string labelColumnName = "Label",
        bool useFeatureWeightFilter = false,
        int? numberOfExamplesToUse = null,
        int permutationCount = 1
    )
    {
        var m = ctx.BinaryClassification.PermutationFeatureImportance(
            model as ISingleFeaturePredictionTransformer<T>,
            data,
            labelColumnName: labelColumnName,
            useFeatureWeightFilter: useFeatureWeightFilter,
            numberOfExamplesToUse: numberOfExamplesToUse,
            permutationCount: permutationCount
        );
        return m;
    }
}
```
@fwaris - I just ran into this issue as well. I don't understand how your workaround works. What T is getting passed into MLHelper<T>?
@codemzs - this is the same issue as we were discussing today. I don't think it is possible to use PermutationFeatureImportance once a model is saved to disk.
This is an issue because if you use AutoML, it always saves the model to disk in order to save on memory.
The problem is this code:
https://github.com/dotnet/machinelearning/blob/bb00e07b30e9626b3578ff1934b86dad0d1d1ce9/src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs#L595-L601
Whenever you load a prediction transformer from a model stream, it always creates an instance of a new BinaryPredictionTransformer<IPredictorProducing<float>>. This object cannot be cast to the ISingleFeaturePredictionTransformer<TModel> that is necessary for calling PermutationFeatureImportance, because the T in this case (IPredictorProducing<float>) is internal.
We need to change the above code to save off the right type into the model and create an instance of BinaryPredictionTransformer<TModel>, where TModel is the type that was originally used when training the pipeline before saving to disk - for example, BinaryPredictionTransformer<CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>> when using LightGbm.
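To illustrate the failure in C# (a minimal sketch; the calibrated LightGbm model type is just one example of what TModel may have been at training time, and the model path is a placeholder):
```C#
using Microsoft.ML;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext();
var loadedModel = mlContext.Model.Load("model.zip", out var inputSchema);

// After loading, the last transformer in the chain is a
// BinaryPredictionTransformer<IPredictorProducing<float>> - a type user code
// cannot even name, because IPredictorProducing<float> is internal.
var last = ((TransformerChain<ITransformer>)loadedModel).LastTransformer;

// Casting to the concrete type that was used at training time therefore fails:
var predictor = last as BinaryPredictionTransformer<
    CalibratedModelParametersBase<LightGbmBinaryModelParameters, PlattCalibrator>>;
// predictor == null, so there is nothing valid to pass to PermutationFeatureImportance.
```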
/cc @Dmitry-A @justinormont
@eerhardt, it seems you can punt on the type resolution in F# by using an underscore; i.e. the following trick seems to work (I tested again just to make sure):
```F#
let metrics = MLHelper<_>.PFI_BinaryClassification(mlctx, model, labelColumnName="Label")
```
The 'model' variable is of the concrete type (from debugger):
Microsoft.ML.Data.BinaryPredictionTransformer
However, I agree with you that this area requires re-work to make it easier to use.
Hi. So PRs #4262 and #4306 fixed the problem Eric pointed out in his comment in this thread.
So please let us know if this has been fixed for you. In particular, those PRs were only tested for ML.NET on C#, so I would appreciate feedback from the F# side. I will rename and tag this issue as F#-specific, since that was your original problem.
This is not fixed yet.
There are 2 ways to save the model.
1. As a pipeline + estimator
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/save-load-machine-learning-models-ml-net
```C#
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"));
var estimator = context.MulticlassClassification.Trainers.LightGbm();
var model = pipeline.Append(estimator).Fit(dataView);
context.Model.Save(model, dataView.Schema, "C:/model.zip");
```
2. As an estimator, without pipeline
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/explain-machine-learning-model-permutation-feature-importance-ml-net#train-the-model
```C#
var estimator = context.MulticlassClassification.Trainers.LightGbm();
// 'pipeline' is the data preparation pipeline from option 1
var transformedData = pipeline.Fit(dataView).Transform(dataView);
var model = estimator.Fit(transformedData);
context.Model.Save(model, dataView.Schema, "C:/model.zip");
```
Then, loading from disk:
```C#
var model = context.Model.Load("C:/model.zip", out var schema);
var engine = context.Model.CreatePredictionEngine<InputModel, OutputModel>(model);
```
1. As a pipeline + estimator - the model contains only the pipeline transformers, including MapValueToKey and Concatenate; there is no way to get the actual trainer and use it for PFI. The LastTransformer property returns the Concatenate transformer, but PFI requires an estimator, e.g. LightGbm or Regression.
2. As an estimator without a pipeline - now I see the LightGbm trainer in the TransformerChain, but CreatePredictionEngine raises an exception saying the "Features" column is not defined, because in this case the model was saved as a pure estimator, without the pipeline that produces that column.
Correction.
It's possible to extract the trainer from the model, although the code is not that great.
LastTransformer still returns a pipeline transformer instead of the actual trainer.
ML.NET 1.5.0-preview in NuGet.
```C#
var model = context.Model.Load("C:/model.zip", out var schema);
var trainer = (model as IEnumerable<ITransformer>)
    .SelectMany(o => (o as IEnumerable<ITransformer>)
        .OfType<MulticlassPredictionTransformer<OneVersusAllModelParameters>>())
    .FirstOrDefault();
var importance = context
    .MulticlassClassification
    .PermutationFeatureImportance(trainer, pipeline.Fit(dataView).Transform(dataView), permutationCount: 3);
```
How are you finding the 1.5.0-preview NuGet? We literally just released it today.
@codemzs - via the "Show pre-release" checkbox in the NuGet package manager.
Ha! Of course. I meant: how has your experience been with it so far? Does it fix any of your issues?
The only thing I needed was to run PFI using a model loaded from a file. As long as that works, I'm happy.
Hi, @artemiusgreat. So I am not sure: is your problem solved or not?
I believe it should be possible to access the LastTransformer directly from the model you saved to disk in the "1. As a pipeline + estimator" case by simply using:
```C#
var predictor = (loadedModel as TransformerChain<ITransformer>).LastTransformer as MulticlassPredictionTransformer<OneVersusAllModelParameters>;
```
I am not sure why you would need the .SelectMany(...) approach you mentioned.
> but PFI requires an estimator, e.g. LightGbm or Regression

PFI doesn't require an estimator, but a prediction transformer. So, in your example, the LightGbm trainer is also an estimator, and once it is trained (with .Fit()) it returns a prediction transformer of type MulticlassPredictionTransformer<OneVersusAllModelParameters>. You should pass this prediction transformer to PFI, not the trainer or estimator:
```C#
pfi = ML.MulticlassClassification.PermutationFeatureImportance(predictor, data);
```
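For reference, here is a minimal end-to-end sketch of that flow (it reuses the context and dataView variables from the earlier snippets; the column names and data-prep steps are placeholders from your example, not something required by the API):
```C#
// Data preparation (placeholder columns from the example above).
var pipeline = context.Transforms.Conversion.MapValueToKey("Label", "X")
    .Append(context.Transforms.Concatenate("Features", "X1", "X2"));
var transformedData = pipeline.Fit(dataView).Transform(dataView);

// Fitting the LightGbm estimator returns a prediction transformer,
// which is exactly what PermutationFeatureImportance expects.
MulticlassPredictionTransformer<OneVersusAllModelParameters> predictor =
    context.MulticlassClassification.Trainers.LightGbm().Fit(transformedData);

var pfi = context.MulticlassClassification.PermutationFeatureImportance(
    predictor, transformedData, permutationCount: 3);
```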
If you are still facing problems, please share with us the complete code and dataset you're using, so that I can take a closer look. Thanks.
@antoniovs1029 Sorry, I missed your comment. Yes, it was fixed. Thanks.
So I've just tested the original scenario of this issue, on F#, and now it works... so it was indeed fixed by PRs #4262 and #4306.
Also, confirming that it works.
See this issue comment for some tricks that help when working with AutoML outputs:
https://github.com/dotnet/docs/issues/19006#issuecomment-647075298
Note: The fix works in a compiled F# project but not in F# interactive (fsi) because the current fsi is bound to older libraries. I expect that it will work in the new preview version of fsi but I have not tested that yet.