`PredictionEngine` breaks after saving/loading a model

Created on 18 Sep 2019 · 12 comments · Source: dotnet/machinelearning

System information

  • OS: Windows 10
  • ML.NET version: 1.3.1

I was trying to create a PredictionEngine from a saved model. I found that if I directly use the ITransformer returned by `pipeline.Fit`, `CreatePredictionEngine` works fine, but after I save and reload the model, it throws the following error:

(screenshot of the error message)

The code for the pipeline looks like this:

public static IEstimator<ITransformer> BuildTrainingPipeline(MLContext mlContext)
{
    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
        .Append(mlContext.Transforms.LoadImages("ImagePath_featurized", @"C:\Users\xiaoyuz\Desktop\machinelearning-samples\datasets\images", "ImagePath"))
        .Append(mlContext.Transforms.ResizeImages("ImagePath_featurized", 224, 224, "ImagePath_featurized"))
        .Append(mlContext.Transforms.ExtractPixels("ImagePath_featurized", "ImagePath_featurized"))
        .Append(mlContext.Transforms.DnnFeaturizeImage("ImagePath_featurized", m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn), "ImagePath_featurized"))
        .Append(mlContext.Transforms.Concatenate("Features", new[] { "ImagePath_featurized" }))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);

    // Set the training algorithm
    var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "Label", numberOfIterations: 10, featureColumnName: "Features"),
            labelColumnName: "Label")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

    var trainingPipeline = dataProcessPipeline.Append(trainer);
    return trainingPipeline;
}
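For reference, a minimal sketch of the save/reload sequence that triggers the error. The variable `trainingData` and the model file path are placeholders; `ModelInput`/`ModelOutput` are the classes from this issue:

```csharp
// Train and create a prediction engine directly from the fitted
// transformer -- this works.
ITransformer model = trainingPipeline.Fit(trainingData);
var directEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

// Save the model to disk and reload it -- creating the engine from
// the reloaded transformer is what throws.
mlContext.Model.Save(model, trainingData.Schema, "model.zip");
ITransformer reloaded = mlContext.Model.Load("model.zip", out DataViewSchema inputSchema);
var reloadedEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(reloaded); // throws here
```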

And the ModelInput and ModelOutput classes look like this:

public class ModelInput
{
    [ColumnName("Label"), LoadColumn(0)]
    public string Label { get; set; }

    [ColumnName("Title"), LoadColumn(1)]
    public string Title { get; set; }

    [ColumnName("Url"), LoadColumn(2)]
    public string Url { get; set; }

    [ColumnName("ImagePath"), LoadColumn(3)]
    public string ImagePath { get; set; }
}

public class ModelOutput
{
    // ColumnName attribute is used to change the column name from
    // its default value, which is the name of the field.
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }

    public float[] Score { get; set; }
}

It's really weird though, and my description may not be that detailed. If you need further information, please let me know.

Labels: P1, bug, image, loadsave, onnx

Most helpful comment

I figured out why this works around the problem. The reason is the Concatenate step between the DnnFeaturizeImage and the NormalizeMinMax:

    .Append(mlContext.Transforms.ExtractPixels("ImageSource_featurized", "ImageSource_featurized"))
    .Append(mlContext.Transforms.DnnFeaturizeImage("ImageSource_featurized", m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn), "ImageSource_featurized"))
--> .Append(mlContext.Transforms.Concatenate("Features", new[] { "ImageSource_featurized", "Title", "URL" }))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);

What this does is collapse the multi-dimensional vector from DnnFeaturizeImage into a single-dimension vector, which the Normalize code can handle correctly.

 | [11] | {Input247: Vector<Single, 1, 3, 224, 224>} | Microsoft.ML.DataViewSchema.Column
▶ | [12] | {Pooling395_Output_0: Vector<Single, 1, 512, 1, 1>} | Microsoft.ML.DataViewSchema.Column
▶ | [13] | {Features: Vector<Single, 514> {SlotNames}} | Microsoft.ML.DataViewSchema.Column
 | [14] | {Features: Vector<Single, 514> {IsNormalized, SlotNames}} | Microsoft.ML.DataViewSchema.Column
 | [15] | {PredictedLabel: Key<UInt32, 0-3> {ScoreColumnKind, ScoreValueKind, KeyValues, ScoreColumnSetId}} | Microsoft.ML.DataViewSchema.Column

See the two columns with arrows. The first is the output of DnnFeaturizeImage; the second is the output of the Concatenate, which adds the two "dummy" columns. The vector goes from 1 × 512 × 1 × 1 to just 514.
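One way to see this flattening yourself is to inspect the fitted model's output schema. A sketch, where `model` and `trainingData` stand in for the fitted transformer and its input data:

```csharp
// Walk the output schema and print the shape of each vector column.
DataViewSchema outputSchema = model.GetOutputSchema(trainingData.Schema);
foreach (DataViewSchema.Column column in outputSchema)
{
    if (column.Type is VectorDataViewType vectorType)
    {
        // Dimensions is the full shape (e.g. 1,512,1,1 after DnnFeaturizeImage);
        // Size is the flattened element count (e.g. 514 after Concatenate).
        Console.WriteLine($"{column.Name}: dims=({string.Join(",", vectorType.Dimensions)}), size={vectorType.Size}");
    }
}
```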

All 12 comments

@codemzs @justinormont - any thoughts?

A reproducible project can be found here

And a tiny dataset for test can be found here

(you might need to update dataset path in project)

My top guess is a serialization/deserialization bug in DnnFeaturizeImage(). We may not have a unit test to cover its serialization/deserialization.

I may be wrong that it's still an issue, but it was known to be working before we converted it from a CNTK model to backing it with an ONNX model.

I was able to reproduce this error. It appears to me the issue is actually in the NormalizingTransformer code.

https://github.com/dotnet/machinelearning/blob/862ae842104ab25ea8f628438339fe0dce251866/src/Microsoft.ML.Data/Transforms/Normalizer.cs#L385-L422

This code saves the type into the model and then re-hydrates it from the model.

From what I can tell, it saves only the vector's `.Size`, which is not enough information. In the scenario above, the vector has multiple dimensions: <1, 512, 1, 1>. This information gets lost when saving the model and reading it back.

We should save the dimension information here as well, and check whether anywhere else saves VectorDataViewType types into the model and reloads them, to ensure the same bug doesn't exist elsewhere.
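To illustrate the information loss, here is a hypothetical sketch (not the actual Normalizer serialization code): two vector types with different shapes report the same `Size`, so persisting only `Size` cannot reconstruct the original shape on load.

```csharp
// The 4-D type produced by the ONNX-backed DnnFeaturizeImage...
var multiDim = new VectorDataViewType(NumberDataViewType.Single, 1, 512, 1, 1);
// ...and a flat vector with the same number of elements.
var flat = new VectorDataViewType(NumberDataViewType.Single, 512);

// Size is the product of Dimensions, so both report 512; a model
// that saves only Size re-hydrates the multi-dimensional type as
// the flat one.
Console.WriteLine(multiDim.Size); // 512
Console.WriteLine(flat.Size);     // 512
```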

@eerhardt I heard you wanted to see our workaround. I've zipped a working project here. Sample data used here. I included the tsv we created in the same data folder zip (Animals.zip). Let me know if something doesn't work!


@beccamc, @eerhardt: Great sleuthing in finding the error, a work around, and why it works.

I see two issues:

  1. Serialization for NormalizingTransformer not saving the multiple dimensions of its input
  2. DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>, it should be Vector<Single, 512> (internal version returns a 1D vector)

For AutoML, an easy solution (for today) is to ensure a concatenate is always added; exploiting @beccamc's workaround to flatten the vector. Currently, the concat is added only if multiple featurizers are used or if multiple columns exist, as normally there is no gain in running concat on a single column which is already a vector.

I do rather like uniformly producing a concatenation in the pipeline, even when unneeded. This adds a pedagogical aspect, which encourages users to explore additional custom feature engineering. The concatenate is needed should the user want to have additional featurization, hence always producing one gives users a place to send their new output into the final feature vector.

I see two issues:

  1. Serialization for NormalizingTransformer not saving the multiple dimensions of its input
  2. DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>, it should be Vector<Single, 512> (internal version returns a 1D vector)

I wonder: if issue 2) were to be fixed, wouldn't that be enough to fix this specific problem? Because then the normalizing transformer wouldn't have to work with multidimensional data, right?

I ask because I don't know if there are other cases where the Normalizing Transformer is expected to work with multidimensional data.

I ask because I don't know if there are other cases where the Normalizing Transformer is expected to work with multidimensional data.

Why wouldn't it be able to work with multidimensional data? It works just fine if you don't save the model to a .zip file.

VBuffers have support for multi-dimensional data. If the Normalizing Transformer can't/shouldn't work with multi-dimensional data, then it should throw an exception that it can't. But it doesn't throw because it can.

If you fixed issue (2) above, yes, it would fix this specific scenario, but any other scenario where multi-dimensional data is used in a Normalizer would fail again. TensorFlow and ONNX (and any other neural-network transformer) can return multi-dimensional data today. So can KeyToVectorMapping. Just search the code for places that create a VectorDataViewType with multiple dimensions and you will see where these can be created today. And I'm sure more will be created in the future.

@eerhardt is, as always, spot on.

Fixing either of the two issues will solve this specific scenario. In the end, both issues (1) and (2) should be fixed, which fixes an entire class of yet-unseen issues. That is, fix things twice: once for the specific customer, and once so that no other customer runs into a similar issue.

For the record, issue 1 mentioned here was fixed in #4321 some time ago: https://github.com/dotnet/machinelearning/issues/4226#issuecomment-533772062:

  1. Serialization for NormalizingTransformer not saving the multiple dimensions of its input
  2. DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>, it should be Vector<Single, 512> (internal version returns a 1D vector)

Issue 2 remains unfixed; it was decided offline some time ago not to address it immediately, since the problem described here was unblocked simply by fixing issue 1. Still, issue 2 remains an issue, and it should be addressed at some point.

Closing this issue as the break is fixed; opened https://github.com/dotnet/machinelearning/issues/5254 to track issue 2 separately.
