I was trying to create a PredictionEngine from a saved model. I found that if I directly use the ITransformer returned by Pipeline.Fit, CreatePredictionEngine works fine. But after I save and reload the model, it gives the following error

The code for the pipeline is like this:
public static IEstimator<ITransformer> BuildTrainingPipeline(MLContext mlContext)
{
    // Data process configuration with pipeline data transformations
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
        .Append(mlContext.Transforms.LoadImages("ImagePath_featurized", @"C:\Users\xiaoyuz\Desktop\machinelearning-samples\datasets\images", "ImagePath"))
        .Append(mlContext.Transforms.ResizeImages("ImagePath_featurized", 224, 224, "ImagePath_featurized"))
        .Append(mlContext.Transforms.ExtractPixels("ImagePath_featurized", "ImagePath_featurized"))
        .Append(mlContext.Transforms.DnnFeaturizeImage("ImagePath_featurized", m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn), "ImagePath_featurized"))
        .Append(mlContext.Transforms.Concatenate("Features", new[] { "ImagePath_featurized" }))
        .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);

    // Set the training algorithm
    var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "Label", numberOfIterations: 10, featureColumnName: "Features"),
            labelColumnName: "Label")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

    var trainingPipeline = dataProcessPipeline.Append(trainer);
    return trainingPipeline;
}
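For reference, the save/reload step that triggers the error looks roughly like this (a minimal sketch; the file path and the training data view variable are placeholders, not from my actual code):

```csharp
// Fit the pipeline; using this transformer directly works fine.
ITransformer model = BuildTrainingPipeline(mlContext).Fit(trainingDataView);
var engineFromFit = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

// Save and reload the model; creating an engine from the reloaded copy fails.
mlContext.Model.Save(model, trainingDataView.Schema, "model.zip");
ITransformer reloaded = mlContext.Model.Load("model.zip", out DataViewSchema schema);
var engineFromDisk = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(reloaded); // throws
```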
The ModelInput and ModelOutput classes look like this:
public class ModelInput
{
    [ColumnName("Label"), LoadColumn(0)]
    public string Label { get; set; }

    [ColumnName("Title"), LoadColumn(1)]
    public string Title { get; set; }

    [ColumnName("Url"), LoadColumn(2)]
    public string Url { get; set; }

    [ColumnName("ImagePath"), LoadColumn(3)]
    public string ImagePath { get; set; }
}

public class ModelOutput
{
    // The ColumnName attribute changes the column name from its
    // default value, which is the name of the property.
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }

    public float[] Score { get; set; }
}
It's really weird, though. My description may not be that detailed; if you need further information, please let me know.
@codemzs @justinormont - any thoughts?
My top guess is a serialization/deserialization bug in DnnFeaturizeImage(). We may not have a unit test covering its serialization/deserialization.
I may be wrong that it's currently an issue, but it was known to be working before we converted it from a CNTK model to backing it with an ONNX model.
I was able to reproduce this error. It appears to me the issue is actually in the NormalizingTransformer code.
This code is saving the type into the model, and then re-hydrating it from the model.
From what I can tell, it saves only the vector's .Size, which is not enough information. In the scenario above, the vector has multiple dimensions: <1, 512, 1, 1>. This information is lost when the model is saved and read back.
We should save the dimension information here as well, and check whether anywhere else saves VectorDataViewType types into the model and reloads them, to ensure those places don't have the same bug.
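To illustrate the information loss, here is a minimal sketch using the public VectorDataViewType API (not the actual serialization code):

```csharp
using Microsoft.ML.Data;

// The DnnFeaturizeImage output type carries the full shape.
var original = new VectorDataViewType(NumberDataViewType.Single, 1, 512, 1, 1);
// original.Dimensions is <1, 512, 1, 1>; original.Size flattens it to 512.

// If only .Size is written to the model file, the best the loader can do is:
var rehydrated = new VectorDataViewType(NumberDataViewType.Single, original.Size);
// rehydrated is Vector<Single, 512> -- the <1, 512, 1, 1> shape is gone, so the
// reloaded transformer's expected type no longer matches its input column.
```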
I figured out why this works around the problem. The reason is the Concatenate step in between the DnnFeaturizeImage and the NormalizeMinMax:
    .Append(mlContext.Transforms.ExtractPixels("ImageSource_featurized", "ImageSource_featurized"))
    .Append(mlContext.Transforms.DnnFeaturizeImage("ImageSource_featurized", m => m.ModelSelector.ResNet18(mlContext, m.OutputColumn, m.InputColumn), "ImageSource_featurized"))
--> .Append(mlContext.Transforms.Concatenate("Features", new[] { "ImageSource_featurized", "Title", "URL" }))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    .AppendCacheCheckpoint(mlContext);
What this does is collapse the multi-dimensional vector from DnnFeaturizeImage into a single-dimension vector, which the Normalize code can handle correctly.
  | [11] | {Input247: Vector<Single, 1, 3, 224, 224>} | Microsoft.ML.DataViewSchema.Column
▶ | [12] | {Pooling395_Output_0: Vector<Single, 1, 512, 1, 1>} | Microsoft.ML.DataViewSchema.Column
▶ | [13] | {Features: Vector<Single, 514> {SlotNames}} | Microsoft.ML.DataViewSchema.Column
  | [14] | {Features: Vector<Single, 514> {IsNormalized, SlotNames}} | Microsoft.ML.DataViewSchema.Column
  | [15] | {PredictedLabel: Key<UInt32, 0-3> {ScoreColumnKind, ScoreValueKind, KeyValues, ScoreColumnSetId}} | Microsoft.ML.DataViewSchema.Column
See the two columns with arrows. The first is the output of the DnnFeaturizeImage. The second is the output of the Concatenate, which adds the two "dummy" columns. The vector goes from 1 x 512 x 1 x 1 to just 514.
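The flattening is also visible from the column types themselves; roughly like this (a sketch, assuming `schema` is the fitted chain's output schema, with a hypothetical helper for printing dimensions):

```csharp
using System;
using Microsoft.ML.Data;

// Hypothetical helper: print a column's vector dimensions from a schema.
static void PrintDims(DataViewSchema schema, string columnName)
{
    var vectorType = (VectorDataViewType)schema[columnName].Type;
    Console.WriteLine($"{columnName}: <{string.Join(", ", vectorType.Dimensions)}>");
}

// PrintDims(schema, "Pooling395_Output_0") -> <1, 512, 1, 1>  (DnnFeaturizeImage output)
// PrintDims(schema, "Features")            -> <514>           (flattened by Concatenate)
```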
@beccamc, @eerhardt: Great sleuthing in finding the error, a workaround, and why it works.
I see two issues:
- Serialization for NormalizingTransformer not saving the multiple dimensions of its input
- DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>; it should be Vector<Single, 512> (the internal version returns a 1D vector)

For AutoML, an easy solution (for today) is to ensure a Concatenate is always added, exploiting @beccamc's workaround to flatten the vector. Currently, the concat is added only if multiple featurizers are used or if multiple columns exist, as normally there is no gain in running concat on a single column which is already a vector.
I do rather like uniformly producing a concatenation in the pipeline, even when unneeded. This adds a pedagogical aspect, which encourages users to explore additional custom feature engineering. The concatenate is needed should the user want to have additional featurization, hence always producing one gives users a place to send their new output into the final feature vector.
I see two issues:
- Serialization for NormalizingTransformer not saving the multiple dimensions of its input
- DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>; it should be Vector<Single, 512> (the internal version returns a 1D vector)
I wonder: if issue (2) were fixed, wouldn't that be enough to fix this specific problem? Because then the NormalizingTransformer wouldn't have to work with multidimensional data, right?
I ask because I don't know if there are other cases where the Normalizing Transformer is expected to work with multidimensional data.
Why wouldn't it be able to work with multidimensional data? It works just fine if you don't save the model to a .zip file.
VBuffers have support for multi-dimensional data. If the Normalizing Transformer can't/shouldn't work with multi-dimensional data, then it should throw an exception that it can't. But it doesn't throw because it can.
If you fixed issue (2) above, yes, it would fix this specific scenario - but any other scenario where multi-dimensional data was used in a Normalizer would fail again. TensorFlow and ONNX (and any other neural network transformer) can return multi-dimensional data today. So can KeyToVectorMapping. Just search the code for places that create a VectorDataViewType with multiple dimensions and you will see where these can be created today. And I'm sure more will be created in the future.
@eerhardt is, as always, spot on.
Fixing either of the two issues will solve this specific scenario. In the end, both issues (1) and (2) should be fixed, which fixes an entire class of as-yet-unseen issues. That is, fix things twice: once for the specific customer, and once so no other customer runs into a similar issue.
For the record, Issue #1 mentioned here was fixed in #4321 some time ago: https://github.com/dotnet/machinelearning/issues/4226#issuecomment-533772062:
- Serialization for NormalizingTransformer not saving the multiple dimensions of its input
- DnnFeaturizeImage for Resnet18 returns Vector<Single, 1, 512, 1, 1>; it should be Vector<Single, 512> (the internal version returns a 1D vector)
Issue #2 remains unfixed; it was decided offline some time ago not to address it immediately, as the problem described here was unblocked by simply fixing #1. Still, Issue #2 remains an issue and should be fixed or addressed at some point.
Closing this issue as the break is fixed; opened https://github.com/dotnet/machinelearning/issues/5254 to track Issue #2 separately.