Machinelearning: expected Expected float or float vector of known size, got Vec<R4>

Created on 9 Apr 2019  ·  3 Comments  ·  Source: dotnet/machinelearning

System information

 dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   3.0.100-preview3-010431
 Commit:    d72abce213

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.14
 OS Platform: Darwin
 RID:         osx.10.14-x64
 Base Path:   /usr/local/share/dotnet/sdk/3.0.100-preview3-010431/

Host (useful for support):
  Version: 3.0.0-preview5-27606-03
  Commit:  39eb528ff8

Issue

  • What did you do?
    I've implemented the project by following the GitHubLabeler sample.
  • What happened?
    When I run the code, I'm getting the following error:
Unhandled Exception: System.ArgumentOutOfRangeException: Schema mismatch for input column 'ArtistFeaturized_CharExtractor': expected Expected float or float vector of known size, got Vec<R4>
Parameter name: inputSchema
   at Microsoft.ML.Transforms.LpNormalizingTransformer.CheckInputColumn(DataViewSchema inputSchema, Int32 col, Int32 srcCol)
   at Microsoft.ML.Data.OneToOneTransformerBase.OneToOneMapperBase..ctor(IHost host, OneToOneTransformerBase parent, DataViewSchema inputSchema)
   at Microsoft.ML.Transforms.LpNormalizingTransformer.Mapper..ctor(LpNormalizingTransformer parent, DataViewSchema inputSchema)
   at Microsoft.ML.Transforms.LpNormalizingTransformer.MakeRowMapper(DataViewSchema schema)
   at Microsoft.ML.Data.RowToRowTransformerBase.MakeDataTransform(IDataView input)
   at Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.Data.EstimatorChain`1.Fit(IDataView input)
   at Microsoft.ML.TrainCatalogBase.<>c__DisplayClass7_0.<CrossValidateTrain>b__0(Int32 fold)
   at Microsoft.ML.TrainCatalogBase.CrossValidateTrain(IDataView data, IEstimator`1 estimator, Int32 numFolds, String samplingKeyColumn, Nullable`1 seed)
   at Microsoft.ML.MulticlassClassificationCatalog.CrossValidate(IDataView data, IEstimator`1 estimator, Int32 numFolds, String labelColumn, String samplingKeyColumn, Nullable`1 seed)
   at Program.buildAndTrainTheModel(String dataSetLocation, String modelPath) in /Users/samuele.resca/Projects/LyricsClassifier/LyricsClassifier/Program.fs:line 72
   at Program.main(String[] _argv) in /Users/samuele.resca/Projects/LyricsClassifier/LyricsClassifier/Program.fs:line 130
  • What did you expect?
    It seems that the training pipeline accepts only a float or a float vector of known size.
    Why does the GitHubLabeler sample, which has almost the same data schema,
    not need any extra data transformation?

Source code / logs

LyricsClassifier

All 3 comments

I'm pretty sure the issue is with mlContext.Transforms.Text.FeaturizeText. I got the same error in similar circumstances on C#, Windows, and .NET Framework, so a setup with very little in common with yours.

EDIT: In my case, the issue was that I wasn't loading the data into the IDataView properly, and it ended up as an empty vector. Try debugging your case with:
var messageTexts = dataView.GetColumn<string>("Input").Take(20).ToArray();
or
var prev = dataView.Preview();

Or whatever the analogue is in F#.

I think the data might not be loaded correctly. Take a preview of your training data after you load it.
You have specified tab as the separator, but looking at your raw data the columns seem separated by 4 spaces.
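
For reference, a minimal C# sketch of that check. It assumes ML.NET 1.x parameter names for LoadFromTextFile, a header row in the file, and a hypothetical two-column LyricsRow type; none of these are taken from the LyricsClassifier project, so adjust them to your schema.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type: the column names and indices are assumptions.
public class LyricsRow
{
    [LoadColumn(0)] public string Artist { get; set; }
    [LoadColumn(1)] public string Lyrics { get; set; }
}

public static class LoadCheck
{
    public static void Main(string[] args)
    {
        string path = args[0]; // path to the training file

        // 1. Inspect the raw text: does the first line contain a tab at all?
        string firstLine = File.ReadLines(path).First();
        bool hasTab = firstLine.IndexOf('\t') >= 0;
        Console.WriteLine("First line contains a tab: " + hasTab);

        // 2. Load with the separator you believe is correct.
        //    hasHeader: true is an assumption about the file.
        var mlContext = new MLContext();
        IDataView dataView = mlContext.Data.LoadFromTextFile<LyricsRow>(
            path, separatorChar: '\t', hasHeader: true);

        // 3. Preview a few rows. Empty values, or a whole line landing in
        //    the first column, suggest the separator does not match the file.
        var preview = dataView.Preview(maxRows: 5);
        foreach (var row in preview.RowView)
        {
            Console.WriteLine(string.Join(" | ",
                row.Values.Select(kv => kv.Key + "=" + kv.Value)));
        }
    }
}
```

If the first line has no tab, the loader cannot split the columns on '\t', so every later column comes out empty. TextLoader separators are single characters, so a file delimited by runs of 4 spaces would most likely need to be converted to a single-character delimiter before loading.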

Thanks for the help guys, much appreciated 😄
