Machinelearning: Error message 'Could not find feature column' in Fit() method

Created on 27 May 2019 · 5 Comments · Source: dotnet/machinelearning

System information

  • Windows 7 x64
  • .NET Version 2.1.502

Issue

  • I am trying to create my first Microsoft.ML sample, and my program fails with the message 'Could not find feature column 'X'', even though the type contains that field.


Source code / logs

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace ML.NET
{
public class FormulaData{
    [ColumnName("Label")]
    public float Y;
    [ColumnName("Features")]
    public float X;
    public FormulaData(double x, double y){
        X = Convert.ToSingle(x);
        Y = Convert.ToSingle(y);
    }
}
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");

        List<FormulaData> pointsValues = Enumerable
            .Range(-1, 8)
            .Select(value => {return new FormulaData(value, value*2-1);})
            .ToList();

        // Create MLContext
        var mlContext = new MLContext(1);

        // Load Data
        IDataView data = mlContext.Data.LoadFromEnumerable<FormulaData>(pointsValues);

        DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
        IDataView trainData = dataSplit.TrainSet;
        IDataView testData = dataSplit.TestSet;

        // Define trainer options.
        var options = new SdcaRegressionTrainer.Options
        {
            LabelColumnName = nameof(FormulaData.Y),
            FeatureColumnName = nameof(FormulaData.X),
            // Make the convergence tolerance tighter. It effectively leads to more training iterations.
            ConvergenceTolerance = 0.02f,
            // Increase the maximum number of passes over training data. Similar to ConvergenceTolerance,
            // this value specifies the hard iteration limit on the training algorithm.
            MaximumNumberOfIterations = 30,
            // Increase learning rate for bias.
            BiasLearningRate = 0.1f            
        };

        // Define StochasticDualCoordinateAscent regression algorithm estimator
        var sdcaEstimator = mlContext.Regression.Trainers.Sdca(options);

        // Build machine learning model
        var trainedModel = sdcaEstimator.Fit(trainData);

        // Use trained model to make inferences on test data
        IDataView testDataPredictions = trainedModel.Transform(testData);

        // Extract model metrics and get RSquared
        RegressionMetrics trainedModelMetrics = mlContext.Regression.Evaluate(testDataPredictions);
        double rSquared = trainedModelMetrics.RSquared;

        Console.WriteLine($"rSquared: {rSquared}");
    }
}

}

```

Most helpful comment

working sample added:
```c#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

/*
https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.standardtrainerscatalog.sdca?view=ml-dotnet
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/load-data-ml-net
*/

namespace ML.NET
{

public class FormulaData{
    [ColumnName("Features")]
    [VectorType(1)]
    public float[] X;
    [ColumnName("Label")]
    public float Y;
    public FormulaData(double x, double y){
        X = new float[]{Convert.ToSingle(x)};
        Y = Convert.ToSingle(y);
    }
}
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World 'Microsoft.ML'!");

        List<FormulaData> pointsValues = Enumerable
            .Range(-1, 100)
            .Select(value => {return new FormulaData(value, value*2-1);})
            .ToList();

        // Create MLContext
        var mlContext = new MLContext(1);

        // Load Data
        IDataView data = mlContext.Data.LoadFromEnumerable<FormulaData>(pointsValues);

        DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
        IDataView trainData = dataSplit.TrainSet;
        IDataView testData = dataSplit.TestSet;

        // Define trainer options.
        var options = new SdcaRegressionTrainer.Options
        {
            LabelColumnName = "Label", //nameof(FormulaData.Y),
            FeatureColumnName = "Features", //nameof(FormulaData.X),
            // Make the convergence tolerance tighter. It effectively leads to more training iterations.
            ConvergenceTolerance = 0.02f,
            // Increase the maximum number of passes over training data. Similar to ConvergenceTolerance,
            // this value specifies the hard iteration limit on the training algorithm.
            MaximumNumberOfIterations = 30,
            // Increase learning rate for bias.
            BiasLearningRate = 0.1f            
        };

        // Define StochasticDualCoordinateAscent regression algorithm estimator
        var sdcaEstimator = mlContext.Regression.Trainers.Sdca(options);

        // Build machine learning model
        var trainedModel = sdcaEstimator.Fit(trainData);

        // Use trained model to make inferences on test data
        IDataView testDataPredictions = trainedModel.Transform(testData);

        // Extract model metrics and get RSquared
        RegressionMetrics trainedModelMetrics = mlContext.Regression.Evaluate(testDataPredictions);
        double rSquared = trainedModelMetrics.RSquared;

        Console.WriteLine($"rSquared: {rSquared}");
    }
}

}
```

All 5 comments

You're configuring the class with annotations and then naming the columns as if you hadn't. Use one or the other: either the name given in ColumnName("Features"), or FeatureColumnName = nameof(FormulaData.X). Just do not use both, because the data source should have its fields mapped to the column names the pipeline uses.
Try

```c#
var options = new SdcaRegressionTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    ConvergenceTolerance = 0.02f,
    MaximumNumberOfIterations = 30,
    BiasLearningRate = 0.1f
};
```
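One way to check which names the trainer will actually see is to print the schema of the loaded data. The sketch below is only an illustration: the `SchemaProbe` and `SchemaDemo` names are hypothetical, and the class simply mirrors the attributes used on `FormulaData` in the question.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical type with the same attributes as FormulaData in the question.
public class SchemaProbe
{
    [ColumnName("Label")]
    public float Y;

    [ColumnName("Features")]
    public float X;
}

public static class SchemaDemo
{
    // Returns the column names ML.NET assigns when loading SchemaProbe rows.
    public static List<string> ColumnNames()
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new SchemaProbe { X = 1f, Y = 1f }
        });
        return data.Schema.Select(column => column.Name).ToList();
    }

    public static void Main()
    {
        // The names come from the ColumnName attributes ("Label", "Features"),
        // not from the field names ("Y", "X").
        foreach (string name in ColumnNames())
            Console.WriteLine(name);
    }
}
```

Because the attributes rename the columns, `nameof(FormulaData.X)` evaluates to "X", which is not a column in this schema; that matches the 'Could not find feature column' message.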

That solved the issue.
But now I get a different error message:

Exception has occurred: CLR/System.ArgumentOutOfRangeException
An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Data.dll: 'Schema mismatch for feature column 'Features': expected Vector<Single>, got Single'

or, with the feature field declared as an array:

```c#
public class FormulaData{
    [ColumnName("Features")]
    public float[] X;
    [ColumnName("Label")]
    public float Y;
    public FormulaData(double x, double y){
        X = new float[]{Convert.ToSingle(x)};
        Y = Convert.ToSingle(y);
    }
}
```

An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Data.dll: 'Schema mismatch for feature column 'Features': expected Vector, got VarVector'

Found a solution with the help of the [VectorType(1)] attribute:

```c#
public class FormulaData{
    [ColumnName("Features")]
    [VectorType(1)]
    public float[] X;
    [ColumnName("Label")]
    public float Y;
    public FormulaData(double x, double y){
        X = new float[]{Convert.ToSingle(x)};
        Y = Convert.ToSingle(y);
    }
}
```
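To see what the attribute changes, one can compare the type of the "Features" column with and without it. This is a minimal sketch; `WithoutSize`, `WithSize`, and `VectorTypeDemo` are hypothetical names introduced only for the comparison.

```c#
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row type: array field with no declared length.
public class WithoutSize
{
    [ColumnName("Features")]
    public float[] X = new float[1];
}

// Hypothetical row type: same field, length declared via VectorType.
public class WithSize
{
    [ColumnName("Features")]
    [VectorType(1)]
    public float[] X = new float[1];
}

public static class VectorTypeDemo
{
    // True when the "Features" column loaded from TRow is a vector of known size.
    public static bool HasKnownSize<TRow>(TRow row) where TRow : class
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromEnumerable(new[] { row });
        var type = (VectorDataViewType)data.Schema["Features"].Type;
        return type.IsKnownSize;
    }

    public static void Main()
    {
        Console.WriteLine(HasKnownSize(new WithoutSize())); // False
        Console.WriteLine(HasKnownSize(new WithSize()));    // True
    }
}
```

A bare `float[]` is loaded as a variable-length vector, which is what the 'got VarVector' message reports; declaring the length gives the fixed-size vector the trainer expects.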

[VectorType(1)] solves the VarVector problem.
This is the only place on the entire Internet where I could find the right answer to that problem.
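A different route, not the one taken in this thread, is to keep the fields as plain scalars and build the feature vector in the pipeline with the `Concatenate` transform. The sketch below assumes a hypothetical `Point` class with no attributes and uses the simple `Sdca` overload with default options rather than the `Options` object from the question.

```c#
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: scalar fields, no ColumnName or VectorType attributes.
public class Point
{
    public float X;
    public float Y;
}

public static class ConcatenateDemo
{
    // Trains on y = 2x - 1 and returns the size of the generated "Features" vector.
    public static int FeatureVectorSize()
    {
        var mlContext = new MLContext(seed: 1);

        var points = Enumerable.Range(-1, 100)
            .Select(v => new Point { X = v, Y = v * 2 - 1 })
            .ToList();
        IDataView data = mlContext.Data.LoadFromEnumerable(points);

        // Concatenate turns the scalar column "X" into a fixed-size vector column "Features".
        var pipeline = mlContext.Transforms
            .Concatenate("Features", nameof(Point.X))
            .Append(mlContext.Regression.Trainers.Sdca(
                labelColumnName: nameof(Point.Y),
                featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);
        IDataView scored = model.Transform(data);

        var featuresType = (VectorDataViewType)scored.Schema["Features"].Type;
        return featuresType.Size;
    }

    public static void Main()
    {
        Console.WriteLine($"Features vector size: {FeatureVectorSize()}");
    }
}
```

With this layout the column names passed to the trainer are the ones produced inside the pipeline, so the data class needs no attributes at all.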
