Machinelearning: FastForestBinaryClassifier always returns the same prediction

Created on 27 Jul 2018  ·  5 Comments  ·  Source: dotnet/machinelearning

System information

  • OS version/distro: Windows 10 17134.165
  • .NET Version (eg., dotnet --info): 4.7.2

Issue

  • What did you do?
    I tried to use FastForestBinaryClassifier for a learning application, with a bool parameter as the label.
  • What happened?
    The predicted label is always false, even though most of my training data is true: about 5000 of the 7000 training results are true, and the amount of data keeps growing while the application runs. New results are the real answers to previous predictions. The prediction is always the same as it was the first time, whatever the prediction input.

Source code / logs

class MLData
    {
        public class IrisPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool PredictedLabels;
        }

        public class IrisData
        {
            [Column("0", name: "Label")] public bool Label;
            [Column("1")] [VectorType(1000)] public float[] param1;
            [Column("2")] [VectorType(1000)] public float[] param2;
            [Column("3")] [VectorType(1000)] public float[] param3;
            [Column("4")] [VectorType(1000)] public float[] param4;
            [Column("5")] [VectorType(1000)] public float[] param5;
            [Column("6")] [VectorType(1000)] public float[] param6;
        }

        public static List<IrisData> History = new List<IrisData>() { };
    }
class MLCore
    {
        private static string AppPath => Path.GetDirectoryName(Environment.GetCommandLineArgs()[0]);
        private static string ModelPath => Path.Combine(AppPath, "IrisModel.zip");
        private static PredictionModel<MLData.IrisData, MLData.IrisPrediction> readyModel;

        internal static async Task<PredictionModel<MLData.IrisData, MLData.IrisPrediction>> TrainAsync()
        {
            var data = MLData.History;
            var collection = CollectionDataSource.Create(data);

            var pipeline = new LearningPipeline()
            {
                collection,
                new ColumnConcatenator("Features", "param1","param2", "param3","param4", "param5", "param6"),

                new FastForestBinaryClassifier(),

                new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" }
            };

            PredictionModel<MLData.IrisData, MLData.IrisPrediction> model;

            try
            {
                model = pipeline.Train<MLData.IrisData, MLData.IrisPrediction>();
                await model.WriteAsync(ModelPath);
                PGlobals.learnSuccesfull = true;
            }
            catch (Exception e)
            {
                model = null;
                PGlobals.learnSuccesfull = false;
            }

            return model;
        }

        public static async void Learn()
        {
            readyModel = await TrainAsync();
        }

        public static void Think()
        {
            if (readyModel != null)
            {
                try
                {
                    var prediction = readyModel.Predict(new MLData.IrisData()
                    {
                        param1 = PGlobals.param1,
                        param2 = PGlobals.param2,
                        param3 = PGlobals.param3,
                        param4 = PGlobals.param4,
                        param5 = PGlobals.param5,
                        param6 = PGlobals.param6
                    });

                    PGlobals.predictedResult = prediction.PredictedLabels;
                }

                catch
                {
                    //Nothing
                }
            }
        }
    }

All 5 comments

Do you actually have 6000 features?
If yes, your [Column] should look like this:

            [Column("1-1000")] [VectorType(1000)] public float[] param1;
            [Column("1001-2000")] [VectorType(1000)] public float[] param2;
            // etc.

I have the same issue (ML.NET v0.4): the classifier always returns the same prediction (false).

Regarding the post by @bzn7, I believe it makes no difference how many features a training set has; the answers must be different.

Well, if you have 6000 features but read them the way @bzn7 does (994 features appear 6 times each), the learner is going to be severely hampered. My guess was that the learned model was trivial and therefore gave the same prediction all the time.
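To make the overlap concrete, here is a minimal sketch that counts how many distinct source columns each mapping touches. It does not call ML.NET. It assumes (this is an assumption, not confirmed loader behavior) that a vector field with a single starting ordinal reads 1000 consecutive columns from that ordinal, and `ColumnCoverage` and `DistinctColumns` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnCoverage
{
    // Hypothetical helper: number of distinct source columns covered when each
    // vector starts at one of `starts` and spans `length` consecutive columns.
    // ASSUMPTION: this models how a single ordinal plus VectorType(length) is read.
    public static int DistinctColumns(IEnumerable<int> starts, int length) =>
        starts.SelectMany(start => Enumerable.Range(start, length))
              .Distinct()
              .Count();

    public static void Main()
    {
        const int vectorLength = 1000;
        const int vectorCount = 6;

        // Mapping from the report: ordinals 1, 2, ..., 6.
        var originalStarts = Enumerable.Range(1, vectorCount);

        // Mapping from the suggestion: 1-1000, 1001-2000, and so on,
        // assuming the same pattern continues for all six vectors.
        var disjointStarts = Enumerable.Range(0, vectorCount)
                                       .Select(i => 1 + i * vectorLength);

        Console.WriteLine("Original ordinals cover " +
            DistinctColumns(originalStarts, vectorLength) + " distinct columns");
        Console.WriteLine("Disjoint ranges cover " +
            DistinctColumns(disjointStarts, vectorLength) + " distinct columns");
    }
}
```

Under that assumption the original ordinals touch only 1005 distinct columns, while the disjoint ranges touch all 6000, so most of the intended features never reach the learner.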

I think you are incorrect about this one:

I believe it makes no difference how many features a training set has; the answers must be different.

I would say that if the answers are 'the same all the time', it is unfortunate, but far from uncommon. Here are some factors that can potentially cause this:

  • Features have no predictive signal in them. In this case the model will learn the priors and output them all the time.
  • Heavy overfitting on the train set. In this case the testing example will not belong to the area the model has 'studied', and the performance will be arbitrary.
  • Heavy class imbalance. Especially in multiclass problems, if the classes are heavily imbalanced, the model will predict the majority class in 'far too many' cases.
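A quick way to tell these cases apart is to compare the model against a trivial baseline before debugging the learner. The sketch below is plain C# with no ML.NET calls. The label counts come from the report (about 5000 true out of 7000); the all-false prediction list is a stand-in for the reported behavior, and `PredictionSanityCheck` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PredictionSanityCheck
{
    // Fraction of examples labeled true (0 for an empty set).
    public static double PositiveRate(IReadOnlyList<bool> labels) =>
        labels.Count == 0 ? 0.0 : labels.Count(l => l) / (double)labels.Count;

    // Accuracy of a model that always predicts the majority class.
    public static double MajorityBaseline(IReadOnlyList<bool> labels)
    {
        double p = PositiveRate(labels);
        return Math.Max(p, 1.0 - p);
    }

    // Fraction of predictions that match their labels.
    public static double Accuracy(IReadOnlyList<bool> labels, IReadOnlyList<bool> predictions) =>
        labels.Count == 0
            ? 0.0
            : labels.Zip(predictions, (y, yHat) => y == yHat).Count(match => match)
              / (double)labels.Count;

    // True when every prediction is identical, i.e. the model is degenerate.
    public static bool IsConstant(IReadOnlyList<bool> predictions) =>
        predictions.Distinct().Count() <= 1;

    public static void Main()
    {
        // Label counts taken from the report: 5000 true, 2000 false.
        var labels = Enumerable.Repeat(true, 5000)
                               .Concat(Enumerable.Repeat(false, 2000))
                               .ToList();

        // Stand-in for the reported behavior: the model always answers false.
        var predictions = Enumerable.Repeat(false, labels.Count).ToList();

        var inv = CultureInfo.InvariantCulture;
        Console.WriteLine("Majority baseline: " + MajorityBaseline(labels).ToString("F3", inv));
        Console.WriteLine("Model accuracy:    " + Accuracy(labels, predictions).ToString("F3", inv));
        Console.WriteLine("Constant output:   " + IsConstant(predictions));
    }
}
```

With these counts the majority baseline is about 0.714, while an always-false model scores about 0.286. A constant prediction of the minority class is therefore hard to explain by learned priors or class imbalance alone, which points back at how the features are being read.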

DRI RESPONSE: I'm considering this question answered and intend to close the issue within the next few days, unless someone has an objection.

Do you actually have 6000 features?
If yes, your [Column] should look like this:

            [Column("1-1000")] [VectorType(1000)] public float[] param1;
            [Column("1001-2000")] [VectorType(1000)] public float[] param2;
            // etc.

I tried to simplify my features and redefined columns as shown. It is working, thank you @Zruty0.
