You can find the notebook here: https://github.com/dcostea/SmartFireAlarm/blob/master/SmartFireAlarm/Jupyter/sample.ipynb
I have extracted the code here:
#r "nuget:Microsoft.ML,1.5.2"
#r "nuget:Microsoft.ML.LightGBM,1.5.2"
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;
using Microsoft.ML.Data;
MLContext mlContext = new MLContext(seed: 123);
const string TRAIN_DATASET_PATH = "./sensors_data_train.csv";
IDataView trainingData = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: TRAIN_DATASET_PATH,
    hasHeader: true,
    separatorChar: ',');
const string TEST_DATASET_PATH = "./sensors_data_test.csv";
IDataView testingData = mlContext.Data.LoadFromTextFile<ModelInput>(
    path: TEST_DATASET_PATH,
    hasHeader: true,
    separatorChar: ',');
var featureColumns = new string[] { "Temperature", "Luminosity", "Infrared", "Distance" };
var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.Concatenate("Features", featureColumns))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
var model = trainingPipeline.Fit(trainingData);
var predictions = model.Transform(testingData);
var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "Label", "Score", "PredictedLabel");
SDCA still has some non-determinism even after the seed is set, due to things like multi-threading. You can reduce it by setting NumberOfThreads to 1 in the Options.
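A minimal sketch of that suggestion, assuming the same column names and pipeline shape as the code in the original post. The variable name `sdcaOptions` is only illustrative, and the label and feature column names are set explicitly even though they match the trainer defaults. It only builds the pipeline; fitting and evaluation stay as in the original code.

```csharp
#r "nuget:Microsoft.ML,1.5.2"
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext(seed: 123);

// Restrict SDCA to a single thread to reduce run-to-run variation.
// "sdcaOptions" is an illustrative name, not taken from the notebook.
var sdcaOptions = new SdcaMaximumEntropyMulticlassTrainer.Options
{
    LabelColumnName = "Label",
    FeatureColumnName = "Features",
    NumberOfThreads = 1
};

var featureColumns = new[] { "Temperature", "Luminosity", "Infrared", "Distance" };

// Same pipeline shape as in the original post, with the Options overload of the trainer.
var trainingPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label")
    .Append(mlContext.Transforms.Concatenate("Features", featureColumns))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(sdcaOptions))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
```

Single-threaded training will likely be slower, so this trades training speed for more repeatable metrics.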
Hi @dcostea ,
Has the answer above on setting NumberOfThreads in SdcaMaximumEntropyMulticlassTrainer.Options solved your issue? If so, please feel free to close this issue. If not, please confirm whether or not a different issue is now occurring, or the same error is being outputted. Thanks!
I was looking forward to verifying the solution. I had to deliver a talk this evening and I just got free. Give me a few minutes to try it.
I can see a good improvement, but as anticipated, there is still a little non-determinism.
By default, with multi-threading, the accuracy used to fluctuate by up to 4-5 percent.
.Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy(new SdcaMaximumEntropyMulticlassTrainer.Options { NumberOfThreads = 1 }))
These are the measurements obtained with the above code:
MicroAcc 94.66 95.33 94.66 94.66 95.33 94.66 94.66 94.66 95.33
MacroAcc 95.06 96.06 95.06 95.06 96.06 95.06 95.06 95.06 96.06
As you can see, MicroAcc fluctuates by less than 1 percent and MacroAcc fluctuates by 1 percent.
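For reference, the spread can be checked with a few lines of C#. This is only a sketch: the values are copied from the measurements above, and the helper name `Spread` is made up for illustration. It reports the highest value minus the lowest value across the nine runs, which works out to about 0.67 points for MicroAcc and 1.00 point for MacroAcc.

```csharp
using System;
using System.Linq;

// Values copied from the measurements above (NumberOfThreads = 1).
double[] microAcc = { 94.66, 95.33, 94.66, 94.66, 95.33, 94.66, 94.66, 94.66, 95.33 };
double[] macroAcc = { 95.06, 96.06, 95.06, 95.06, 96.06, 95.06, 95.06, 95.06, 96.06 };

// Spread = highest value minus lowest value across runs, in percentage points,
// rounded to two decimals to hide floating-point noise.
double Spread(double[] runs) => Math.Round(runs.Max() - runs.Min(), 2);

Console.WriteLine($"MicroAcc spread: {Spread(microAcc):F2}");
Console.WriteLine($"MacroAcc spread: {Spread(macroAcc):F2}");
```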
I will close the issue. Thank you for the improvement tip!
cc @harishsk @mstfbl