Some models are inherently stochastic; others are deterministic. Are ML.NET models deterministic? In other words, given the same input, will an ML.NET model always return the same output/prediction? If so, to how many decimal places is the prediction deterministic?
@rebecca-burwei Yes, I believe ML.NET models are deterministic if you use them properly; we have a bunch of tests verifying various model outputs. As for decimal places, I believe that depends on the input data and on model settings such as how many rounds of training are performed, since floating-point error accumulates over the course of the calculation.
@justinormont cc'ing Justin in case he has more insight on this question.
If you have a trained model, it is almost always deterministic. One example where it's not is if it includes the CountTargetEncoder. See: https://github.com/dotnet/machinelearning/pull/4514#pullrequestreview-330200694
Some parts of ML.NET can be deterministic. In general practice I wouldn't assume model training is fully deterministic.
Setting a seed in the MLContext and disabling multi-threading gets you close.
Many components also have their own seed values to set. There's a bit of a usability bug in ML.NET: non-hashing seeds should fall back to the global seed, but the code to do so wasn't added in all components. See: https://github.com/dotnet/machinelearning/issues/4752#issuecomment-580686290
If you're using the AutoML APIs, there's a bit of a butterfly effect in model sweeping due to the small model differences being amplified. See: https://github.com/dotnet/machinelearning/issues/4986#issuecomment-606521860
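To illustrate why floating-point accumulation and multi-threading come up in this thread, here is a small sketch in Python. It is not ML.NET code, and the `chunked_sum` helper is hypothetical; it only mimics how a parallel reduction might split a sum into partial sums. The point it shows is language-independent: floating-point addition is not associative, so changing the order in which values are combined can change the low-order digits of the result.

```python
import random


def chunked_sum(values, n_chunks):
    """Sum `values` as n_chunks partial sums, then combine them.

    Hypothetical helper: it imitates a reduction split across workers,
    where each worker sums its own slice before the results are merged.
    """
    size = (len(values) + n_chunks - 1) // n_chunks  # ceiling division
    partials = [sum(values[i:i + size]) for i in range(0, len(values), size)]
    return sum(partials)


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same data summed with different chunk counts may differ in the
# last few digits, even though the inputs are identical.
rng = random.Random(0)
values = [rng.uniform(-1e6, 1e6) for _ in range(100_000)]
for n_chunks in (1, 2, 4, 8):
    print(n_chunks, repr(chunked_sum(values, n_chunks)))
```

Any differences between the printed sums are tiny, but an iterative training procedure can carry them forward, which is consistent with the advice above to fix seeds and disable multi-threading when reproducibility matters.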
Closing this issue since an answer has already been provided. @rebecca-burwei, feel free to reopen if you have any follow-up questions. Thanks.