The two current tutorials in the docs use different columns to get a predicted value out of the pipeline into an instance of the user-defined prediction type:
How does one know which column to use to populate instances of the prediction type? Especially given that, in the case of the (binary) classification solution, the Score column is also available (I guess it then contains the probabilities of belonging to a certain class).
As for the trainer inputs, the rules are more or less clear:
LabelColumn property)
FeatureColumn property)
Can the setup of the predictor output be done in a similar way:
IDataView with the PredictedLabel column that would be a copy of the Score column.
By the way, a plain explanation of the Score and PredictedLabel columns here would be appreciated as well. Then, at least, I'll update the docs to make the story clearer.
The purpose of the output columns in the scored IDataView depends on the learning task, e.g. if the task is:
Regression
Binary Classification
Multi-class Classification
Clustering
@zeahmed thank you very much for the answer. I'll keep that information in mind while updating the docs.
Are there any plans to unify the meaning of the columns across the various learning tasks, or to introduce any other changes in this area of ML.NET? Or is that part more or less stable?
Thank you @zeahmed for the detailed answer.
@zeahmed I've been trying to use the _Probability_ column in _EnsembleBinaryClassifier_, but it fails with 'Column 'Probability' not found in the data view'. Is it a bug or by design?
DRI RESPONSE: I can't find this information in our docs, and it definitely should be there. Therefore, I'm marking this as documentation and up-for-grabs.
@Lanayx I'd suggest you create a separate issue and provide us with additional information (a code snippet at least).
@JRAlexander @Ivanidzo4ka is that something that should be at https://docs.microsoft.com/en-us/dotnet/machine-learning/ ?
Is there a way to get class label names along with the probabilities that we get from multi-class classification? TryGetScoreLabelNames is part of the legacy code in 0.8.
I believe this issue is already addressed as part of the 1.0 API reference documentation. All trainers now have a sub-section in their remarks called "Input and Output Columns", where the types and definitions of the input/output columns are clearly explained. E.g.: https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.averagedperceptrontrainer?view=ml-dotnet#input-and-output-columns
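For anyone landing here later, this is a rough sketch of how prediction types could be mapped to those output columns in 1.0, based on my reading of the per-trainer "Input and Output Columns" sections. The class and property names are made up for illustration; only the column names (Score, PredictedLabel, Probability) and `ColumnNameAttribute` come from ML.NET. Please check the remarks of the specific trainer you use, since not every trainer emits every column.

```csharp
using Microsoft.ML.Data;

// Regression: the predicted value is in the Score column.
public class RegressionPrediction            // hypothetical name
{
    [ColumnName("Score")]
    public float PredictedValue { get; set; }
}

// Binary classification: PredictedLabel is the predicted class,
// Score is the raw (unbounded) model output.
public class BinaryPrediction                // hypothetical name
{
    [ColumnName("PredictedLabel")]
    public bool IsPositive { get; set; }

    [ColumnName("Score")]
    public float RawScore { get; set; }

    // Assumption: only add a Probability member when the trainer is
    // calibrated and actually emits that column; otherwise binding the
    // type is likely to fail with a "column not found" style error.
    // public float Probability { get; set; }
}

// Multi-class classification: Score holds one value per class,
// PredictedLabel is the key of the winning class (unless it is
// mapped back to the original label values in the pipeline).
public class MulticlassPrediction            // hypothetical name
{
    [ColumnName("PredictedLabel")]
    public uint PredictedClassKey { get; set; }

    [ColumnName("Score")]
    public float[] ClassScores { get; set; }
}

// Clustering: PredictedLabel is the id of the closest cluster,
// Score holds the distances to the cluster centroids.
public class ClusteringPrediction            // hypothetical name
{
    [ColumnName("PredictedLabel")]
    public uint ClusterId { get; set; }

    [ColumnName("Score")]
    public float[] Distances { get; set; }
}
```

The idea is that the attribute binds each member to a column by name, so the member names themselves can be whatever reads best in your code.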