Contents of automl_.graph.json:
{
"Inputs": {
"file_train": "D:\\SplitDatasets\\ExcitementFG2_train.csv",
"file_test": "D:\\SplitDatasets\\ExcitementFG2_valid.csv"
},
"Nodes": [
{
"Inputs": {
"CustomSchema": "sep=, col=Label:R4:78 col=Features1:R4:0-77 col=Features2:R4:79-202 header=+",
"InputFile": "$file_train"
},
"Name": "Data.CustomTextLoader",
"Outputs": {
"Data": "$data_train"
}
},
{
"Inputs": {
"CustomSchema": "sep=, col=Label:R4:78 col=Features1:R4:0-77 col=Features2:R4:79-202 header=+",
"InputFile": "$file_test"
},
"Name": "Data.CustomTextLoader",
"Outputs": {
"Data": "$data_test"
}
},
{
"Inputs": {
"BatchSize": 3,
"StateArguments": {
"Name": "AutoMlState",
"Settings": {
"Engine": {
"Name": "Rocket",
"Settings": {}
},
"Metric": "Accuracy",
"TerminatorArgs": {
"Name": "IterationLimited",
"Settings": {
"FinalHistoryLength": 100
}
},
"TrainerKind": "SignatureBinaryClassifierTrainer"
}
},
"TestingData": "$data_test",
"TrainingData": "$data_train",
"IgnoreColumns": ["cost"]
},
"Name": "Models.PipelineSweeper",
"Outputs": {
"Results": "$output_data",
"State": "$xyz"
}
}
],
"Outputs": {
"output_data": "C:\\Benchmarking\\01-ResultsOut.csv"
}
}
What happened?
Encountered an exception in LightGBM trainer
What did you expect?
A run to completion, w/o exception
--- Command line args ---
dotnet MML.dll execgraph C:\Benchmarking\automl_graph.json
--- Exception message ---
System.InvalidOperationException
HResult=0x80131509
Message=Categorical split features is zero length
Source=Microsoft.ML.Core
StackTrace:
at Microsoft.ML.Runtime.Contracts.Check(Boolean f, String msg) in C:\MLDotNet\src\Microsoft.ML.Core\Utilities\Contracts.cs:line 497
at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree.CheckValid(Action`2 checker) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 469
at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree..ctor(Int32[] splitFeatures, Double[] splitGain, Double[] gainPValue, Single[] rawThresholds, Single[] defaultValueForMissing, Int32[] lteChild, Int32[] gtChild, Double[] leafValues, Int32[][] categoricalSplitFeatures, Boolean[] categoricalSplit) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 223
at Microsoft.ML.Trainers.FastTree.Internal.RegressionTree.Create(Int32 numLeaves, Int32[] splitFeatures, Double[] splitGain, Single[] rawThresholds, Single[] defaultValueForMissing, Int32[] lteChild, Int32[] gtChild, Double[] leafValues, Int32[][] categoricalSplitFeatures, Boolean[] categoricalSplit) in C:\MLDotNet\src\Microsoft.ML.FastTree\TreeEnsemble\RegressionTree.cs:line 189
at Microsoft.ML.Runtime.LightGBM.Booster.GetModel(Int32[] categoricalFeatureBoudaries) in C:\MLDotNet\src\Microsoft.ML.LightGBM\WrappedLightGbmBooster.cs:line 241
at Microsoft.ML.Runtime.LightGBM.LightGbmTrainerBase`3.TrainCore(IChannel ch, IProgressChannel pch, Dataset dtrain, CategoricalMetaData catMetaData, Dataset dvalid) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 378
at Microsoft.ML.Runtime.LightGBM.LightGbmTrainerBase`3.TrainModelCore(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmTrainerBase.cs:line 126
at Microsoft.ML.Runtime.Training.TrainerEstimatorBase`2.Train(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 92
at Microsoft.ML.Runtime.Training.TrainerEstimatorBase`2.Microsoft.ML.Runtime.ITrainer.Train(TrainContext context) in C:\MLDotNet\src\Microsoft.ML.Data\Training\TrainerEstimatorBase.cs:line 158
at Microsoft.ML.Runtime.Data.TrainUtils.TrainCore(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, RoleMappedData validData, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples, Nullable`1 cacheData, IPredictor inputPredictor) in C:\MLDotNet\src\Microsoft.ML.Data\Commands\TrainCommand.cs:line 254
at Microsoft.ML.Runtime.Data.TrainUtils.Train(IHostEnvironment env, IChannel ch, RoleMappedData data, ITrainer trainer, IComponentFactory`1 calibrator, Int32 maxCalibrationExamples) in C:\MLDotNet\src\Microsoft.ML.Data\Commands\TrainCommand.cs:line 223
at Microsoft.ML.Runtime.EntryPoints.LearnerEntryPointsUtils.Train[TArg,TOut](IHost host, TArg input, Func`1 createTrainer, Func`1 getLabel, Func`1 getWeight, Func`1 getGroup, Func`1 getName, Func`1 getCustom, ICalibratorTrainerFactory calibrator, Int32 maxCalibrationExamples) in C:\MLDotNet\src\Microsoft.ML.Data\EntryPoints\InputBase.cs:line 189
at Microsoft.ML.Runtime.LightGBM.LightGbm.TrainBinary(IHostEnvironment env, LightGbmArguments input) in C:\MLDotNet\src\Microsoft.ML.LightGBM\LightGbmBinaryTrainer.cs:line 189
@codemzs, @guolinke: Any ideas? (related to categorical handling in LightGBM)
Thanks @justinormont for the contacts. Thanks @codemzs for all your help on this today. @guolinke -- tr=LightGBMBinary{UseCat:+ CatSmooth:1} repeatedly fails, but tr=LightGBMBinary{UseCat:+} repeatedly works. Perhaps this is a LightGBM cat smoothing bug?
Unfortunately, I think I was wrong, and I do not think this bug is confined to smoothing. @guolinke -- I have some concrete reproducible data where this fails. Let me know if you want to sync up to debug
It seems there is a bug in model conversion from LightGBM to FastTree.
@codemzs any idea about this bug?
@justinormont can you share the dataset on which you get the exception?
I've tried running LightGBM with UseCat on small datasets (like adult), but I can't hit that exception.
This is still happening often unfortunately
LightGBM is one of the trainers that often produces the best results, so the exception may be an important one
Have a consistent repro -- would love to work with someone to help debug if possible
@vinodshanbhag for visibility
@Ivanidzo4ka we are still seeing this issue in more than one dataset. Can we please have this investigated? @daholste can share the dataset if you need it.
I got the same issue when trying to train a model with empty values in the department feature. I used OneHotEncoding. After I replaced all empty strings with "-1", the issue went away.
I think @justinormont sent me a repro file some time ago, but I lost it. If someone can provide a reproducible snippet of code, I would be more than happy to fix it.
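For anyone wanting to try the same workaround before loading data into ML.NET, here is a minimal sketch in Python. It assumes the data is a CSV with a categorical column named `department` (taken from the comment above); the helper name `fill_empty_categories` and the sample rows are made up for illustration. It only does the preprocessing step and has not been verified as a fix for the exception.

```python
import csv
import io


def fill_empty_categories(rows, column, placeholder="-1"):
    """Return copies of the rows with empty or whitespace-only values
    in `column` replaced by `placeholder` (hypothetical helper)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        value = row.get(column)
        if value is None or value.strip() == "":
            row[column] = placeholder
        cleaned.append(row)
    return cleaned


# Made-up sample data: the second and third rows have an empty department.
sample = io.StringIO("department,salary\nsales,10\n,20\n  ,30\n")
rows = list(csv.DictReader(sample))
cleaned = fill_empty_categories(rows, "department")
print([r["department"] for r in cleaned])
```

The cleaned rows can then be written back out with `csv.DictWriter` and used as the training file.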
Thanks a lot, @Ivanidzo4ka . Sent!
@justinormont Any updates on this issue? Just ran into it again. Thanks!
Same here. If it helps I get this now when my features have extremely high cardinality. When I print the schema of the categorical feature (the dataview, doing my own printing) I see that I have 99_999 different values in that categorical feature.
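A quick way to check whether a dataset falls into this high-cardinality case is to count distinct values per categorical column before training. The sketch below is plain Python, independent of ML.NET; the helper name `column_cardinality`, the column names, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict


def column_cardinality(rows, columns):
    """Count distinct values per column (hypothetical helper).

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    """
    distinct = defaultdict(set)
    for row in rows:
        for col in columns:
            distinct[col].add(row[col])
    return {col: len(distinct[col]) for col in columns}


# Made-up rows: user_id is unique per row, country repeats.
rows = [
    {"user_id": "u1", "country": "NO"},
    {"user_id": "u2", "country": "NO"},
    {"user_id": "u3", "country": "SE"},
]
counts = column_cardinality(rows, ["user_id", "country"])
print(counts)
```

Columns whose count approaches the number of rows are the ones to look at first when comparing against the case described above.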
I have this issue too - any ETA for a fix?
@daholste can you send me the dataset and code with which I can reproduce this issue? The same that you sent to Ivan :)
Hey, sent!
Hey @daholste, I wasn't able to reproduce this at all, neither in TLC nor in ML.NET. And it looks like the Models.PipelineSweeper and Rocket components in the graph (along with the execgraph command) were removed in ML.NET some time ago. In any case, there was no repro even when using LightGbm from the command line or API since the dataset is only numerical columns, and the Categorical split features is zero length error isn't applicable so I'm not sure why you were seeing that in the first place.
I do, however, have the same error reproduced in #3659, and I believe the underlying cause is the same. It deterministically happens when there is only one categorical feature and UseCategoricalSplit is true in LightGbm, and it is likely a bug in model conversion from LightGbm to FastTree. Please follow #3659 for details and updates. I am closing this issue. Please feel free to reopen if you find a repro that is distinct from the conditions described in the other issue.
cc: @vinodshanbhag @justinormont @guolinke @vKuryshev @mayoatte @rauhs @eyvindwa
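For readers unfamiliar with the failing check: the exception message and the constructor arguments in the stack trace (`categoricalSplit` and `categoricalSplitFeatures`) suggest an invariant along the lines of "every node flagged as a categorical split must carry a non-empty list of split features". The Python sketch below only illustrates that kind of invariant. It is not the ML.NET source, the actual logic of `RegressionTree.CheckValid` is not shown in this thread, and the function name and sample arrays are hypothetical.

```python
def find_empty_categorical_splits(categorical_split, categorical_split_features):
    """Return indices of nodes that are flagged categorical but have no
    split features (illustrative only, not the ML.NET implementation)."""
    bad_nodes = []
    for node, is_categorical in enumerate(categorical_split):
        if is_categorical and not categorical_split_features[node]:
            bad_nodes.append(node)
    return bad_nodes


# Made-up tree with three internal nodes: node 0 is numerical,
# node 1 is a valid categorical split, node 2 is flagged categorical
# but has an empty feature list (the kind of state the message describes).
flags = [False, True, True]
features = [None, [4, 7], []]
print(find_empty_categorical_splits(flags, features))
```

If the LightGBM to FastTree conversion produced a node like node 2 here, a check of this shape would reject the tree, which would be consistent with the error reported above.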