Here is a sample output from plot_tree:
[04:53:30] DEBUG: /home/fis/Workspace/xgb/xgboost/src/c_api/c_api.cc:1039: 0:[Init_Win_bytes_backward<28960] yes=1,no=2,missing=2
1:[Total Length of Fwd Packets<25145] yes=7,no=8,missing=8
7:[Init_Win_bytes_backward<235] yes=15,no=16,missing=15
15:leaf=-0.0792390853
16:[Init_Win_bytes_backward<237] yes=17,no=18,missing=18
17:[Flow Bytes/s<969.591125] yes=19,no=20,missing=20
19:[Destination Port<88] yes=21,no=22,missing=21
21:[Flow IAT Min<160] yes=23,no=24,missing=23
23:leaf=0.0835221782
24:leaf=-0.0530516468
22:leaf=-0.0624887124
20:leaf=-0.0714369118
The parameters used are rather simple:
params = {'tree_method': 'gpu_hist',
          'objective': 'binary:logistic',
          'verbosity': 3,
          'n_gpus': '2',
          'max_depth': 8,
          'grow_policy': 'lossguide',
          'num_parallel_tree': 4}
bst = xgb.train(
    params, dtrain, evals=[(dtest, 'dtest')], num_boost_round=8)
Failed with error:
  File "/home/fis/Workspace/general-python/lib/python3.6/site-packages/xgboost/plotting.py", line 138, in _parse_node
    raise ValueError('Unable to parse node: {0}'.format(text))
ValueError: Unable to parse node: 1:[Total
!!! I had a good laugh. Yes, we should probably fix it.
What kind of data did you use? I'd like to try to reproduce it.
@hcho3 Ah, my mistake. I was helping someone else look at a security dataset that contains network data (I will try to fake a similar one for this bug). I haven't really looked at the header yet; the dataset is loaded with datatable. But still, plot_tree failed to parse the output.
@trivialfis Were you using Pandas to load the CSV file? If so, the CSV header becomes the feature names, and plot_tree will use those feature names.
@hcho3 I am using datatable instead of pandas; the dataset is quite large and dense. I am trying pandas now and will try to make a reproducible script tomorrow.
@trivialfis I think datatable is similar to Pandas, in that I think the CSV header will be used as feature names. I don't think this is a bug.
@trivialfis Try running print(df.names), where df is the data table.
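A minimal sketch of that check, using the feature names visible in the dump above. The list is typed in by hand here rather than read from datatable, and replacing whitespace with underscores is only an assumed workaround, not something plot_tree does itself:

```python
# Feature names copied from the tree dump earlier in this thread.
feature_names = [
    "Init_Win_bytes_backward",
    "Total Length of Fwd Packets",
    "Flow Bytes/s",
    "Destination Port",
    "Flow IAT Min",
]

# Names containing any whitespace character are the suspicious ones.
names_with_whitespace = [
    name for name in feature_names if any(ch.isspace() for ch in name)
]
print(names_with_whitespace)

# Assumed workaround: collapse each run of whitespace into one underscore
# before the names reach the booster.
sanitized_names = ["_".join(name.split()) for name in feature_names]
print(sanitized_names)
```

Only Init_Win_bytes_backward is whitespace-free, which matches the fact that the root node parsed and the first node with a space in its name did not.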
@hcho3 plot_tree failed with:
  File "/home/fis/Workspace/general-python/lib/python3.6/site-packages/xgboost/plotting.py", line 138, in _parse_node
    raise ValueError('Unable to parse node: {0}'.format(text))
ValueError: Unable to parse node: 1:[Total
I see. In that case, let's find a reproducible example so that we can make plot_tree() more robust.
@hcho3 I think I got it now. In:
https://github.com/dmlc/xgboost/blob/09443604168a3c58c7ff43e92daa163f7a51f686/python-package/xgboost/plotting.py#L223
strings are split on any whitespace. So if your feature names contain whitespace, the code will break.
Reproducible example: feature "foo_bar == 2"
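The failure can be illustrated without xgboost at all. The sketch below is not the code in plotting.py; it only mimics whitespace splitting on one node line from the dump above, and the regex-based `parse_split_node` is a hypothetical alternative, not the fix that was eventually applied:

```python
import re

# Node line copied from the dump above; the feature name contains spaces.
node_line = "1:[Total Length of Fwd Packets<25145] yes=7,no=8,missing=8"

# Splitting on any whitespace cuts the node at the first space
# inside the feature name.
naive_tokens = node_line.split()
print(naive_tokens[0])  # 1:[Total

# Hypothetical alternative: match the whole bracketed condition at once,
# so spaces inside the feature name are preserved.
NODE_PATTERN = re.compile(
    r"^(\d+):\[(.+)<([^<\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)$"
)


def parse_split_node(line):
    """Return (node_id, feature, threshold, yes, no, missing), or None
    if the line is not a split node (for example, a leaf)."""
    match = NODE_PATTERN.match(line.strip())
    if match is None:
        return None
    node_id, feature, threshold, yes, no, missing = match.groups()
    return int(node_id), feature, float(threshold), int(yes), int(no), int(missing)


parsed = parse_split_node(node_line)
print(parsed)
```

The first token, `1:[Total`, is exactly the fragment reported in the ValueError above.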