Xgboost: `plot_tree` failed to parse tree nodes with feature names.

Created on 7 Mar 2019 · 10 Comments · Source: dmlc/xgboost

Here is a sample output from plot_tree:

[04:53:30] DEBUG: /home/fis/Workspace/xgb/xgboost/src/c_api/c_api.cc:1039: 0:[Init_Win_bytes_backward<28960] yes=1,no=2,missing=2
    1:[Total Length of Fwd Packets<25145] yes=7,no=8,missing=8
        7:[Init_Win_bytes_backward<235] yes=15,no=16,missing=15
            15:leaf=-0.0792390853
            16:[Init_Win_bytes_backward<237] yes=17,no=18,missing=18
                17:[Flow Bytes/s<969.591125] yes=19,no=20,missing=20
                    19:[Destination Port<88] yes=21,no=22,missing=21
                        21:[Flow IAT Min<160] yes=23,no=24,missing=23
                            23:leaf=0.0835221782
                            24:leaf=-0.0530516468
                        22:leaf=-0.0624887124
                    20:leaf=-0.0714369118

The parameters used are fairly simple:

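import xgboost as xgb

# dtrain and dtest are xgb.DMatrix objects built from the dataset (not shown).
params = {'tree_method': 'gpu_hist',
          'objective': 'binary:logistic',
          'verbosity': 3,  # debug logging, which produces the DEBUG line above
          'n_gpus': '2',
          'max_depth': 8,
          'grow_policy': 'lossguide',
          'num_parallel_tree': 4}

bst = xgb.train(
    params, dtrain, evals=[(dtest, 'dtest')], num_boost_round=8)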

It failed with the following error:

  File "/home/fis/Workspace/general-python/lib/python3.6/site-packages/xgboost/plotting.py", line 138, in _parse_node
    raise ValueError('Unable to parse node: {0}'.format(text))
ValueError: Unable to parse node: 1:[Total

All 10 comments

!!! I had a good laugh. Yes, we should probably fix it.

What kind of data did you use? I'd like to try to reproduce it.

@hcho3 Ah, my mistake. I was helping someone else look at a security dataset that contains network data (I will try to fake a similar one for this bug) and hadn't really looked at the header yet; the dataset is loaded with datatable. Still, plot_tree failed to parse the output.

@trivialfis Were you using Pandas to load the CSV file? If so, the CSV header becomes the feature names, and plot_tree will use those feature names.

@hcho3 I am using datatable instead of pandas, as the dataset is quite large and dense. I'm trying pandas now and will try to make a reproducible script tomorrow.

@trivialfis I think datatable is similar to Pandas in that the CSV header will be used as feature names, so I don't think this is a bug.

@trivialfis Try running print(df.names), where df is the data table.
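Along the lines of that suggestion, a small helper can list exactly which names contain whitespace. This is a minimal sketch that assumes the names are available as a plain sequence of strings (for example `df.names` on a datatable Frame, or `list(df.columns)` on a pandas DataFrame); `names_with_whitespace` and `sanitize_names` are hypothetical helpers, not part of XGBoost.

```python
import re

def names_with_whitespace(names):
    """Return the feature names that contain at least one whitespace character."""
    return [name for name in names if re.search(r"\s", name)]

def sanitize_names(names):
    """Hypothetical workaround: replace each run of whitespace with '_'.

    Raises ValueError if two names collapse to the same sanitized name."""
    cleaned = [re.sub(r"\s+", "_", name.strip()) for name in names]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("sanitized feature names are not unique")
    return cleaned

# Column names taken from the tree dump in this issue.
header = ["Destination Port", "Flow Bytes/s", "Flow IAT Min",
          "Init_Win_bytes_backward", "Total Length of Fwd Packets"]
print(names_with_whitespace(header))
print(sanitize_names(header))
```

Renaming the columns this way before building the DMatrix would sidestep the parsing problem, at the cost of changing the labels shown in the plot; the uniqueness check guards against names such as `"a b"` and `"a_b"` colliding.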

@hcho3 plot_tree failed with

  File "/home/fis/Workspace/general-python/lib/python3.6/site-packages/xgboost/plotting.py", line 138, in _parse_node
    raise ValueError('Unable to parse node: {0}'.format(text))
ValueError: Unable to parse node: 1:[Total

I see. In that case, let's find a reproducible example so that we can make plot_tree() more robust.

@hcho3 I think I've got it now. In
https://github.com/dmlc/xgboost/blob/09443604168a3c58c7ff43e92daa163f7a51f686/python-package/xgboost/plotting.py#L223

the tree dump string is split on any whitespace, so if your feature names contain whitespace, the code will break.
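To see why this yields exactly `Unable to parse node: 1:[Total`, the sketch below splits the first two dump lines from this issue on whitespace and checks each digit-leading token against a node pattern and a leaf pattern. The two regexes are an assumed approximation of what `plotting.py` matches, not a copy of the library code.

```python
import re

# Approximate node and leaf patterns (assumed, not copied from plotting.py).
NODE_PAT = re.compile(r"(\d+):\[(.+)\]")
LEAF_PAT = re.compile(r"(\d+):(leaf=.+)")

def first_unparsable_token(tree_text):
    """Split the dump on any whitespace, mimicking the split described above,
    and return the first digit-leading token that matches neither pattern
    (None if every such token parses)."""
    for token in tree_text.split():
        if token[0].isdigit():
            if NODE_PAT.match(token) is None and LEAF_PAT.match(token) is None:
                return token
    return None

# The first two nodes of the dump in this issue.
dump = (
    "0:[Init_Win_bytes_backward<28960] yes=1,no=2,missing=2\n"
    "\t1:[Total Length of Fwd Packets<25145] yes=7,no=8,missing=8\n"
)
print(first_unparsable_token(dump))  # 1:[Total
```

Node 0 survives because `Init_Win_bytes_backward` has no spaces; node 1 is cut at the first space, so the token `1:[Total` has no closing `]` and matches neither pattern.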

Reproducible example: a feature named "foo_bar == 2".
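One way to make the parser robust is to treat each line of the dump as one node instead of splitting on whitespace. The sketch below is hypothetical and is not the fix that landed in XGBoost; it assumes every split line looks like `id:[condition] yes=..,no=..[,missing=..]` and every leaf line like `id:leaf=value`, and the sample tree using `foo_bar == 2` is hand-written rather than real XGBoost output.

```python
import re

# A split node: the condition may contain spaces, because the greedy match
# ends at the last "]" that is followed by whitespace and "yes=".
SPLIT_PAT = re.compile(r"^(\d+):\[(.+)\]\s+yes=(\d+),no=(\d+)(?:,missing=(\d+))?")
# A leaf node: stop the value at a comma so any trailing stats are ignored.
LEAF_PAT = re.compile(r"^(\d+):leaf=([^,\s]+)")

def parse_dump(tree_text):
    """Parse one dumped tree into {node_id: node_info}, one node per line."""
    nodes = {}
    for raw in tree_text.splitlines():
        line = raw.strip()  # indentation only encodes depth
        if not line:
            continue
        m = SPLIT_PAT.match(line)
        if m:
            node_id, cond, yes, no, missing = m.groups()
            nodes[int(node_id)] = {
                "condition": cond,
                "yes": int(yes),
                "no": int(no),
                "missing": int(missing) if missing is not None else None,
            }
            continue
        m = LEAF_PAT.match(line)
        if m:
            nodes[int(m.group(1))] = {"leaf": float(m.group(2))}
            continue
        raise ValueError("Unable to parse node: {0}".format(line))
    return nodes

# Hand-written tree using the feature name from the reproducible example.
sample = (
    "0:[foo_bar == 2<0.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.25\n"
    "\t2:leaf=-0.25\n"
)
print(parse_dump(sample)[0]["condition"])  # foo_bar == 2<0.5
```

Because indentation is stripped, this recovers only node ids, conditions, child links, and leaf values; building the graph from them would remain a separate step.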
