Xgboost: How to restore both model and feature names

Created on 2 Feb 2018 · 7 comments · Source: dmlc/xgboost

I used

bst.save_model(file_path) # to save
bst1 = xgb.Booster(model_file=file_path) # to restore

But I noticed that after these two steps, the restored bst1 model returned None
for bst1.feature_names,

while bst.feature_names did return the feature names I used.

So is there anything wrong with what I have done? Or is there another way to save feature_names?

thanks

All 7 comments

It seems I have to manually save and load the feature names, and set the feature names list like:

bst.feature_names = feature_names_list

because save_model only serializes the model at the C level, I guess:

    def save_model(self, fname):
        """
        Save the model to a file.

        Parameters
        ----------
        fname : string
            Output file name
        """
        if isinstance(fname, STRING_TYPES):  # assume file name
            _check_call(_LIB.XGBoosterSaveModel(self.handle, c_str(fname)))
        else:
            raise TypeError("fname must be a string")

You can pickle the booster to save and restore all its baggage.
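For illustration, a minimal sketch of that pickle round-trip. StubBooster here is a stand-in class (an assumption, not part of xgboost) so the snippet runs on its own; a real xgboost.Booster pickles the same way, with its Python-level attributes such as feature_names carried along:

```python
import pickle

class StubBooster:
    # Hypothetical stand-in for xgboost.Booster, used only so this
    # snippet runs without xgboost installed.
    def __init__(self, feature_names):
        self.feature_names = feature_names

bst = StubBooster(['age', 'income', 'score'])
blob = pickle.dumps(bst)    # serialize the whole Python object
bst1 = pickle.loads(blob)   # restore it, attributes included
print(bst1.feature_names)   # ['age', 'income', 'score']
```

Unlike save_model, which writes only the C-level model, pickle serializes the wrapper object as well, so nothing set on the Python side is lost.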

@khotilov, thanks. Yes, I can. But I think this is something you should handle in the project itself, or at least document that save_model does not save the booster's feature names.

Agree that it would be really useful if feature_names could be saved along with the booster.

For example, when you load a saved model for comparing variable importance with other xgb models, it would be useful to have feature_names, instead of "f1", "f2", etc.

Other than pickling, you can also store any model metadata you want in a string key-value form within its binary contents by using the internal (not python) booster attributes.
E.g., to create an internal 'feature_names' attribute before calling save_model, do

if hasattr(bst, 'feature_names'):
    bst.set_attr(feature_names='|'.join(bst.feature_names))

Then after loading that model you may restore the python 'feature_names' attribute:

if bst.attr('feature_names') is not None:
    bst.feature_names = bst.attr('feature_names').split('|')

The problem with storing some set of internal metadata within models out-of-a-box is that this subset would need to be standardized across all the xgboost interfaces. Thus, it was left to a user to either use pickle if they always work with python objects, or to store any metadata they deem necessary for themselves as internal booster attributes.

Hi, if you use the attribute solution above in order to use xgb.feature_importance with labels after loading a saved model, please note that you need to set the feature_types attribute as well (in my case, setting it to None worked).
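Putting the pieces together, here is a runnable sketch of the full round-trip. DemoBooster is a hypothetical stand-in that mimics the relevant xgboost.Booster API (set_attr, attr, save_model, and loading via model_file) so the snippet runs without xgboost; the save/load logic is exactly the one from the comment above:

```python
import json
import os
import tempfile

class DemoBooster:
    # Hypothetical stand-in for xgboost.Booster: stores string key-value
    # attributes and persists them to a file, mimicking the internal
    # booster attributes used by set_attr()/attr().
    def __init__(self, model_file=None):
        self._attrs = {}
        self.feature_names = None
        self.feature_types = None
        if model_file is not None:
            with open(model_file) as f:
                self._attrs = json.load(f)

    def set_attr(self, **kwargs):
        self._attrs.update(kwargs)

    def attr(self, key):
        return self._attrs.get(key)

    def save_model(self, fname):
        with open(fname, 'w') as f:
            json.dump(self._attrs, f)

# Saving: stash feature_names as an internal attribute first.
bst = DemoBooster()
bst.feature_names = ['age', 'income', 'score']
if hasattr(bst, 'feature_names') and bst.feature_names:
    bst.set_attr(feature_names='|'.join(bst.feature_names))
path = os.path.join(tempfile.mkdtemp(), 'model.bin')
bst.save_model(path)

# Loading: restore feature_names (and feature_types) from the attribute.
bst1 = DemoBooster(model_file=path)
if bst1.attr('feature_names') is not None:
    bst1.feature_names = bst1.attr('feature_names').split('|')
    bst1.feature_types = None  # must be set too, per the comment above
print(bst1.feature_names)  # ['age', 'income', 'score']
```

The '|' separator is an arbitrary choice; any character guaranteed not to occur in your feature names will do.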
