I used
bst.save_model(file_path) # to save
bst1 = xgb.Booster(model_file=file_path) # to restore
But I noticed that after these two steps, the restored model bst1 returns None for
bst1.feature_names
whereas
bst.feature_names
returns the feature names I used.
Is there anything wrong with what I have done, or is there another way to save feature_names?
Thanks
It seems I have to save and load the feature names manually, and then set them on the restored booster like:
bst.feature_names = feature_names_list
I guess this is because, in your code, saving the model is done only at the C level:
def save_model(self, fname):
    """
    Save the model to a file.
    Parameters
    ----------
    fname : string
        Output file name
    """
    if isinstance(fname, STRING_TYPES):  # assume file name
        _check_call(_LIB.XGBoosterSaveModel(self.handle, c_str(fname)))
    else:
        raise TypeError("fname must be a string")
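The manual route described in this comment could look roughly like the sketch below, which keeps the feature names in a JSON file next to the model file. The helper names and the `.features.json` suffix are made up for illustration; they are not part of xgboost.

```python
import json
from pathlib import Path


def save_feature_names(feature_names, model_path):
    """Write the feature names to a JSON file next to the model file."""
    # Hypothetical naming scheme: <model_path>.features.json
    sidecar = Path(str(model_path) + ".features.json")
    names = None if feature_names is None else list(feature_names)
    sidecar.write_text(json.dumps(names), encoding="utf-8")
    return sidecar


def load_feature_names(model_path):
    """Read the names back; return None if no sidecar file exists."""
    sidecar = Path(str(model_path) + ".features.json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text(encoding="utf-8"))
```

With these helpers, one would call save_feature_names(bst.feature_names, file_path) right after bst.save_model(file_path), and set bst1.feature_names = load_feature_names(file_path) after restoring the booster.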
You can pickle the booster to save and restore all its baggage.
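A minimal sketch of the pickle route, written as two generic helpers so it does not depend on any particular xgboost version. The function names are made up for this example; pass the booster as obj.

```python
import pickle


def save_pickled(obj, path):
    """Serialize obj (e.g. a booster) with all its Python-side attributes."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickled(path):
    """Load the object back. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Keep in mind that a pickle file is tied to Python and can only be read back where the pickled class is importable, so this suits workflows that stay entirely in Python.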
@khotilov, thanks. Yes, I can. But I think this is something you should do in your project, or at least you should document that this save method doesn't save the booster's feature names.
Agreed, it would be really useful if feature_names could be saved along with the booster.
For example, when you load a saved model for comparing variable importance with other xgb models, it would be useful to have feature_names, instead of "f1", "f2", etc.
Other than pickling, you can also store any model metadata you want in a string key-value form within its binary contents by using the internal (not python) booster attributes.
E.g., to create an internal 'feature_names' attribute before calling save_model, do
if hasattr(bst, 'feature_names'): bst.set_attr(feature_names = '|'.join(bst.feature_names))
Then after loading that model you may restore the python 'feature_names' attribute:
if bst.attr('feature_names') is not None: bst.feature_names = bst.attr('feature_names').split('|')
The problem with storing some set of internal metadata within models out of the box is that this subset would need to be standardized across all the xgboost interfaces. Thus, it was left to the user either to use pickle, if they always work with python objects, or to store whatever metadata they deem necessary as internal booster attributes.
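The two one-liners above can also be wrapped into a pair of helpers. The sketch below is a variant that encodes the list as JSON instead of joining on '|', so that a feature name containing '|' survives the round trip. It assumes bst exposes set_attr, attr and feature_names exactly as used above; the helper names and the attribute key are choices made for this example.

```python
import json

ATTR_KEY = "feature_names"  # internal attribute key chosen for this sketch


def stash_feature_names(bst):
    """Copy the python-side feature_names into an internal booster attribute.

    Call this before bst.save_model(...).
    """
    names = getattr(bst, "feature_names", None)
    if names is not None:
        # Internal attributes are strings, so the list is JSON-encoded.
        bst.set_attr(**{ATTR_KEY: json.dumps(list(names))})


def restore_feature_names(bst):
    """Rebuild the python-side feature_names after loading a saved model."""
    raw = bst.attr(ATTR_KEY)
    if raw is not None:
        bst.feature_names = json.loads(raw)
```

Models saved with the '|'-joined form would still need the split('|') version to be read back, since the two encodings are not interchangeable.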
Hi. If you use the attribute solution above in order to get labels from xgb.feature_importance after loading a saved model, please note that you need to define the feature_types attribute as well (in my case, setting it to None worked).