Lightgbm: Dataset.save_binary doesn't save init_score

Created on 18 Dec 2019 · 3 Comments · Source: microsoft/LightGBM

The example below reproduces the bug. After loading the Dataset from the saved binary file, I would expect the rmse after 1 tree to be 0.0108203, not 0.88188. This behavior indicates that Dataset.save_binary isn't including the init_score values in the file.

import lightgbm as lgb
import numpy as np
data = np.random.rand(100, 10)
label = np.sum(data, axis=1)
init_score = label + np.random.rand(100) * 0.02
dat = lgb.Dataset(data, label=label, init_score=init_score)
dat_without_init_score = lgb.Dataset(data, label=label)

p = dict()
p['num_iterations'] = 1
p['metric'] = 'rmse'
print('rmse after 1 tree for the original Dataset')
lgb.train(p, dat, valid_sets=[dat], verbose_eval=1)


print('rmse after 1 tree for the original Dataset without init_score')
lgb.train(p, dat_without_init_score, valid_sets=[dat_without_init_score], verbose_eval=1)

filename = 'abc.lgb'
dat.save_binary(filename)
dat2 = lgb.Dataset(filename)

print('rmse after 1 tree for loading from binary')
lgb.train(p, dat2, valid_sets=[dat2], verbose_eval=1)

Results:

rmse after 1 tree for the original Dataset
[1] training's rmse: 0.0108203
rmse after 1 tree for the original Dataset without init_score
[1] training's rmse: 0.88188
rmse after 1 tree for loading from binary
[1] training's rmse: 0.88188

All 3 comments

Since init_score is likely to change frequently, and it could be the prediction of init_model, we don't save it.
You can set init_score after loading the Dataset by calling dat2.set_init_score().

@StrikerRUS What do you think about raising a warning when users call save_binary with init_score?

Thanks! It sounds like this was an intentional design choice. If Dataset.save_binary and then loading the saved file doesn't result in exactly the same object that was originally saved, I feel like that should be spelled out super clearly in the API docs. That's definitely unexpected behavior.
