Lightgbm: Dataset.save_binary doesn't save init_score

Created on 18 Dec 2019 · 3 Comments · Source: microsoft/LightGBM

The example below reproduces the bug. After loading the Dataset from the saved binary file, I would expect the rmse after 1 tree to be 0.0108203, not 0.88188. The value 0.88188 matches the run without init_score, which indicates that Dataset.save_binary isn't including the init_score values in the file.

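import lightgbm as lgb
import numpy as np

# Synthetic regression data: the label is the row sum of the features.
data = np.random.rand(100, 10)
label = np.sum(data, axis=1)
# init_score is the label plus small noise, so training on top of it starts near the target.
init_score = label + np.random.rand(100) * 0.02
dat = lgb.Dataset(data, label=label, init_score=init_score)
dat_without_init_score = lgb.Dataset(data, label=label)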

p = dict()
p['num_iterations'] = 1
p['metric'] = 'rmse'
print('rmse after 1 tree for the original Dataset')
lgb.train(p, dat, valid_sets=[dat], verbose_eval=1)


print('rmse after 1 tree for the original Dataset without init_score')
lgb.train(p, dat_without_init_score, valid_sets=[dat_without_init_score], verbose_eval=1)

# Round-trip the Dataset (created with init_score) through the binary format.
filename = 'abc.lgb'
dat.save_binary(filename)
dat2 = lgb.Dataset(filename)

print('rmse after 1 tree for loading from binary')
lgb.train(p, dat2, valid_sets=[dat2], verbose_eval=1)

Results:

rmse after 1 tree for the original Dataset
[1] training's rmse: 0.0108203
rmse after 1 tree for the original Dataset without init_score
[1] training's rmse: 0.88188
rmse after 1 tree for loading from binary
[1] training's rmse: 0.88188

Source

tbenthompson

Most helpful comment

Thanks! It sounds like this was an intentional design choice. If Dataset.save_binary and then loading the saved file doesn't result in exactly the same object that was originally saved, I feel like that should be spelled out super clearly in the API docs. That's definitely unexpected behavior.

tbenthompson on 26 Dec 2019


All 3 comments

Given that init_score is likely to change frequently, and that it could be the prediction of init_model, we don't save it.
You can set init_score after loading the Dataset by calling dat2.set_init_score().

@StrikerRUS What do you think about raising a warning when users call save_binary with init_score?