I would like to load a LightGBM model (for prediction only) from a string or buffer rather than a file on disk.
It seems that there is a method called model_from_string (documentation link), but it produces an error, which seemingly defeats the purpose of the method as I understand it.
import boto3
import lightgbm as lgb
import io
model_path = 'some/path/here'
s3_bucket = boto3.resource('s3').Bucket('some-bucket')
obj = s3_bucket.Object(model_path)
buf = io.BytesIO()
try:
    obj.download_fileobj(buf)
except Exception as e:
    raise e
else:
    buf.seek(0)  # rewind: download_fileobj leaves the cursor at the end of the buffer
    model = lgb.Booster().model_from_string(buf.read().decode("UTF-8"))
which produces the following error:
TypeError: Need at least one training dataset or model file to create booster instance
Alternatively, I thought that I might be able to use the regular loading method
lgb.Booster(model_file=buf.read().decode("UTF-8"))
... but this also doesn't work.
FileNotFoundError: [Errno 2] No such file or directory: ''
Now, I realize that I can work around this by writing the buffer to disk and then reading it back. However, this feels redundant and inefficient.
Thus, my question is: how can I instantiate a model to use for prediction without pointing to an actual file on disk?
If there is no such functionality, can it please be implemented / fixed?
@constantinevitt
The model_file param in Booster's constructor should be a path to a model, not the model itself:
model_file (string or None, optional (default=None)) – Path to the model file.
That's why you're getting a FileNotFoundError.
Also, the Booster constructor should receive either model_file (which is the path to a model) or associated training data in the train_set param.
In your case you can use an undocumented param to load the model from a string during the initialization phase:
model = lgb.Booster({'model_str': buf.read().decode("UTF-8")})
@guolinke What's the reason for hiding this feature? Can't this be exposed as a normal Booster constructor argument?
https://github.com/Microsoft/LightGBM/blob/c56412a859d4968f2b720514306be3404552b385/python-package/lightgbm/basic.py#L1669-L1672
All necessary codebase changes have been already done in #1766.
Seems that R package has this feature: #472.
@StrikerRUS
Thank you so much! This solved my problem
@StrikerRUS
I remember the model_str mechanism was designed for deep copy.
https://github.com/Microsoft/LightGBM/blob/ffb134cc31cc6ec7a456257fcdae9054ded54559/python-package/lightgbm/basic.py#L1690-L1693
But it can be exposed as a public interface.
@constantinevitt After commit 5b5b98235e4fa8c1eed67f57bf5c409bbe1a09e0 it's no longer possible to use lgb.Booster({'model_str': "some_model_string"}). Instead, use the model_str param: lgb.Booster(model_str="some_model_string").