I would like to load a LightGBM model (for prediction only) from a string or buffer rather than a file on disk.
It seems that there is a method called model_from_string (documentation link), but it produces an error, which seemingly defeats the purpose of the method as I understand it.
import boto3
import lightgbm as lgb
import io
model_path = 'some/path/here'
s3_bucket = boto3.resource('s3').Bucket('some-bucket')
obj = s3_bucket.Object(model_path)
buf = io.BytesIO()
try:
    obj.download_fileobj(buf)
except Exception as e:
    raise e
else:
    buf.seek(0)  # rewind: download_fileobj leaves the cursor at the end of the buffer
    model = lgb.Booster().model_from_string(buf.read().decode("UTF-8"))
which produces the following error:
TypeError: Need at least one training dataset or model file to create booster instance
Alternatively, I thought that I might be able to use the regular loading method
lgb.Booster(model_file=buf.read().decode("UTF-8"))
... but this also doesn't work.
FileNotFoundError: [Errno 2] No such file or directory: ''
Now, I realize that I can work around this by writing the buffer to disk and then reading it back. However, this feels redundant and inefficient.
Thus, my question is: how can I instantiate a model to use for prediction without pointing to an actual file on disk?
If there is no such functionality, can it please be implemented / fixed?
@constantinevitt
The model_file param in Booster's constructor should be a path to a model, not the model itself:
model_file (string or None, optional (default=None)) – Path to the model file.
That's why you're getting a FileNotFoundError.
Also, the Booster constructor should receive either model_file (which is the path to a model) or associated training data in the train_set param.
In your case you can use an undocumented param to load the model from a string during the initialization phase:
model = lgb.Booster({'model_str': buf.read().decode("UTF-8")})
@guolinke What's the reason for hiding this feature? Can't this be exposed as a normal Booster constructor argument?
https://github.com/Microsoft/LightGBM/blob/c56412a859d4968f2b720514306be3404552b385/python-package/lightgbm/basic.py#L1669-L1672
All necessary codebase changes have been already done in #1766.
Seems that R package has this feature: #472.
@StrikerRUS
Thank you so much! This solved my problem
@StrikerRUS
I remember the model_str mechanism was designed for deep copy.
https://github.com/Microsoft/LightGBM/blob/ffb134cc31cc6ec7a456257fcdae9054ded54559/python-package/lightgbm/basic.py#L1690-L1693
But it can be exposed as a public interface.
@constantinevitt After commit 5b5b98235e4fa8c1eed67f57bf5c409bbe1a09e0 it's no longer possible to use lgb.Booster({'model_str': "some_model_string"}). Instead, use the model_str param: lgb.Booster(model_str="some_model_string").