LightGBM: Ranking model gives different results when the model.txt file is loaded in Python

Created on 26 Sep 2019 · 8 Comments · Source: microsoft/LightGBM

Hi,
I have trained a ranking model on my own data using the lightgbm executable (built by running cmake on the repository and then make -j4) together with the config file LightGBM/examples/lambdarank/train.conf. The model is saved to LightGBM_model.txt, and the command ./lightgbm config=predict.conf is then used to make predictions with the saved model.
The predictions are stored in LightGBM_predict_result.txt within the same directory.

I want to load the model saved in LightGBM_model.txt in a Python script so that it can be deployed to production.

I used the following code to load the model into Python and make predictions:

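from sklearn.datasets import load_svmlight_file
import lightgbm as lgb
# Load the test features; the labels are discarded.
x_test, _ = load_svmlight_file('test_data.txt')
# Load the model trained with the CLI and predict with it.
model = lgb.Booster(model_file='LightGBM_model.txt')
y_predictions = model.predict(x_test)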

where test_data.txt is the test data in libsvm format.

The two methods (loading the model in Python with lgb.Booster, and running the lightgbm executable with predict.conf) give different results. Why is this happening, and how can it be fixed?
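Before digging into the cause, it can help to measure how far apart the two sets of predictions are. The sketch below assumes the CLI result file holds one score per line; the scores and the helper names parse_scores and max_abs_diff are made up for illustration and are not part of LightGBM.

```python
import math


def parse_scores(text):
    """Parse one floating-point score per non-empty line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def max_abs_diff(a, b):
    """Return the largest absolute element-wise difference of two score lists."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


# Made-up scores standing in for LightGBM_predict_result.txt and model.predict(x_test).
cli_scores = parse_scores("0.25\n-1.5\n0.75\n")
py_scores = [0.25, -1.5, 0.5]

diff = max_abs_diff(cli_scores, py_scores)
print(diff)                                    # 0.25
print(math.isclose(diff, 0.0, abs_tol=1e-6))   # False: the outputs disagree
```

A difference around floating-point noise would point to formatting or rounding, while a large one points to the two tools seeing different inputs or using different model settings.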

I also tried converting x_test from a scipy.sparse.csr.csr_matrix to a numpy.ndarray; as expected, it had no effect.

nalin-mathur

All 8 comments

You can try setting num_iterations in predict explicitly. The Python package automatically uses the best iteration, while the CLI version doesn't.
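To see why the iteration count matters: a boosted model's raw score is the sum of its trees' outputs, so two tools that stop at different iterations return different scores for the same row. The toy sketch below is not LightGBM code; predict_additive and the per-tree values are hypothetical.

```python
def predict_additive(tree_outputs, num_iteration=None):
    """Sum the per-tree outputs for one row.

    Uses only the first num_iteration trees; None or a value <= 0 means
    "use every tree".
    """
    if num_iteration is None or num_iteration <= 0:
        num_iteration = len(tree_outputs)
    return sum(tree_outputs[:num_iteration])


# Hypothetical contributions of four trees for a single row.
tree_outputs = [0.5, 0.25, -0.125, 0.0625]

all_trees = predict_additive(tree_outputs)                    # every iteration
first_two = predict_additive(tree_outputs, num_iteration=2)   # e.g. an earlier "best" iteration

print(all_trees)   # 0.6875
print(first_two)   # 0.75
```

Setting the same iteration count explicitly in both tools removes this source of disagreement.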

guolinke on 26 Sep 2019

The number of iterations was set to -1, which automatically takes the boosted model to the last iteration. The results did not change after I explicitly set num_iterations in predict.

nalin-mathur on 27 Sep 2019

I just saw that you use load_svmlight_file, which is the root cause.
Refer to https://github.com/microsoft/LightGBM/issues/1908 and https://github.com/microsoft/LightGBM/issues/1776.

LightGBM treats a libsvm file as zero-based (when the libsvm file is passed to LightGBM directly), while load_svmlight_file may read the same file as one-based (its zero_based argument defaults to "auto", which assumes one-based indices when no feature index 0 appears in the file).
So if your model was trained on the one-based format, you should use one-based data for prediction, and vice versa.
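The effect of the index base can be shown with a small pure-Python sketch. parse_libsvm_row is a hypothetical helper written for this illustration, not a LightGBM or scikit-learn function, and the sample line is made up.

```python
def parse_libsvm_row(line, n_features, one_based):
    """Turn one libsvm line into (label, dense feature list).

    Tokens such as 'qid:3' are skipped. With one_based=True, index k in the
    file is stored in column k - 1; otherwise it is stored in column k.
    """
    tokens = line.split()
    label = float(tokens[0])
    row = [0.0] * n_features
    for token in tokens[1:]:
        key, value = token.split(":", 1)
        if key == "qid":
            continue
        col = int(key) - 1 if one_based else int(key)
        row[col] = float(value)
    return label, row


line = "2 qid:1 1:0.5 3:2.0"
_, as_zero_based = parse_libsvm_row(line, n_features=4, one_based=False)
_, as_one_based = parse_libsvm_row(line, n_features=4, one_based=True)

print(as_zero_based)  # [0.0, 0.5, 0.0, 2.0]
print(as_one_based)   # [0.5, 0.0, 2.0, 0.0]
```

The same line yields rows whose values sit one column apart, so a model trained under one convention sees every feature in the wrong column under the other. With scikit-learn, passing zero_based=True to load_svmlight_file keeps the file's indices unchanged, which should match how LightGBM reads a libsvm file as described above.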

guolinke on 27 Sep 2019

@StrikerRUS there are several issues regarding this. Should I add it to the FAQ?
BTW, #2464 could raise an error in this case.

guolinke on 27 Sep 2019

🚀1 👍1

@guolinke

there are several issues regarding this. Should I add it to the FAQ?

Yeah, I think it's definitely worth mentioning that in the FAQ!

I remember there were a lot of issues in XGBoost about the same inconsistency too, e.g.: https://github.com/dmlc/xgboost/issues/2193#issuecomment-293957232, https://github.com/dmlc/xgboost/issues/3190#issuecomment-379969199, https://github.com/dmlc/xgboost/pull/3232.

BTW, #2464 could raise an error in this case.

It's really great!

StrikerRUS on 27 Sep 2019

@guolinke

BTW, #2464 could raise an error in this case.

I added a test case for that in #2464.

Should I add it to the FAQ?

Will you create a separate PR for this?

StrikerRUS on 3 Oct 2019

@guolinke Can you please create either a PR or an issue so that this docs enhancement doesn't get lost in this closed issue?

StrikerRUS on 9 Oct 2019

Sorry, I have been quite busy recently. Could you help me with the doc part?

guolinke on 9 Oct 2019

👍1
