LightGBM: Ranking model gives different results when the model.txt file is loaded in Python

Created on 26 Sep 2019 · 8 Comments · Source: microsoft/LightGBM

Hi,
I trained a ranking model on my own data using the LightGBM command-line executable (built with cmake followed by make -j4) together with the config file LightGBM/examples/lambdarank/train.conf. The model is saved to LightGBM_model.txt, and I then run ./lightgbm config=predict.conf to make predictions with the saved model.
The predictions are stored in LightGBM_predict_result.txt within the same directory.

I want to load the model saved in LightGBM_model.txt in a Python script so that it can be deployed to production.

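I used the following code to load the model into Python and make predictions:

```python
from sklearn.datasets import load_svmlight_file
import lightgbm as lgb

# Load the test features (labels are discarded) from a libsvm-format file.
x_test, _ = load_svmlight_file('test_data.txt')

# Load the model trained by the CLI and score the test rows.
model = lgb.Booster(model_file='LightGBM_model.txt')
y_predictions = model.predict(x_test)
```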

where test_data.txt is the test data in libsvm format.

The results from the two methods (loading the model in Python with lgb.Booster, and running the lightgbm executable with predict.conf) differ. Why is this happening, and how can it be fixed?

I also tried converting x_test from a scipy.sparse.csr.csr_matrix to a numpy.ndarray; as expected, this had no effect.
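A quick way to quantify such a discrepancy is to read the CLI output and compare it element-wise with the Python scores. The minimal sketch below assumes the CLI writes one score per line to LightGBM_predict_result.txt; the helpers `read_scores` and `max_abs_diff` are hypothetical and use only the standard library.

```python
import io
from typing import Iterable, List, TextIO


def read_scores(handle: TextIO) -> List[float]:
    """Parse one float per non-empty line, e.g. from LightGBM_predict_result.txt."""
    return [float(line) for line in handle if line.strip()]


def max_abs_diff(a: Iterable[float], b: Iterable[float]) -> float:
    """Largest element-wise absolute difference between two score sequences."""
    xs, ys = list(a), list(b)
    if len(xs) != len(ys):
        raise ValueError(f"length mismatch: {len(xs)} vs {len(ys)}")
    return max((abs(x - y) for x, y in zip(xs, ys)), default=0.0)


# io.StringIO stands in for open("LightGBM_predict_result.txt"); values are made up.
cli_scores = read_scores(io.StringIO("0.5\n-1.25\n2.0\n"))
py_scores = [0.5, -1.25, 2.0]  # e.g. list(model.predict(x_test))
print(max_abs_diff(cli_scores, py_scores))  # 0.0
```

A difference well above floating-point noise on the same rows points to a mismatch in the input data or prediction parameters rather than to numerical error.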

All 8 comments

You can try setting num_iteration explicitly in predict. The Python package automatically uses the best iteration, while the CLI version doesn't.
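To see why the iteration count matters: a boosted model's raw score is the sum of its trees' outputs, so stopping at a different iteration changes every prediction. The toy sketch below uses made-up per-tree contributions and a hypothetical `toy_predict` helper that mimics the documented default of `Booster.predict` (use the recorded best iteration if there is one, otherwise all trees); it is not LightGBM code. In the real API the equivalent knob is `num_iteration=` in `Booster.predict`, or `num_iteration_predict` in the CLI config.

```python
from typing import Optional, Sequence


def toy_predict(tree_outputs: Sequence[float],
                num_iteration: Optional[int] = None,
                best_iteration: int = -1) -> float:
    """Sum the first `num_iteration` tree outputs (toy model, not LightGBM)."""
    if num_iteration is None:
        # Mimic the Python default: prefer the recorded best iteration.
        num_iteration = best_iteration
    if num_iteration <= 0:
        # Non-positive means "no limit": use every tree.
        num_iteration = len(tree_outputs)
    return sum(tree_outputs[:num_iteration])


trees = [0.5, 0.25, 0.125, 0.0625]  # hypothetical per-tree outputs for one row
print(toy_predict(trees, best_iteration=2))  # 0.75: stops at the best iteration
print(toy_predict(trees, num_iteration=-1))  # 0.9375: all trees, like the CLI default
```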

The number of iterations was set to -1, which makes prediction use the model up to its last iteration. Explicitly setting the number of iterations in predict did not change the results.

I just noticed you use load_svmlight_file, which is the root cause.
Refer to https://github.com/microsoft/LightGBM/issues/1908 and https://github.com/microsoft/LightGBM/issues/1776.

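LightGBM reads libsvm files as zero-based (when a libsvm file is passed to LightGBM directly), while load_svmlight_file treats them as one-based (with its default zero_based='auto', whenever no feature index 0 appears in the file).
So the prediction data must use the same indexing as the training data: if your model was trained on one-based data, predict on one-based data, and vice versa.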
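The off-by-one effect can be reproduced with a tiny hand-written libsvm parser. The sketch below (`parse_libsvm_row` is a hypothetical helper, not a LightGBM or scikit-learn function) reads the same made-up line once with indices taken literally, as LightGBM does, and once shifted down by one, as `load_svmlight_file` does under `zero_based='auto'` when no index 0 is present. Every feature lands one column to the left, so the model sees different inputs. Assuming the model was trained by the CLI on files in this format, the usual fix on the Python side is `load_svmlight_file('test_data.txt', zero_based=True)`.

```python
from typing import List


def parse_libsvm_row(line: str, n_features: int, zero_based: bool) -> List[float]:
    """Turn one libsvm line ("label idx:val ...") into a dense feature row."""
    row = [0.0] * n_features
    _label, *pairs = line.split()
    for pair in pairs:
        idx_str, val_str = pair.split(":")
        # One-based files store column k as index k + 1, so shift down by one.
        idx = int(idx_str) if zero_based else int(idx_str) - 1
        row[idx] = float(val_str)
    return row


line = "1 1:0.3 2:0.7 4:1.5"  # made-up sample; note there is no feature index 0
as_lightgbm = parse_libsvm_row(line, n_features=5, zero_based=True)
as_one_based = parse_libsvm_row(line, n_features=5, zero_based=False)
print(as_lightgbm)   # [0.0, 0.3, 0.7, 0.0, 1.5]
print(as_one_based)  # [0.3, 0.7, 0.0, 1.5, 0.0]
```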

@StrikerRUS there are several issues regarding this. Should I add it to the FAQ?
BTW, #2464 could raise the error in this case.

@guolinke

there are several issues regarding this. Should I add it to the FAQ?

Yeah, I think it's definitely worth mentioning that in the FAQ!

I remember there was a lot of issues in XGBoost about the same inconsistency too, e.g.: https://github.com/dmlc/xgboost/issues/2193#issuecomment-293957232, https://github.com/dmlc/xgboost/issues/3190#issuecomment-379969199, https://github.com/dmlc/xgboost/pull/3232.

BTW, #2464 could raise the error in this case.

It's really great!

@guolinke

BTW, #2464 could raise the error in this case.

I added a test case for that in #2464.

Should I add it to the FAQ?

Will you create a separate PR for this?

@guolinke Can you please create either a PR or an issue, so that this docs enhancement isn't lost in this closed issue?

Sorry, I've been quite busy recently. Could you help me with the docs part?
