Lightgbm: Inconsistent prediction results after dumping and loading lightgbm.LGBMClassifier via pickle

Created on 25 Sep 2019 · 10 Comments · Source: microsoft/LightGBM

Basically, title.

  1. train an LGBMClassifier model
  2. get predictions for the test set
  3. dump the model via pickle (I also tried joblib - same issue)
  4. load the model
  5. get different predictions for the test set

Both models report the same parameters and metrics in their properties.
I'm using Python 3.7.3 and LightGBM 2.2.3.


All 10 comments

@slam3085 Please post the whole snippet. Also, attach the data you use, if it's not sensitive. Or can you reproduce this issue with random data?

We do have a test where predictions are the same:
https://github.com/microsoft/LightGBM/blob/be206a96e7be553b0918805f6641e91285ce89d1/tests/python_package_test/test_sklearn.py#L155-L178

Hi!

I can't share the data and the snippet looks like this:

```python
# This snippet runs inside a function (hence the trailing `return`); `path`,
# `today`, `X_train`, `y_train` and `X_test` are defined by the caller.
import pickle

import lightgbm
from sklearn.model_selection import train_test_split

try:
    with open(path, "rb") as f:
        mdl = pickle.load(f)
    print(f'lightgbm_fitted_model_{today} loaded')
except FileNotFoundError:
    print(f'Unable to open lightgbm_fitted_model_{today}; fitting...')
    X_actual_train, X_eval, y_actual_train, y_eval = train_test_split(
        X_train, y_train, test_size=0.1, random_state=42, stratify=y_train)

    hyperparams = {'n_estimators': 200, 'class_weight': 'balanced', 'random_state': 42}
    mdl = lightgbm.LGBMClassifier(**hyperparams)
    # 'auc' is LightGBM's metric alias; 'roc_auc' is not a recognized metric name.
    mdl.fit(X_actual_train, y_actual_train, eval_set=(X_eval, y_eval), eval_metric='auc')
    with open(path, "wb") as f:
        pickle.dump(mdl, f)
pred_proba = mdl.predict_proba(X_test)[:, 1]
n_pred_proba_95 = sum(pred_proba >= 0.95)
print(f'stats: {n_pred_proba_95} out of {len(pred_proba)} predictions are >= 0.95 for class 1')
return pred_proba
```

Basically, I either fit a new model and dump it via pickle, or use an already trained one.

I tried to reproduce it locally and couldn't, and now I understand why: my local environment has Python 3.7.3 while production has Python 3.7.0. So the actual steps to reproduce are:
  1. train the model in Python 3.7.3
  2. load it in Python 3.7.0
  3. prediction results vary

I guess it's not really a bug then, or a very minor one, and the issue can be closed.

@slam3085

> I tried to reproduce it in local environment only and couldn't, and now I understand why - I have python 3.7.3 as local environment and python 3.7.0 as production environment.

Makes sense! Thanks a lot for your investigation.

We are having the same problem. It happens when we try to deploy our models to production in a different system/environment than the one the model was trained in, and it is not easy to reproduce. We have observed this issue twice:

  1. Deploying a model trained on one system (a node of a Hadoop cluster running CentOS, with 512 GB RAM and a 32-core Xeon) and loading the pickle/text on the rest of the nodes. Predictions are correct only on the system where the model was trained and inconsistent on the rest (imagine a continuous variable with values like [123.5, 143.2, 456.4] getting predictions like [-4.0, 5.0, -9.0]). Of course, using the same data.

  2. Today, while deploying with MLflow and setting up an endpoint (Ubuntu 16.04 LTS): exactly the same problem.

This happens with both the scikit-learn API and the classic Python API. Swapping the model in our pipeline for a typical scikit-learn RandomForestRegressor makes everything OK.

Any idea about what is happening? It looks like we are missing something important, and this is driving us crazy.

What is the difference in the prediction results?

For example, predictions on the same machine and environment in which the model was trained:

  • [123.5, 143.2, 456.4] (reasonable predictions, close to the original values we try to predict)

Predictions when loading the model (joblib or txt) on another machine/environment, always installing LightGBM with conda or pip:

  • [-4.0, 5.0, -9.0] (weird predictions, negative and integer-like with the .0)

With such a large gap, I don't think it is the same problem as this issue.
Did you try to save and then load on the same machine/environment?

Yes, on the same machine and using the same environment there is no problem. On the same machine with a different environment (for example, deploying the model using MLflow), the results are different, the same as when saving the model and loading it on other machines...

The thing is, the models are supposed to be portable, at least when saved as a .txt file.

@dcanones could you try the CLI version on the same machine? I think it is possibly a bug in the environment.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
