Lightgbm: Inconsistent prediction results after dumping and loading lightgbm.LGBMClassifier via pickle

Created on 25 Sep 2019 · 10 Comments · Source: microsoft/LightGBM

Basically, title.

  1. train an LGBMClassifier model
  2. get predictions for the test set
  3. dump the model via pickle (I also tried joblib - same issue)
  4. load the model
  5. get different predictions for the test set

Both models report the same parameters and metrics in their properties.
I'm using Python 3.7.3 and LightGBM 2.2.3.


All 10 comments

@slam3085 Please post the whole snippet. Also, attach the data you use, if it's not sensitive. Or can you reproduce this issue with random data?

We do have a test where predictions are the same:
https://github.com/microsoft/LightGBM/blob/be206a96e7be553b0918805f6641e91285ce89d1/tests/python_package_test/test_sklearn.py#L155-L178

Hi!

I can't share the data and the snippet looks like this:

```python
# This snippet runs inside a function (hence the trailing `return`); `path`,
# `today`, `X_train`, `y_train` and `X_test` are defined by the caller.
import pickle

import lightgbm
from sklearn.model_selection import train_test_split

try:
    with open(path, "rb") as f:
        mdl = pickle.load(f)
    print(f'lightgbm_fitted_model_{today} loaded')
except FileNotFoundError:
    print(f'Unable to open lightgbm_fitted_model_{today}; fitting...')
    X_actual_train, X_eval, y_actual_train, y_eval = train_test_split(
        X_train, y_train, test_size=0.1, random_state=42, stratify=y_train)

    hyperparams = {'n_estimators': 200, 'class_weight': 'balanced', 'random_state': 42}
    mdl = lightgbm.LGBMClassifier(**hyperparams)
    # 'auc' is LightGBM's metric alias; 'roc_auc' is not a recognized metric name.
    mdl.fit(X_actual_train, y_actual_train, eval_set=(X_eval, y_eval), eval_metric='auc')
    with open(path, "wb") as f:
        pickle.dump(mdl, f)
pred_proba = mdl.predict_proba(X_test)[:, 1]
n_pred_proba_95 = sum(pred_proba >= 0.95)
print(f'stats: {n_pred_proba_95} out of {len(pred_proba)} predictions are >= 0.95 for class 1')
return pred_proba
```

Basically, I either fit a new model and dump it via pickle, or use an already trained one.

I tried to reproduce it locally and couldn't, and now I understand why: my local environment has Python 3.7.3 while production has Python 3.7.0. So the actual steps to reproduce are:
  1. train the model in Python 3.7.3
  2. load it in Python 3.7.0
  3. prediction results vary

I guess it's not really a bug then, or a very minor one, and the issue can be closed.

@slam3085

> I tried to reproduce it in local environment only and couldn't, and now I understand why - I have python 3.7.3 as local environment and python 3.7.0 as production environment.

Makes sense! Thanks a lot for your investigation.

We are having the same problem. It happens when we try to deploy our models to production in a different system/environment than the one the model was trained in, and it is not easy to reproduce. We have observed this issue twice:

  1. Deploying a model trained on one system (a node of a Hadoop cluster running CentOS, with 512 GB RAM and a 32-core Xeon) and loading the pickle/text on the rest of the nodes. Predictions are correct only on the system where the model was trained and inconsistent on the rest (imagine a continuous variable with values like [123.5, 143.2, 456.4] getting predictions like [-4.0, 5.0, -9.0]). Of course, using the same data.

  2. Today, while deploying with MLflow and setting up an endpoint (Ubuntu 16.04 LTS): exactly the same problem.

This happens with both the scikit-learn API and the classic Python API. Swapping the model in our pipeline for a typical scikit-learn RandomForestRegressor makes everything OK.

Any idea about what is happening? It looks like we are missing something important, and this is driving us crazy.

What is the difference in the prediction results?

For example, predictions on the same machine and environment in which the model was trained:

  • [123.5, 143.2, 456.4] (reasonable predictions, close to the original values we try to predict)

Predictions when loading the model (joblib or txt) on another machine/environment, always installing LightGBM with conda or pip:

  • [-4.0, 5.0, -9.0] (weird predictions, negative and integer-like with the .0)

With such a large gap, I don't think it is the same problem as this issue.
Did you try to save and then load on the same machine/environment?

Yes, on the same machine and using the same environment there is no problem. On the same machine with a different environment (for example, deploying the model using MLflow), the results are different, the same as when saving the model and loading it on other machines...

The thing is, the models are supposed to be portable, at least when saved as a .txt file.

@dcanones could you try the CLI version on the same machine? I think it is possibly a bug in the environment.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
