Lightgbm: Loading lightgbm in parallel and using predict freezes

Created on 12 Oct 2017 · 8 comments · Source: microsoft/LightGBM

I need to use my model to make predictions in batches and in parallel in Python. If I load the model, create the data frames in a regular for loop, and call the predict function, it works with no issues. But if I create disjoint data frames in parallel using Python's multiprocessing and then call the predict function, the loop freezes indefinitely. Why does this happen?

Here is a snippet of my code:

import pickle
import random

import pandas as pd

with open('models/model_test.pkl', 'rb') as fin:
    pkl_bst = pickle.load(fin)

def predict_generator(X):

    df = X

    print(df.head())
    df = (df.groupby(['user_id']).recommender_items.apply(flat_map)
          .reset_index().drop('level_1', axis=1))
    df.columns = ['user_id', 'product_id']

    print('Merge Data')
    user_lookup = pd.read_csv('data/user_lookup.csv')
    product_lookup = pd.read_csv('data/product_lookup.csv')
    product_map = dict(zip(product_lookup.product_id, product_lookup.name))

    print(user_lookup.head())

    df = pd.merge(df, user_lookup, on=['user_id'])
    df = pd.merge(df, product_lookup, on=['product_id'])
    df = df.sort_values(['user_id', 'product_id'])

    users = df.user_id.values
    items = df.product_id.values
    df.drop(['user_id', 'product_id'], axis=1, inplace=True)

    print('Prediction Step')

    prediction = pkl_bst.predict(df, num_iteration=pkl_bst.best_iteration)
    print('Prediction Complete')

    validation = pd.DataFrame(zip(users, items, prediction),
                              columns=['user', 'item', 'prediction'])
    validation['name'] = (validation.item
                          .apply(lambda x: get_mapping(x, product_map)))
    validation = pd.DataFrame(zip(validation.user,
                              zip(validation.name,
                                  validation.prediction)),
                              columns=['user', 'prediction'])
    print(validation.head())

    def get_items(x):

        sorted_list = sorted(list(x), key=lambda i: i[1], reverse=True)[:20]
        sorted_list = random.sample(sorted_list, 10)
        return [k for k, _ in sorted_list]

    relevance = validation.groupby('user').prediction.apply(get_items)
    return relevance.reset_index()

This works:

results = []
for d in df_list_sub:
    r = predict_generator(d)
    results.append(r)

This breaks:

from multiprocessing import Pool
import tqdm
pool = Pool(processes=8)
results = []
for x in tqdm.tqdm(pool.imap_unordered(predict_generator, df_list_sub), total=len(df_list_sub)):
    results.append(x)
    pass
pool.close()
pool.join()

All 8 comments

@rydevera3
It may be caused by this function: https://github.com/Microsoft/LightGBM/blob/master/src/c_api.cpp#L181-L214
We only allow one process in the predict function at a time.

@guolinke - thank you for the reply. Would it be possible to allow multiple processes with the predict function? I'm sure being able to batch predict in parallel would be helpful for many different tasks.

@rydevera3 It is disabled since prediction already uses all CPUs. Allowing multiple processes would bring no benefit, and would even be much slower due to CPU contention.

But for an online serving scenario, where the model is only ever scored, it would be great to be able to score in parallel.

In particular, I explicitly reset the number of CPU cores after training to 1 or 2 so that I can run prediction on a different machine with far fewer CPUs than were used for training.

@geoHeil @rydevera3

In the current implementation, only one process at a time may access the predict function of the same Booster object.
So if you use multiple Booster objects, prediction can be parallelized.

However, if you don't limit the number of threads for these predict processes, the speed will be very slow due to competition for CPU resources.

@guolinke may I ask for some more clarification here:
I plan to use LightGBM in Flask. Multiprocessing will be provided by gunicorn, which serves the Flask app.
Each process will load its own booster object from the local file system into memory and limit the CPU resources appropriately.

This should work - or am I missing a catch here?

@geoHeil yeah, that should work.

@geoHeil Did it work for you?

I am running multiple processes, but we have a very large volume of requests in production, and having many threads per process would be advantageous.

We kept a pool of different LightGBM instances in memory and got really good performance, but then we started getting timeouts due to what I assume are deadlocks.

Is there any plan to make this lock specific to an instance of lightgbm?
