Xgboost: Problem in scikit-learn API for XGBoost classification's predict_proba function

Created on 16 Oct 2017 · 7 Comments · Source: dmlc/xgboost

Can I apply the predict_proba function to multiple inputs in parallel?
What I am doing is creating multiple inputs in parallel and then applying the trained model to each input to predict. What I have observed is that the prediction time increases as we keep increasing the number of inputs.
Is it because the model predicts in sequence instead of catering to all inputs in parallel?
Let me know if there is a way to apply this predict_proba function to multiple inputs in parallel.

I have used the function as follows:
testProb = model.predict_proba(X), where X is a single input and this instruction is spawned by multiple threads with different inputs. The model instance is loaded globally.
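For reference, the setup described above can be sketched roughly like this. Since the surrounding code is not shown, a stand-in DummyModel object (hypothetical, with a predict_proba method mimicking a trained XGBClassifier) is used so the sketch is self-contained:

```python
import threading
import numpy as np

class DummyModel:
    """Stand-in for the globally loaded trained model (hypothetical)."""
    def predict_proba(self, X):
        # A real model would run the booster; here we fake two-class probabilities.
        return np.tile([0.3, 0.7], (X.shape[0], 1))

model = DummyModel()  # loaded once, shared by all threads

results = {}

def score(thread_id, X):
    # X is a single row shaped (1, n_features), as described in the question
    results[thread_id] = model.predict_proba(X)

threads = [threading.Thread(target=score, args=(i, np.random.rand(1, 4)))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```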

Sklearn Version = 0.19.0
Xgboost version = 0.6

System Configuration on which I am running it:
OS: CentOS 7

cores: 52

Memory: 512 GB

All 7 comments

the prediction time increases as we keep increasing the number of inputs

If by "inputs" you mean individual rows, that seems a rather logical thing to expect.

If you are splitting the data into separate chunks and running predict in parallel without controlling the number of threads, the thread pool gets oversaturated. Try disabling the multithreading inside xgboost, e.g., with something like model._Booster.set_param('nthread', 1).
Or disable OpenMP multithreading by setting the OMP_NUM_THREADS=1 environment variable before running Python.
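A minimal sketch of the second suggestion: the environment variable must be set before the OpenMP runtime is initialized, so it has to come before xgboost is first imported (the xgboost lines are commented out so the snippet stands alone):

```python
import os

# Must happen before xgboost (and hence OpenMP) is first imported/initialized.
os.environ["OMP_NUM_THREADS"] = "1"

# import xgboost as xgb  # imported only after the variable is set
#
# ...or, equivalently, restrict threads on an already-loaded sklearn wrapper,
# as suggested above:
# model._Booster.set_param('nthread', 1)
```

Setting OMP_NUM_THREADS=1 in the shell before launching Python achieves the same thing and avoids any ordering concerns inside the script.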

Yes, by "inputs" I meant individual rows. Is there a way I can apply predict_proba to multiple individual rows at the same time?

For scoring separate individual rows, I would suggest looking into the https://github.com/dmlc/treelite project. Straight xgboost would have a bit too much overhead for such a task.

But since you already have multiple rows, why not combine them into batches?
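To illustrate the batching suggestion: instead of one predict_proba call per row, stack the pending rows into a single matrix and score them in one call. This sketch uses random data and a stand-in DummyModel (hypothetical) in place of the trained classifier:

```python
import numpy as np

class DummyModel:
    """Stand-in for the trained classifier (hypothetical), so the sketch runs."""
    def predict_proba(self, X):
        return np.tile([0.5, 0.5], (X.shape[0], 1))

model = DummyModel()

# Rows arriving one at a time, each shaped (1, n_features)
pending_rows = [np.random.rand(1, 4) for _ in range(100)]

# One call on the stacked batch instead of 100 single-row calls;
# xgboost's row-level parallelism can then actually be used.
batch = np.vstack(pending_rows)      # shape (100, 4)
probs = model.predict_proba(batch)   # one probability row per input row

# probs[i] corresponds to pending_rows[i]
```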

Well, doing it in batches is one way. But what if you have to predict online for every individual input?

You may turn off the OpenMP multithreading as I explained earlier and do the scoring in parallel yourself. But if performance really matters, use treelite.

I tried running it with OpenMP multithreading disabled by setting the OMP_NUM_THREADS=1 environment variable before running Python, and I saw better performance while predicting. Can you tell me the reason behind this? Is it because XGBoost's multithreading is acquiring the GIL in Python and that is hurting performance?
TIA

@mkmukund10 It's very likely you predicted a matrix with a single row. The parallelism is over rows, so it won't help if you have only one row to predict; and since it requires allocating a feature_cnt * nthread sized matrix for the parallel computation, the performance was worse than with a single thread in your case.
