Related:
It has been reported that the predict() function in the Python interface does not work well with multiprocessing. We should find a way to allow multiple processes to predict with the same model simultaneously.
Is there any update on this? It seems that this is a complete stopper for using xgb in production...?
Any update? I am just discovering this now. This is indeed a problem...
It has been reported that the predict() function in the Python interface does not work well with multiprocessing. We should find a way to allow multiple processes to predict with the same model simultaneously.
What do you mean exactly?
In my context, I have a pool of processes that each load a pickled model and then try to make predictions, which is where I get the dmlc::Error.
Note that I also tried with a unique process in the pool and still got the same error.
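For concreteness, the setup looks roughly like this (model path, data shape, and pool size are made up for illustration, not taken from the report). On a CUDA-enabled machine, each pool worker unpickles the booster and calls predict(), which is where the dmlc::Error below is reportedly raised:

```python
# Minimal sketch of the reported failing pattern (hypothetical model.pkl).
import pickle
from multiprocessing import Pool

import numpy as np
import xgboost as xgb  # imported in the parent process, before the fork


def predict_chunk(rows):
    # Each worker loads its own copy of the pickled booster.
    with open("model.pkl", "rb") as f:
        booster = pickle.load(f)
    return booster.predict(xgb.DMatrix(np.asarray(rows)))


if __name__ == "__main__":
    data = np.random.rand(1000, 10)
    chunks = np.array_split(data, 4)
    with Pool(processes=4) as pool:
        preds = pool.map(predict_chunk, chunks)
```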
Here is the error stack:
terminate called after throwing an instance of 'dmlc::Error'
what(): [13:08:08] /workspace/include/xgboost/./../../src/common/common.h:41: /workspace/src/common/host_device_vector.cu: 150: initialization error
Stack trace returned 10 entries:
[bt] (0) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::StackTrace(unsigned long)+0x47) [0x7f14b4c0ffc7]
[bt] (1) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1d) [0x7f14b4c1042d]
[bt] (2) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dh::ThrowOnCudaError(cudaError, char const*, int)+0x123) [0x7f14b4de2153]
[bt] (3) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::DeviceShard::Init(xgboost::HostDeviceVectorImpl<float>*, int)+0x278) [0x7f14b4e3fb68]
[bt] (4) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(+0x33b261) [0x7f14b4e17261]
[bt] (5) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::Reshard(xgboost::GPUDistribution const&)+0x1b6) [0x7f14b4e40d26]
[bt] (6) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::obj::RegLossObj<xgboost::obj::LinearSquareLoss>::PredTransform(xgboost::HostDeviceVector<float>*)+0xf9) [0x7f14b4e0d239]
[bt] (7) /home/.../.local/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterPredict+0x107) [0x7f14b4c08be7]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f14f3b21dae]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f14f3b2171f]
It seems that CUDA is somehow involved in this. If that helps, I have CUDA v10.0.130 installed on my machine.
I tried to run it on a machine in the cloud that doesn't have any GPU and it seems to work as intended.
I ran into the same problem recently.
I noticed that if you use an older version of xgboost (0.72.1) the problem of "it hangs and doesn't do anything" seems to disappear, but the process takes way too long.
Just for comparison, I used multithreading (which is slower than multiprocessing) on the latest version (0.90).
Results:
- Multiprocessing on v0.72.1: 672 sec
- Multithreading on v0.90: 164 sec
Some related thoughts: nthread is a runtime parameter, so pickling (which is what Python does when spawning a new process) cannot include nthread in the pickle. This can be resolved once #4855 is materialized.
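As a stop-gap illustration of that remark (the parameter value and model path are assumptions, and this only addresses the thread-count part, not the multiprocessing error), the thread count can be re-applied after unpickling:

```python
# Re-set nthread on a booster restored from a pickle, since the runtime
# parameter is not carried inside the pickle itself.
import pickle

with open("model.pkl", "rb") as f:   # hypothetical pickled Booster
    booster = pickle.load(f)
booster.set_param({"nthread": 4})    # restore the desired thread count
```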
I had the same problem when I tried to run it on a machine that has GPUs.
Any update on this? I have the same issue here.
Thanks for the reminder. Let's see if I can get to this over the weekend.
I implemented a workaround using a ZMQ load balancer.
I cut the code where the XGBoost models are initialized and loaded out of my master script, put it into an independent Python script, and implemented a worker routine that uses ZMQ load-balancing techniques to serve the XGBoost models in the backend.
Due to the system memory limit, I only started 4 workers, i.e. 4 independent XGBoost models as backend workers. The frontend is still the multiprocessing part of the original master script, but instead of using the XGBoost models to make predictions directly, it now sends requests to the backend XGBoost workers and receives the predictions back. No more dmlc errors.
Still, it would be awesome if XGBoost eventually made predict() work with multiprocessing.
Link to the ZMQ Load Balancer that inspired my workaround
Hi, I implemented a demo that shows how a ZMQ load balancer can help with this issue:
Link to the demo
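A simplified sketch of that worker-pool idea follows (this is not the commenter's actual demo; the socket addresses, model path, and the use of a plain zmq.proxy broker instead of a full load balancer are assumptions). Each worker process owns one booster and answers prediction requests over ZMQ:

```python
# Broker + worker sketch: clients send pickled row batches to the frontend,
# the broker forwards them to one of the REP workers, and the worker replies
# with the pickled predictions.
import pickle
from multiprocessing import Process

import zmq

FRONTEND = "tcp://127.0.0.1:5555"    # clients connect a REQ socket here
BACKEND = "ipc:///tmp/xgb_workers"   # workers connect here


def worker(model_path):
    import xgboost as xgb            # first XGBoost/CUDA touch happens after the fork
    with open(model_path, "rb") as f:
        booster = pickle.load(f)
    sock = zmq.Context().socket(zmq.REP)
    sock.connect(BACKEND)
    while True:
        rows = pickle.loads(sock.recv())              # batch of rows from a client
        sock.send(pickle.dumps(booster.predict(xgb.DMatrix(rows))))


def broker():
    ctx = zmq.Context()
    frontend = ctx.socket(zmq.ROUTER)
    frontend.bind(FRONTEND)
    backend = ctx.socket(zmq.DEALER)
    backend.bind(BACKEND)
    zmq.proxy(frontend, backend)      # distribute requests across the workers


if __name__ == "__main__":
    for _ in range(4):                # four backend workers, as described above
        Process(target=worker, args=("model.pkl",), daemon=True).start()
    broker()
```

A client then connects a REQ socket to the frontend address, sends a pickled NumPy batch, and receives the pickled predictions back.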
Right now, another workaround is to not initialize XGBoost before forking (e.g. load the pickle only after the fork). Maybe we could use some low-level driver API to maintain the CUDA context ourselves, but simply using a distributed framework like Dask seems much simpler.
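One way to structure that "load only after fork" workaround is with a pool initializer (the initializer approach, model path, and pool size are assumptions for illustration); the parent process never imports XGBoost, and each worker unpickles the model only after the fork:

```python
# Defer all XGBoost/CUDA initialization until inside the worker processes.
import pickle
from multiprocessing import Pool

_booster = None


def _init_worker(model_path):
    global _booster
    with open(model_path, "rb") as f:   # unpickle post-fork, once per worker
        _booster = pickle.load(f)


def _predict(rows):
    import xgboost as xgb               # XGBoost is only ever imported in workers
    return _booster.predict(xgb.DMatrix(rows))


if __name__ == "__main__":
    import numpy as np
    chunks = np.array_split(np.random.rand(1000, 10), 4)
    with Pool(4, initializer=_init_worker, initargs=("model.pkl",)) as pool:
        preds = pool.map(_predict, chunks)
```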
A quick update on this: thread-safe prediction and in-place prediction are now supported.
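For example, with a recent release, a single booster can be shared across threads and called through in-place prediction (model file, chunking, and thread count below are arbitrary placeholders):

```python
# Threaded prediction with a shared booster using inplace_predict(),
# which accepts the raw array directly instead of a per-call DMatrix.
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import xgboost as xgb

booster = xgb.Booster()
booster.load_model("model.json")   # hypothetical saved model

data = np.random.rand(50_000, 10)
chunks = np.array_split(data, 8)

with ThreadPoolExecutor(max_workers=8) as pool:
    preds = list(pool.map(booster.inplace_predict, chunks))
```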