XGBoost: Is there any way to translate XGBoost models to Weka models to get better performance at inference?

Created on 1 Jul 2016 · 3 Comments · Source: dmlc/xgboost

Hi,
Is there any way to translate XGBoost models to Weka models?

I have a weird problem with the efficiency of xgboost models at inference time.

I set the number of rounds (trees) to 1 and the depth of the trees to 108, and I'm using the jvm-package for a multi-class classification problem.

Predicting 6000 instances with XGBoost, one instance at a time, takes 380 ms when the time includes both creating a DMatrix for each instance and running the prediction
(excluding DMatrix creation, it takes 180 ms).

// Create a DMatrix with one row and numberOfFeatures columns;
// 9999F is the value treated as missing.
long startTime = System.nanoTime();
DMatrix testInstance = new DMatrix(testingInstance, 1, numberOfFeatures, 9999F);
// Predict just this one instance.
float[][] predicts = booster.predict(testInstance);
// Accumulate elapsed milliseconds (this measurement includes DMatrix creation).
predictionTime += (System.nanoTime() - startTime) / 1e6;

However, when I predict them using Weka random forests, it takes just 30 ms.

With the number of rounds (trees) set to 50, Weka random forests predict the 6000 instances in 922 ms, while XGBoost trees predict them in 3429 ms.

The problem is that the accuracy of the XGBoost trees is far better than that of the random forests. However, inference time is also very important in my case.

So, is there any way to convert XGBoost models to Weka models?
Or do you have any idea how I can make prediction faster? Is the efficiency problem caused by calling native libraries from Java code?

Thanks.

All 3 comments

From the benchmark table at https://github.com/komiya-atsushi/xgboost-predictor-java : pure Java prediction is up to 4 orders of magnitude faster! There must be something terribly inefficient somewhere in the xgboost Java interface... I'm considering using Java-based scoring at some point, but I would prefer not to depend on another offshoot project for that.
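For intuition about what pure-Java per-instance scoring involves, here is a minimal sketch of walking a single regression tree stored in flat arrays. It is not the code of xgboost-predictor-java or of XGBoost itself; the class name `ArrayTree` and its array layout are hypothetical, and missing-value handling and summing over multiple trees or classes are left out.

```java
/** Hypothetical flat-array decision tree; node 0 is the root. */
final class ArrayTree {
    private final int[] featureIndex; // feature tested at each node; -1 marks a leaf
    private final float[] threshold;  // split value for internal nodes
    private final int[] left;         // child taken when feature value < threshold
    private final int[] right;        // child taken otherwise
    private final float[] leafValue;  // output value, used only at leaves

    ArrayTree(int[] featureIndex, float[] threshold,
              int[] left, int[] right, float[] leafValue) {
        this.featureIndex = featureIndex;
        this.threshold = threshold;
        this.left = left;
        this.right = right;
        this.leafValue = leafValue;
    }

    /** Scores one instance directly from its feature array, with no matrix object. */
    float predict(float[] features) {
        int node = 0;
        // Descend until a leaf is reached.
        while (featureIndex[node] >= 0) {
            node = features[featureIndex[node]] < threshold[node]
                    ? left[node]
                    : right[node];
        }
        return leafValue[node];
    }
}
```

Scoring an instance this way is a handful of array reads per tree level, with no native call and no intermediate matrix, which may be part of why the gap in that benchmark is so large for single-row prediction.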

This is mainly because the current interface was not designed for online (per-instance) scoring but rather for batch scoring. For online scoring, the overhead of creating a DMatrix dominates. We need a dedicated online scoring interface for that case.
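If the instances do not have to be scored strictly one at a time, one way to work with the batch-oriented interface is to pack all rows into a single row-major buffer and build one DMatrix for the whole set. The sketch below shows only the packing step in plain Java; `BatchPacker` and `flattenRowMajor` are hypothetical names, and the DMatrix call in the comment simply reuses the constructor from the question with a larger row count.

```java
/** Hypothetical helper: packs per-instance feature vectors into one row-major buffer. */
final class BatchPacker {
    private BatchPacker() {}

    /**
     * Copies rows[r][c] to flat[r * numberOfFeatures + c].
     * Every row must contain exactly numberOfFeatures values.
     */
    static float[] flattenRowMajor(float[][] rows, int numberOfFeatures) {
        float[] flat = new float[rows.length * numberOfFeatures];
        for (int r = 0; r < rows.length; r++) {
            if (rows[r].length != numberOfFeatures) {
                throw new IllegalArgumentException(
                        "row " + r + " has " + rows[r].length
                        + " features, expected " + numberOfFeatures);
            }
            System.arraycopy(rows[r], 0, flat, r * numberOfFeatures, numberOfFeatures);
        }
        return flat;
    }
}

// Assumed usage, mirroring the constructor call in the question:
//   float[] flat = BatchPacker.flattenRowMajor(allInstances, numberOfFeatures);
//   DMatrix batch = new DMatrix(flat, allInstances.length, numberOfFeatures, 9999F);
//   float[][] predicts = booster.predict(batch); // one row of outputs per instance
```

This pays the DMatrix creation cost once per batch instead of once per instance. It does not help when each prediction must be returned before the next instance arrives, which is the online case described above.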
