I'm mostly concerned with the R package, but this proposal should apply to other language bindings as well.
Currently, `predict.xgb.Booster` calculates the sum of predictions from the first `ntreelimit` trees, with the exception of `ntreelimit = NULL`, where all trees are used. As things stand, a negative `ntreelimit` argument doesn't make sense.
I propose to address #2175 by returning the prediction from the `abs(ntreelimit)`-th tree alone for negative `ntreelimit` values. For example, with `ntreelimit = -2`, only the prediction from the second iteration is returned.
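For what it's worth, the proposed semantics can already be emulated from the Python package today by differencing cumulative margin predictions. A minimal, untested sketch, with placeholder data and a placeholder `k`:

```python
# Untested sketch: emulate "prediction from the k-th tree only" by
# differencing cumulative margin predictions. The data and k are placeholders.
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=5)

k = 2  # corresponds to ntreelimit = -2 in the proposal
# Note: ntree_limit = 0 means "use all trees", so this form needs k >= 2;
# for k = 1, subtract the model's base score instead.
up_to_k = bst.predict(dtrain, ntree_limit=k, output_margin=True)
up_to_k_minus_1 = bst.predict(dtrain, ntree_limit=k - 1, output_margin=True)
tree_k_only = up_to_k - up_to_k_minus_1  # raw contribution of the k-th tree
```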
Here are some pros and cons of this proposal.
Pros:

- No change to the function signature is needed (`ntree_limit` is declared as `unsigned` in `learner.cc`, but I guess you are OK with a cast?)

Cons:

- `ntreelimit` would become less self-documenting.

@nalzok I read the `staged_predict` method. We can support this method explicitly without reusing `ntreelimit`. I'm happy to add more features to our interface, as this seems to be a gain.
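For context, scikit-learn's `staged_predict` (assuming that is the method being referenced) yields the cumulative prediction after each boosting stage; a minimal sketch with placeholder data:

```python
# scikit-learn's staged_predict, for reference: it yields the cumulative
# prediction after each boosting stage. The data here are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(100, 4)
y = np.random.rand(100)

model = GradientBoostingRegressor(n_estimators=5).fit(X, y)
for stage, pred in enumerate(model.staged_predict(X)):
    print(stage, pred[:3])  # prediction using trees 0..stage
```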
@trivialfis That would be very nice! I agree that extending the API would be the best solution. However, development takes time, and this feature request presumably doesn't have high priority, so it might take months or even years before the change finally rolls out. I would be happy to open a pull request if you want a temporary workaround in the meantime.
The research project I'm working on calls for predictions from individual trees, so I just took the R package on CRAN and made the modification myself; here is the commit that implements everything. As you can see, there are only a few lines of changes to review. (I don't have a GPU at hand, so I didn't touch `gpu_predictor.cu`, but the changes required there are almost identical.)
Thanks. Will look into it tomorrow. Has been a long day.
@nalzok I had a research project that also manipulated individual trees. The way we solved it (in Python) was to create a separate booster object for each tree and then chain the trees together using the `dmatrix.set_base_margin()` method to start each booster from the predictions of the previous boosters. Something like the sketch below. It's not exactly a high-performance solution, but we had complete control over each boosting iteration, and it was fine for experiments on smaller data.
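A minimal, untested sketch of that chaining approach; the data, parameters, and number of rounds are placeholders:

```python
# Untested sketch: chain one-tree boosters together via base margins.
# The data, parameters, and number of rounds are placeholders.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

params = {"objective": "reg:squarederror", "eta": 0.1, "max_depth": 3}
n_rounds = 5

margin = np.zeros(X.shape[0])  # raw starting score before any trees
boosters = []
for _ in range(n_rounds):
    dtrain.set_base_margin(margin)  # start from the previous predictions
    bst = xgb.train(params, dtrain, num_boost_round=1)  # fit a single tree
    boosters.append(bst)
    # The raw prediction includes the base margin, so it becomes the
    # starting point for the next round. Each tree's individual
    # contribution is the difference between successive margins.
    margin = bst.predict(dtrain, output_margin=True)
```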
@RAMitchell Hi Mitchell, thanks for the advice. We are actually switching to the Python package now for operational reasons. I'm not very familiar with the Python API, though, so it would be great if you could share a complete example.
To summarize, this is about staging the prediction. We can implement it with the existing prediction cache functionality; one hurdle is multi-class training, where the prediction cache is not available.
@RAMitchell Currently the DMatrix cache is created as part of the booster's constructor. I want to enable adding a cache on the fly. That would remove the tight coupling between DMatrix and Booster, and also give us this feature request for free. (Unless I'm missing something again.)
We now have a method for slicing the tree model, but the interface is only available in Python and C at the moment.
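For illustration, the Python slicing interface looks roughly like this (a sketch with placeholder data; `bst[start:end]` returns a new booster containing only the trees from that iteration range):

```python
# Sketch of the Python model-slicing interface; the data and parameters
# are placeholders. bst[start:end] returns a booster holding only the
# trees from iterations [start, end).
import numpy as np
import xgboost as xgb

X = np.random.rand(500, 8)
y = np.random.rand(500)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({"objective": "reg:squarederror"}, dtrain, num_boost_round=10)

second_tree = bst[1:2]  # only the second iteration's tree
pred = second_tree.predict(dtrain, output_margin=True)
```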