Xgboost: [Feature proposal] Reuse the ntreelimit argument for negative values

Created on 4 Nov 2019 · 9 comments · Source: dmlc/xgboost

I'm mostly concerned with the R package, but this proposal should apply to other language bindings as well.

Currently, predict.xgb.Booster returns the sum of predictions from the first ntreelimit trees, the exception being ntreelimit = NULL, where all trees are used. As it stands, a negative ntreelimit argument has no meaning.

I propose to address #2175 by returning the prediction from the abs(ntreelimit)-th tree alone whenever ntreelimit is negative. For example, with ntreelimit = -2, only the prediction from the second iteration is returned.
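To make the semantics concrete, here is a rough sketch using the Python binding (where the limit is exposed as ntree_limit); the data and parameters are synthetic stand-ins. Since the negative limit doesn't exist yet, the last line computes the equivalent quantity by differencing margins under the existing API:

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=5)

# Existing behaviour: cumulative margins over the first k trees.
m1 = bst.predict(dtrain, ntree_limit=1, output_margin=True)
m2 = bst.predict(dtrain, ntree_limit=2, output_margin=True)

# Proposed behaviour (hypothetical): ntree_limit=-2 would return the
# second tree's contribution alone, i.e. the equivalent of:
tree2 = m2 - m1
```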

Here are some pros and cons of this proposal.

Pros:

  1. Doesn't affect the API at all, retaining backward compatibility;
  2. Requires minimal change to the codebase (ntree_limit is declared as unsigned in learner.cc, but I guess you are OK with a cast?);
  3. Fairly easy to understand and use.

Cons:

  1. A bit hacky.
  2. The name ntreelimit would become less self-documenting.
Label: feature-request

All 9 comments

@nalzok I read through the staged_predict method. We can support this explicitly without reusing ntreelimit. I'm happy to add more features to our interface, as this seems like a gain.

@trivialfis That would be very nice! I agree that extending the API would be the best solution. However, development takes time, and this feature request presumably isn't high priority, so it might be months or even years before the change finally rolls out. I would be happy to open a pull request if you want a temporary workaround.

The research project I'm working on calls for predictions from individual trees, so I took the R package from CRAN and made the modification myself; here is the commit that implements everything. As you can see, there are only a few lines of changes to review. (I don't have a GPU at hand, so I didn't touch gpu_predictor.cu, but the changes required there are almost identical.)

Thanks. Will look into it tomorrow. Has been a long day.

@nalzok I had a research project that also manipulated individual trees. The way we solved it (in Python) was to create a separate booster object for each tree, then chain the trees together using the DMatrix.set_base_margin() method to start each booster from the predictions of the previous boosters.

Not exactly a high-performance solution, but we had complete control over each boosting iteration. It was fine for experiments on smaller data.
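Roughly, the pattern looked like this (a minimal sketch with synthetic data; the objective and parameters are illustrative, not our actual setup):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(200, 5)
y = np.random.rand(200)
params = {"objective": "reg:squarederror", "max_depth": 3}

boosters = []
margin = np.zeros(len(y))  # running output of all previous boosters
for _ in range(5):
    dtrain = xgb.DMatrix(X, label=y)
    dtrain.set_base_margin(margin)  # start from the previous boosters' predictions
    bst = xgb.train(params, dtrain, num_boost_round=1)  # one tree per booster
    boosters.append(bst)
    # output_margin includes the base margin, so this is the updated ensemble output
    margin = bst.predict(dtrain, output_margin=True)
```

Each boosters[i] then holds a single round, and its individual contribution is its output margin minus the base margin it was trained from.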

@RAMitchell Hi Mitchell, thanks for the advice. We are actually switching to the Python package for operational reasons. I'm not very familiar with the Python API, though, so it would be great if you could share an example.

To summarize, this is about staging the prediction. We can implement it with the existing prediction-cache functionality; one hurdle is multi-class training, where the prediction cache is not available.

@RAMitchell Currently the DMatrix cache is created as part of the booster's constructor. I want to enable adding a cache on the fly. That would remove the tight coupling between DMatrix and Booster, and would also give us this feature request for free. (Unless I'm again missing something.)

We now have a method for slicing the tree model, but the interface is only available in Python and C at the moment.
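Roughly, the Python usage looks like this (a minimal synthetic example; assumes a release that ships Booster slicing, so exact version support may vary):

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.rand(100)
bst = xgb.train({"objective": "reg:squarederror"}, xgb.DMatrix(X, label=y),
                num_boost_round=5)

second_round = bst[1:2]  # a Booster containing only the second boosting round
p = second_round.predict(xgb.DMatrix(X))
```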
