Xgboost: Multiple output regression

Created on 8 Mar 2017 · 13 Comments · Source: dmlc/xgboost

How do I perform multiple output regression? Or is it simply not possible?

My current assumption is that I would have to modify the codebase so that DMatrix supports a matrix of labels, and that I would have to write a custom objective function.
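As a rough illustration of the custom-objective route: XGBoost's Python `xgb.train` accepts an `obj` callable that returns a gradient and a Hessian for every prediction. The sketch below computes those quantities per coordinate for the Euclidean distance between a predicted 2-D point and the true one. The helper name `euclidean_grad_hess` is hypothetical, and it is not wired into XGBoost here, because that would require more than one prediction column per row, which is exactly what this issue asks for.

```python
import numpy as np

def euclidean_grad_hess(preds, labels, eps=1e-12):
    """Hypothetical helper: per-coordinate gradient and diagonal Hessian of
    the Euclidean distance loss L = ||preds - labels|| for each row.

    preds, labels: arrays of shape (n_samples, 2), one (x, y) point per row.
    Returns (grad, hess), each of shape (n_samples, 2).
    """
    diff = preds - labels                                   # (n, 2) errors
    dist = np.sqrt((diff ** 2).sum(axis=1, keepdims=True))  # (n, 1) distances
    dist = np.maximum(dist, eps)                            # avoid division by zero
    grad = diff / dist                                      # dL/dpred_k
    hess = 1.0 / dist - diff ** 2 / dist ** 3               # d2L/dpred_k^2 (diagonal only)
    return grad, hess

# Prediction (3, 4) for a true point (0, 0): distance 5
grad, hess = euclidean_grad_hess(np.array([[3.0, 4.0]]), np.array([[0.0, 0.0]]))
# grad -> [[0.6, 0.8]], hess -> [[0.128, 0.072]]
```

Only the diagonal of the Hessian is kept, on the assumption that each coordinate is treated as its own prediction. That diagonal can be zero: when the error lies entirely along one axis (for example a difference of (3, 0)), the Hessian for that axis is 1/3 - 9/27 = 0, so a small positive floor would likely be needed in practice.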

My end goal is to perform regression that outputs two variables (a point) and to optimise the Euclidean loss. Would I be better off making two separate models (one for the x coordinates and one for the y coordinates)?

Or would I be better off using a random forest regressor from scikit-learn, or some other algorithm?
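On the question of two separate models: if the loss is the squared Euclidean distance, it splits into one squared-error term per coordinate. Assuming the x and y models share no parameters, fitting one squared-error model per coordinate therefore minimises the same total loss. The unsquared distance does not split this way. A small NumPy check with made-up points:

```python
import numpy as np

# Two true points and two predicted points, one (x, y) pair per row (made-up values)
y_true = np.array([[0.0, 0.0], [1.0, 1.0]])
y_pred = np.array([[3.0, 4.0], [1.0, 2.0]])

# Per-sample Euclidean distance: sqrt((x - x_hat)^2 + (y - y_hat)^2)
dist = np.sqrt(((y_true - y_pred) ** 2).sum(axis=1))

# Per-coordinate squared errors, i.e. what two separate
# squared-error models (one for x, one for y) would each minimise
sq_x = (y_true[:, 0] - y_pred[:, 0]) ** 2
sq_y = (y_true[:, 1] - y_pred[:, 1]) ** 2

# The squared Euclidean loss is exactly the sum of the two terms ...
assert np.allclose(dist ** 2, sq_x + sq_y)
# ... but the unsquared distance is not a sum of per-coordinate terms
print(dist)  # [5. 1.]
```

So two independent squared-error regressors are a reasonable fit for a squared Euclidean objective, while optimising the plain (unsquared) distance couples the coordinates and would need a joint objective.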

feature-request


All 13 comments

Multivariate/multilabel regression is not currently implemented (see #574 and #680).
Tianqi added some relevant placeholder data structures to the gbtree learner, but I guess no one has had the time to work out the machinery.

That's a pity, since many competitions have multiple outputs.

This would be a really nice feature to have.

Do we have any updates on this?

I'm adding this feature to the feature request tracker: #3439. Hopefully, we can get to it at some point.

I agree - this feature would be extremely valuable (exactly what I need right now...)

I also agree. While this is quite trivial to do with neural nets, it would be nice to be able to do it in XGBoost as well.

I would like to see this feature added.

Is there any reason why this issue is closed?

@veonua See #3439.

In the meantime, is there any alternative, such as an ensemble of single-output models like the following?

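```python
from sklearn import multioutput
from xgboost import XGBRegressor

# Fit a model and predict the lens values from the original features
model = XGBRegressor(n_estimators=2000, max_depth=20, learning_rate=0.01)
# Wrap it so that one independent XGBRegressor is fitted per target column
model = multioutput.MultiOutputRegressor(model)
model.fit(X_train, X_lens_train)  # X_lens_train: 2-D target array, one column per output
preds = model.predict(X_test)     # shape: (n_test_samples, n_outputs)
```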

from: https://gist.github.com/MLWave/4a3f8b0fee43d45646cf118bda4d202a
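Conceptually, scikit-learn's `MultiOutputRegressor` fits one independent copy of the base estimator per target column. The sketch below mimics that idea in plain NumPy so the behaviour is visible. `PerTargetRegressor` and `LeastSquares` are hypothetical stand-ins, not scikit-learn classes, and the real wrapper does more (for example, it accepts an `n_jobs` argument for parallel fitting).

```python
import numpy as np

class PerTargetRegressor:
    """Hypothetical, simplified stand-in for a multi-output wrapper:
    fits one independent single-output model per target column."""

    def __init__(self, make_model):
        self.make_model = make_model  # factory returning a fresh single-output model

    def fit(self, X, Y):
        # One model per column of Y; nothing is shared between them
        self.models_ = [self.make_model().fit(X, Y[:, j]) for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack the per-target predictions back into an (n_samples, n_targets) array
        return np.column_stack([m.predict(X) for m in self.models_])


class LeastSquares:
    """Hypothetical single-output base model: ordinary least squares with intercept."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


# Usage: two linear targets recovered by two separate models
X = np.arange(10.0).reshape(-1, 1)
Y = np.column_stack([2 * X[:, 0] + 1, -X[:, 0] + 3])
model = PerTargetRegressor(LeastSquares).fit(X, Y)
preds = model.predict(X)  # matches Y up to floating-point error
```

Because each target gets its own model, nothing learned for one column is reused for another, which is why this approach suits independent targets best.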


https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html

I'll also weigh in: having such a feature would be extremely handy. The MultiOutputRegressor mentioned above is a nice wrapper for building several models at once, and it works well for predicting target variables that are independent of one another. However, if the target variables are highly correlated, you really want to build a single model that predicts a vector.
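One common way a single tree model can predict a vector is to let every leaf store a mean target vector and to choose splits by the squared error summed over all target columns (or, equivalently for choosing a split, averaged; scikit-learn's tree regressors, for instance, accept a 2-D target). The function below is a hypothetical, minimal sketch of that split criterion on one feature; it is not XGBoost code. Because both targets share one split, structure common to correlated targets is learned once rather than separately per target.

```python
import numpy as np

def best_vector_split(x, Y):
    """Hypothetical sketch: best threshold on one feature x for a single
    split whose two leaves each predict a mean target vector. The score is
    the squared error summed over all columns of Y (n_samples, n_targets)."""
    order = np.argsort(x)
    x_sorted, Y_sorted = x[order], Y[order]
    best_sse, best_thr = np.inf, None
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # cannot split between equal feature values
        left, right = Y_sorted[:i], Y_sorted[i:]
        # Each leaf predicts its mean vector; errors are summed over every target
        sse = ((left - left.mean(axis=0)) ** 2).sum() + ((right - right.mean(axis=0)) ** 2).sum()
        if sse < best_sse:
            best_sse, best_thr = sse, (x_sorted[i - 1] + x_sorted[i]) / 2
    return best_thr, best_sse


# Usage: two perfectly correlated targets that jump together between x = 2 and x = 3
x = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
thr, sse = best_vector_split(x, Y)  # threshold 2.5 with zero error
```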
