Xgboost: Multiple output regression

Created on 8 Mar 2017 · 13 Comments · Source: dmlc/xgboost

How do I perform multiple output regression? Or is it simply not possible?

My current assumption is that I would have to modify the codebase so that DMatrix supports a matrix as labels, and that I would have to write a custom objective function.

My end goal is to regress two output variables (a point) and to optimise Euclidean loss. Would I be better off building two separate models (one for the x coordinates and one for the y coordinates)?

Or... would I be better off using a random forest regressor within sklearn or some other alternative algorithm?
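On the two-models question, a short worked derivation may help. It is only a sketch under stated assumptions: the symbols $(x_i, y_i)$ for the true point, $(\hat{x}_i, \hat{y}_i)$ for the prediction, and $d_i$ for the distance are introduced here for illustration and do not come from the XGBoost code. If "Euclidean loss" means the squared distance, the objective separates by coordinate, so two independent squared-error models optimise the same quantity. If it means the plain (unsquared) distance, each coordinate's gradient depends on the other coordinate's error, and the problem no longer splits cleanly.

```latex
% Squared Euclidean loss: separable across the two coordinates
L_{\text{sq}} = \sum_i \left[ (\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 \right]
             = \underbrace{\sum_i (\hat{x}_i - x_i)^2}_{\text{model for } x}
             + \underbrace{\sum_i (\hat{y}_i - y_i)^2}_{\text{model for } y}

% Per-sample gradient and Hessian with respect to \hat{x}_i
% (what a custom objective would need to return)
\frac{\partial L_{\text{sq}}}{\partial \hat{x}_i} = 2(\hat{x}_i - x_i),
\qquad
\frac{\partial^2 L_{\text{sq}}}{\partial \hat{x}_i^2} = 2

% Unsquared Euclidean distance: the coordinates are coupled through d_i
d_i = \sqrt{(\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2},
\qquad
\frac{\partial d_i}{\partial \hat{x}_i} = \frac{\hat{x}_i - x_i}{d_i}
```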

feature-request

miguelmartin75

👍31

All 13 comments

Multivariate/multilabel regression is not currently implemented (see #574 and #680).
Tianqi had added some relevant placeholder data structures to the gbtree learner, but I guess no one has had time to work out the machinery.

khotilov on 11 Mar 2017

A pity, since many competitions involve multiple outputs.

jindongwang on 13 Mar 2017

This would be a really nice feature to have.

MarkusBonsch on 10 May 2017

👍17

Do we have any updates on this?

joel-thomas-wilson on 7 Sep 2018

I'm adding this feature to the feature request tracker: #3439. Hopefully, we can get to it at some point.

hcho3 on 7 Sep 2018

I agree - this feature would be extremely valuable (exactly what I need right now...)

JacobKempster on 6 Nov 2018

I also agree. While this is quite trivial to do in neural nets, it would be nice to be able to do it in xgboost as well.

bartl88 on 31 Jan 2019

I would like to see this feature added.

cp9612 on 26 Mar 2019

Any reason why it is closed?

veonua on 15 Apr 2019

@veonua See #3439.

hcho3 on 15 Apr 2019

In the meantime, is there any alternative, such as an ensemble of single-output models like this:

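# Fit a model and predict the lens values from the original features.
# X_train, X_lens_train and X_test are assumed to be defined already.
from sklearn import multioutput
from xgboost import XGBRegressor

model = XGBRegressor(n_estimators=2000, max_depth=20, learning_rate=0.01)
model = multioutput.MultiOutputRegressor(model)  # one fitted copy per target
model.fit(X_train, X_lens_train)
preds = model.predict(X_test)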

from: https://gist.github.com/MLWave/4a3f8b0fee43d45646cf118bda4d202a

loretoparisi on 24 Sep 2019

In the meantime, is there any alternative, such as an ensemble of single-output models like:

https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.MultiOutputRegressor.html

jimmywan on 25 Sep 2019

👍3

I am also going to weigh in and say that having such a feature would be extremely handy. The MultiOutputRegressor mentioned above is a nice wrapper for building multiple models at once, and it works well for predicting target variables that are independent of one another. However, if the target variables are highly correlated, then you really want to build one model that predicts a vector.
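To make the independence point concrete, here is a minimal sketch on synthetic data. Several things in it are assumptions and not part of the thread: the toy data, the use of scikit-learn's GradientBoostingRegressor as a stand-in base model (an XGBRegressor should slot in the same way), and the comparison with sklearn's RegressorChain, which nobody above mentioned. A chain is still not a single vector-valued model, but it does let the second target see the first target's prediction, which MultiOutputRegressor never does.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor, RegressorChain

# Synthetic example (assumption): two strongly correlated targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
x_coord = X[:, 0] + 0.1 * rng.normal(size=200)
y_coord = 2.0 * x_coord + X[:, 1]  # depends directly on x_coord
Y = np.column_stack([x_coord, y_coord])

# Stand-in base model; swap in xgboost.XGBRegressor if it is installed.
base = GradientBoostingRegressor(n_estimators=50, random_state=0)

# One independent model per target column; the targets never see each other.
independent = MultiOutputRegressor(base).fit(X, Y)

# Chained models: the model for column 1 also receives column 0 as a
# feature (the true value when fitting, the prediction when predicting).
chained = RegressorChain(base, order=[0, 1]).fit(X, Y)

independent_preds = independent.predict(X)
chained_preds = chained.predict(X)

print(len(independent.estimators_), "independent models fitted")
print("prediction shapes:", independent_preds.shape, chained_preds.shape)
```

Whether the chain actually helps depends on the data; comparing both wrappers on a held-out set is the only reliable way to tell.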