Keras: Regression problems / continuous target

Created on 10 May 2015 · 24 Comments · Source: keras-team/keras

My reading of the docs and code is that Keras currently supports binary and multi-class classification problems, but it does not support regression problems. Is that correct?

I'm happy to take this on if you provide some guidance. It seems to me that it only requires adding a "continuous" class_mode to Sequential.compile() and renaming Sequential._predict() to Sequential.predict()

It seems likely I've missed something, but I think it would be nice to support regression problems.


dansbecker

Most helpful comment

Hi Dan,

Keras can in fact work with regression problems, and even multidimensional regression (e.g. autoencoders). In such cases, you would use .predict() to get the output, and everything that is classification-related (class_mode, show_accuracy) would be irrelevant (i.e. if you tried to display classification accuracy it would be ~0 all the way).

Important to note: for regression cases, you would need to use mse or mae as the loss, and you couldn't use softmax as the activation (since the output of the model isn't supposed to be probabilities). I think it would be useful to introduce a regression task in the examples, to point out these gotchas...

Here's a simple 2-layer unidimensional regression:

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(10, 64))
model.add(Activation('tanh'))
model.add(Dense(64, 1))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')

model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)

And here's an autoencoder:

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(10, 5))
model.add(Activation('tanh'))
model.add(Dense(5, 10))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

model.fit(X_train, X_train, nb_epoch=20, batch_size=16)
score = model.evaluate(X_test, X_test, batch_size=16)

fchollet on 10 May 2015

👍23 🎉4
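
To make the softmax gotcha above concrete, here is a minimal plain-Python sketch (no Keras involved). The helper names `softmax`, `mse`, and `mae` and all the numbers are made up for illustration. It shows that softmax outputs always lie in (0, 1) and sum to 1, so they cannot track an unbounded continuous target, while the raw (linear) outputs can.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) * (t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical raw outputs of a last Dense layer and the continuous targets.
raw_outputs = [3.7, -1.2, 0.5]
targets = [3.5, -1.0, 0.4]

# Passing the raw outputs through softmax squashes them into probabilities.
squashed = softmax(raw_outputs)

print("softmax outputs:", squashed)
print("MSE with linear outputs: ", mse(targets, raw_outputs))
print("MSE with softmax outputs:", mse(targets, squashed))
print("MAE with linear outputs: ", mae(targets, raw_outputs))
```

The same reasoning is why the examples above end in a plain Dense layer with no softmax after it.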

All 24 comments


I don't see a .predict() in the Sequential class. The compile method creates a theano function _predict(), but I don't expect that to fill the same role as the predict() method you describe. If you think I'm onto something here, I can create the PR with a predict() method.

Actually, I'd expect predict() to look a lot like the current predict_proba() and the activation function for the last layer will determine whether the output is a probability or not.

dansbecker on 11 May 2015

My mistake. I meant predict_proba (which will batch-process the data --useful if it can't fit in the GPU memory). Alternatively you could use _predict, which won't use batches --all predictions are computed in a single pass. Also, _predict is supposed to be an internal method...

The name predict_proba does seem somewhat misleading, especially in regression cases, since there is no guarantee that the output represents probabilities (in many cases, it won't). Maybe we need to figure out a cleaner, more intuitive API.

fchollet on 11 May 2015
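
The difference between batched and single-pass prediction described above can be sketched in plain Python. This is only an illustration of the idea, not the Keras implementation: `predict_in_batches` and `toy_predict` are hypothetical names, and `toy_predict` stands in for a compiled prediction function such as `_predict`.

```python
def predict_in_batches(predict_fn, samples, batch_size=128):
    """Call predict_fn on successive slices of samples and concatenate the results.

    Only one slice is handed to predict_fn at a time, which is what keeps
    memory use bounded when the full dataset does not fit on the device.
    """
    outputs = []
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        outputs.extend(predict_fn(batch))
    return outputs

def toy_predict(batch):
    """Hypothetical stand-in for a compiled model function: y = 2x + 1."""
    return [2.0 * x + 1.0 for x in batch]

samples = [float(i) for i in range(10)]

batched = predict_in_batches(toy_predict, samples, batch_size=4)
single_pass = toy_predict(samples)

# Both routes give the same predictions; only the memory profile differs.
print(batched == single_pass)
```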

Maybe we could simply rename predict_proba to predict, which makes perfect sense since predict_proba is really the user-facing version of _predict. For backward compatibility we can keep predict_proba around (it will just call predict) and display a warning.

fchollet on 11 May 2015
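
A rough sketch of the rename-with-warning proposal above, using only the standard library. `ToyModel` is a hypothetical stand-in and not the Keras Sequential class; the warning text is invented for illustration.

```python
import warnings

class ToyModel:
    """Hypothetical stand-in for a model with a linear output."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias

    def predict(self, samples):
        """New user-facing name: return the raw model output."""
        return [self.weight * x + self.bias for x in samples]

    def predict_proba(self, samples):
        """Old name, kept for backward compatibility; delegates to predict."""
        warnings.warn(
            "predict_proba is kept for backward compatibility; use predict instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.predict(samples)

model = ToyModel(weight=0.5, bias=1.0)

# Record warnings so the deprecation notice can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = model.predict_proba([0.0, 2.0])

new_style = model.predict([0.0, 2.0])
print(old_style == new_style, len(caught))
```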

Personally, I'd favor matching the API to that of sklearn. This will help match users' expectations (since sklearn is so popular), and it allows a keras model to be dropped into a Pipeline as a direct replacement for an sklearn model.

In practice, this means keeping both predict and predict_proba (potentially providing warnings if predict_proba returned values outside [0,1]).

If we truly want to match the sklearn API, it also requires that predict_proba return an Nx2 array, where the 1st column is the probability of a 0 outcome and the 2nd column is the probability of a 1 outcome. I don't like that feature of predict_proba(), but consistency with sklearn may justify using the same convention.

dansbecker on 11 May 2015
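
For readers unfamiliar with the Nx2 convention mentioned above, here is a small sketch of what it means for a binary model that outputs a single probability per sample, along with the range check that a warning could be based on. The helper names `to_two_column` and `out_of_range` are hypothetical and not part of Keras or sklearn.

```python
def to_two_column(p_positive):
    """Turn a list of P(class 1) into rows of [P(class 0), P(class 1)]."""
    return [[1.0 - p, p] for p in p_positive]

def out_of_range(values, low=0.0, high=1.0):
    """Return the values that cannot be probabilities."""
    return [v for v in values if v < low or v > high]

# Hypothetical single-column output of a binary classifier.
single_column = [0.9, 0.25, 0.5]
two_column = to_two_column(single_column)
print(two_column)

# A regression-style output would trip the range check.
suspicious = out_of_range([0.2, 1.3, -0.1])
print(suspicious)
```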

There is a good argument to be made to match the sklearn API. However we're pretty far from it right now, both in API and use cases, and there are reasons why being more sklearn-like might be difficult or harmful.

In sklearn, it's always separate classes that are handling regression and classification, and regression is always supposed to be scalar. The sklearn API is based entirely on these assumptions.

Keras is more flexible than that --anything could be a target: classes (binary or categorical), scalar values, multidimensional arrays... and the output may or may not be probabilities. And if it's probabilities, it may be categorical ones like in sklearn (using categorical_crossentropy as objective), or binary ones if your output has size 1 and you're using binary_crossentropy as objective.

Really the user is free to configure their Keras models to output whatever they like. So the API should make as few assumptions as possible.

One thing we could do would be to roll out Keras model wrappers with a sklearn-conforming API, that would only do a subset of what Keras models can do. Then these guys could wrap the wrappers.

fchollet on 11 May 2015

Those are all good points.

As a next step that doesn't commit us to much in long-term direction, would you accept a PR that adds predict and keeps a predict_proba method (without changing the behavior of predict_proba)?

dansbecker on 11 May 2015

Sure, that sounds good!

fchollet on 11 May 2015

Hi @fchollet thanks for giving an example on implementing regression. I personally think it would be really helpful to add this example to the example folder or to the docs of Keras.

However, I have one important point to make regarding the example. In the code you presented above, you add a 'tanh' activation layer for the unidimensional regression model. However, a linear regression model should not require a non-linear layer. Hence, simpler code like the following should suffice:

from keras.models import Sequential
from keras.layers.core import Dense, Activation

model = Sequential()
model.add(Dense(2,1,init='uniform', activation='linear'))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')  # Using mse loss results in faster convergence

model.fit(X_train, y_train, nb_epoch=20, batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)

In fact, I implemented the above model to train a net to predict the sum of 2 numbers: each row of X holds the two numbers as features, and the corresponding entry of Y holds their sum.

My code took around 2000 epochs to converge to 0.0554 loss. Whereas, your code took more than 100000 to converge to just 0.6904 loss. Using MSE loss it converges even faster in just 1000 epochs to 0.0042 loss.

Another thing I would like to mention is that using SGD as the optimizer instead of rmsprop causes the loss to increase continuously instead of decreasing, until it finally reaches inf and then nan. Can you provide an explanation for that?

napsternxg on 31 Jul 2015
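
One common cause of a loss that keeps growing under plain gradient descent is a step size that is too large for the scale of the inputs. The thread does not confirm that this is what happened here, so treat the following as a sketch of that general effect only. It fits y = w * x with full-batch gradient descent on made-up data; `gradient_descent` is a hypothetical helper, and the same learning rate is used on unscaled and scaled inputs.

```python
def gradient_descent(xs, ys, learning_rate, steps):
    """Fit y = w * x by full-batch gradient descent on MSE.

    Returns the list of losses measured after each update.
    """
    w = 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * x * (w*x - y)).
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
        loss = sum((w * x - y) * (w * x - y) for x, y in zip(xs, ys)) / n
        losses.append(loss)
    return losses

# Made-up data following y = 2x, with inputs far from the unit range.
raw_xs = [50.0, 100.0, 150.0]
raw_ys = [2.0 * x for x in raw_xs]

# The same relationship with inputs rescaled into (0, 1].
scaled_xs = [x / 150.0 for x in raw_xs]
scaled_ys = [2.0 * x for x in scaled_xs]

diverging = gradient_descent(raw_xs, raw_ys, learning_rate=0.5, steps=20)
converging = gradient_descent(scaled_xs, scaled_ys, learning_rate=0.5, steps=20)

print("unscaled inputs, first and last loss:", diverging[0], diverging[-1])
print("scaled inputs,   first and last loss:", converging[0], converging[-1])
```

With the unscaled inputs each update overshoots the minimum by a larger amount than the previous one, so the loss grows at every step. Optimizers such as rmsprop divide the gradient by a running estimate of its magnitude, which makes them less sensitive to this kind of scale mismatch.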

Hey @napsternxg, I think @fchollet was illustrating the usage of Keras for nonlinear regression. For linear regression, just use the SGDRegressor (if memory necessitates incremental training) or the LinearRegression classes in scikit-learn -- if you want a simple linear model, using Keras is like taking a pressure washer to a watergun fight.

lukedeo on 31 Jul 2015

@lukedeo thanks. I probably will not use Keras for linear regression. I was just trying to understand the concept of using neural networks as universal function approximators, and a basic example of that is a network that learns to add (or compute a weighted sum of) n numbers, which is what my neural network was doing. As with the other Keras examples, understanding how to use it for linear or non-linear regression might be useful. Another important reason is that neural networks in Keras support incremental learning by default, so it will be a useful exercise.

napsternxg on 31 Jul 2015

👍1

Hi,
I'm trying to build a simple MLP for multiclass regression.
Here is my dataset:
X_train (40000,12)
y_train (40000, 12)
X_test (10000,12)
y_test (10000,12)
Each input sample has 12 integer values around 10k, to which I apply unit normalization as preprocessing.
Each output sample also has 12 integer values around 10k, which I am going to predict (in normalized form).

By following the above example with slight changes,

model = Sequential()
model.add(Dense(10, input_shape=(12,)))
model.add(Activation('relu'))
model.add(Dense(10, 5))
model.add(Activation('relu'))
model.add(Dense(5, 12))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

model.fit(X_train, X_train, nb_epoch=20, batch_size=1)
score = model.evaluate(X_test, X_test, batch_size=1)
print('Test score:', score)

, I am getting the following error:

Using Theano backend.
(40000, 12) train samples
(10000, 12) test samples
Traceback (most recent call last):
  File "mlp_regression_dnn.py", line 115, in <module>
    model.add(Dense(10, 5))
  File "/home/majid/anaconda2/lib/python2.7/site-packages/keras/models.py", line 146, in add
    output_tensor = layer(self.outputs[0])
  File "/home/majid/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 458, in __call__
    self.build(input_shapes[0])
  File "/home/majid/anaconda2/lib/python2.7/site-packages/keras/layers/core.py", line 604, in build
    name='{}_W'.format(self.name))
TypeError: 'int' object is not callable

arasharchor on 7 Aug 2016

@smajida This should go into a new issue, as it's reasonably distinct from the issue you currently added it to.

All that said, a couple of things immediately jump out at me from your code:
In the lines where you add layers, you should specify just the number of units for that layer. For example, the line that you have as model.add(Dense(10, 5)) should just be model.add(Dense(10))

I haven't tested it, but this likely resolves your current error. Separately, you probably want to include y_train and y_test when fitting and evaluating your model respectively. Right now you are using X to predict itself.

Again, follow-up on this should probably go in a separate issue.

dansbecker on 7 Aug 2016

I am also trying to do regression but my error is not dropping

2064384/2064384 [==============================] - 48s - loss: 16771.4619
Epoch 2/500
2064384/2064384 [==============================] - 48s - loss: 16771.4415
Epoch 3/500
2064384/2064384 [==============================] - 48s - loss: 16771.4415
Epoch 4/500
2064384/2064384 [==============================] - 48s - loss: 16771.4415
Epoch 5/500
2064384/2064384 [==============================] - 48s - loss: 16771.4415
Epoch 6/500
2064384/2064384 [==============================] - 49s - loss: 16771.4415
Epoch 7/500
2064384/2064384 [==============================] - 60s - loss: 16771.4415

The code that I am using is:

modelRed = Sequential()

modelRed.add(Dense(16, batch_input_shape=(None , red_rows)))
modelRed.add(Activation('tanh'))

modelRed.add(Dense(16))
modelRed.add(Activation('tanh'))

modelRed.add(Dense(1))
modelRed.add(Activation('tanh'))

# for a mean squared error regression problem

modelRed.compile(optimizer='rmsprop', loss='mse')

modelRed.fit(X_train_red, Y_train_red, batch_size=batch_size, nb_epoch=nb_epoch)

scoreRed = modelRed.evaluate(X_test_red, Y_test_red , batch_size=batch_size)

print(scoreRed)

print("[INFO] dumping red weights to file...")
modelRed.save('models/'+date+'/modelRed.h5', overwrite=True)

Can somebody tell me what the mistake is?

ghost on 12 Sep 2016

Hi @fchollet, is there any method to find the accuracy of multidimensional regression (since classification accuracy won't be helpful for continuous outputs)?

DeepakLabh on 14 Dec 2016

@GovindBhagwan I don't know if you still need help, but anyway (maybe this will help someone else): If you are building a regressor, you need the last activation to be linear. So, try to remove the last tanh activation and see if that helps.

notnami on 18 Dec 2016

❤2 👍2
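
The reason a final tanh can stall a regressor is that its output is confined to (-1, 1). If the targets lie well outside that range, no setting of the weights can bring the error down. A loss stuck around 16771 would be consistent with that, though the thread does not show the data. Here is a minimal sketch with made-up targets and outputs:

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) * (t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up unscaled regression targets.
targets = [120.0, 130.0, 140.0]

# However large the pre-activation gets, tanh saturates at 1.
pre_activations = [5.0, 50.0, 500.0]
tanh_outputs = [math.tanh(z) for z in pre_activations]

# A linear output layer is free to reach the target range (values made up).
linear_outputs = [119.0, 131.0, 139.5]

print("tanh outputs:", tanh_outputs)
print("MSE with tanh output:  ", mse(targets, tanh_outputs))
print("MSE with linear output:", mse(targets, linear_outputs))
```

Rescaling the targets into the activation's range is the other way out, but a linear last layer is the usual choice for regression.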

I have been dealing with problems where the training accuracy doesn't reflect the performance for regression. Would be great if we can have a regression example on a relatively well known and simple data set such as the Boston housing data.

philbort on 22 Dec 2016

👍1

Perhaps you're already doing this, but just to check: are you scaling your
inputs and outputs such that they lie within the same range (say, [0,1])?


notnami on 22 Dec 2016
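
A minimal sketch of the scaling suggested above, in plain Python. The helpers `fit_min_max`, `scale`, and `unscale` are hypothetical names, and the target values are made up. The bounds are taken from the training data only and reused to map predictions back to the original units.

```python
def fit_min_max(values):
    """Return the (min, max) of the training values used for scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return lo, hi

def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]

def unscale(scaled, lo, hi):
    """Inverse of scale: map model outputs back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Made-up training targets in their original units.
train_targets = [9500.0, 10000.0, 10500.0, 11000.0]

lo, hi = fit_min_max(train_targets)
scaled_targets = scale(train_targets, lo, hi)
restored = unscale(scaled_targets, lo, hi)

print(scaled_targets)
print(restored)
```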

@notnami Yes, I did. My main confusion is this: when I call history = model.fit(X_train, Y_train) on a model compiled with model.compile(loss = 'mse', optimizer = Adam(), metrics = ['accuracy']), I can see the loss go down over the epochs as expected, but the training accuracy stays the same and never changes. This makes me wonder how accuracy is calculated for regression. For classification, it is pretty straightforward.

philbort on 22 Dec 2016

Keras currently has no way to calculate accuracy for regression problems in
a meaningful way.


notnami on 22 Dec 2016

"Accuracy" for a regression problem is not well defined. If you were expecting a value of correct outputs / validation examples, you're going to realistically see an "accuracy" of 0/N every time. Take the below table for an example. How many are correct, in your opinion? - the accuracy would actually be 1/3; the odds that a model will output exactly the right value to precision is very minimal.

| Predicted | Actual | Correct (your opinion) | Equivalent (pythonically) |
| ---- | --- | ---- | --- |
| 0.17995000000000 | 0.1799499999999 | ? | No |
| 0.0468456545332457 | 0.1697456432554652 | ? | No |
| 2.7895461238351237543 | 2.7895461238351237543 | ? | Probably |

Regression is fundamentally an error problem, not an accuracy problem. The standard MSE and/or MAE metrics are the only items that make sense to score a regression problem.

The only "accuracy" (as defined for classification problems) measurement that could possibly make sense is one that involves a threshold; for example: prediction = Correct if abs(actual - prediction) <= threshold else prediction = Incorrect.

As a final point, I will simply ask, what is the accuracy of this regression? A simple fraction of the form N / 100 will suffice.

patyork on 23 Dec 2016

👍7 ❤1
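
The threshold idea above can be sketched in a few lines of plain Python, using the three rows of the table as data. `threshold_accuracy` is a hypothetical helper, not a Keras metric, and the tolerance of 1e-6 is an arbitrary choice for illustration.

```python
def threshold_accuracy(y_true, y_pred, threshold):
    """Fraction of predictions within `threshold` of the actual value."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= threshold)
    return hits / len(y_true)

# The three rows of the table above.
predicted = [0.17995000000000, 0.0468456545332457, 2.7895461238351237543]
actual = [0.1799499999999, 0.1697456432554652, 2.7895461238351237543]

# A threshold of 0 is exact float equality: only the last row counts.
exact = threshold_accuracy(actual, predicted, threshold=0.0)

# A small tolerance also accepts the first row.
tolerant = threshold_accuracy(actual, predicted, threshold=1e-6)

print(exact, tolerant)
```

The result depends entirely on the threshold chosen, which is why an error metric such as MSE or MAE is usually the more honest number to report.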

@notnami @patyork Thanks guys. This makes a lot of sense and is very helpful.

philbort on 23 Dec 2016

For the record, as already pointed out, it doesn't make much sense to talk about "accuracy" (in the classification sense) for regression problems. You want to be using something like the R2 score (see sklearn.metrics.r2_score, for example).
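
For reference, the R2 score is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. Below is a minimal plain-Python sketch of that definition on made-up data. The helper is named `r2` to make clear it is a hand-written illustration, not sklearn.metrics.r2_score itself.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) * (t - p) for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) * (t - mean_true) for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot

# Made-up targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

score = r2(y_true, y_pred)
print(score)

# Reference points: a perfect model scores 1, and always predicting
# the mean of the true values scores 0.
perfect = r2(y_true, y_true)
mean_only = r2(y_true, [sum(y_true) / len(y_true)] * len(y_true))
print(perfect, mean_only)
```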