I was trying to do nonlinear regression using Keras, but the results are far from satisfying. I was wondering how I should choose the layers to build the NN and how to tune parameters such as the activations, objectives and others. Are there any principles or guide materials that address this problem? I am a newcomer to deep learning and really need help here. The NN I built is as follows:
model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 500))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(input_dim = 500, output_dim = 1))
model.add(Activation('tanh'))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
Thanks!
It's hard to give generic advice, especially without knowing the specifics of your data. For example, is your data labelled -1/+1? Would it be better to try without a hidden layer first? And so on. The Google group is probably a better place to ask for advice like this.
I think this question is better suited for the Keras Google group here:
https://groups.google.com/forum/#!forum/keras-users
The question isn't specifically a package-related question.
Unfortunately the answer is no: there is no magical tool that does what you want.
As pointed out before, start by trying to overfit your data. Even though a model that overfits will not generalize, it at least shows that you are able to approximate the target function at all.
Remember the universal approximation result for neural networks: a network can, in theory, approximate any nonlinear function, given enough parameters and data.
You can try, for instance, deliberately overfitting a small subset of your data first, as in the sketch below.
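A minimal sketch of that overfitting check, written with the same old-style Keras API as the rest of this thread; the synthetic data, layer sizes and epoch count here are only placeholders for your own setup, and the output activation is linear so the unbounded targets are reachable:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
# Tiny synthetic dataset standing in for your real data: 32 samples, 4 features.
X_small = np.random.rand(32, 4)
y_small = np.random.rand(32, 1)
# Same shape of model as above, but without Dropout so it is free to memorise.
model = Sequential()
model.add(Dense(input_dim=4, output_dim=500))
model.add(Activation('tanh'))
model.add(Dense(input_dim=500, output_dim=1))
model.add(Activation('linear'))
model.compile(loss='mean_absolute_error', optimizer='rmsprop')
# If the training loss does not drop close to 0 on such a small set,
# the problem is in the model or the data pipeline, not in generalization.
model.fit(X_small, y_small, nb_epoch=200, batch_size=8, verbose=0)
print(model.evaluate(X_small, y_small, verbose=0))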
@pasky @hlin117 Thanks for your advice, I will move my issue to the Google group later. @mrwns Thank you for your concern. I have formatted my input using the sklearn API, and I was wondering whether Keras provides any methods to format data. @philipperemy Really appreciate your answer. I have now made the regression results quite a bit better, but there is still a problem: some negative numbers come out, which is not expected. Is that related to the data format? I scale the inputs into [-1, 1] with a mean of 0. How should I constrain the regression results to be all positive in this situation? Really, thanks for all your help! The NN I created is as follows:
`X_train_scale = preprocessing.scale(X_train)
X_test_scale = preprocessing.scale(X_test)
model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1000))
model.add(Activation('sigmoid'))
model.add(Dense(input_dim = 1000, output_dim = 1))
model.add(Activation('linear'))`
@polarlight1994 Yes, it is somewhat related to your inputs, but you can also modify your model to handle it. I see two ways to fix your problem:
First, you can normalize your data in a different way: http://stats.stackexchange.com/a/70808
The transformation (x - min(X)) / (max(X) - min(X)) gives you values between 0 and 1. The fact that you then no longer have a mean of 0 and a variance of 1 shouldn't matter much.
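For reference, sklearn's MinMaxScaler applies exactly this rescaling per column; a quick sketch with made-up numbers:
from sklearn import preprocessing
import numpy as np
X = np.array([[-3.0], [0.0], [7.0]])
scaler = preprocessing.MinMaxScaler()   # (x - min(X)) / (max(X) - min(X)) per column
print(scaler.fit_transform(X))          # prints the values 0.0, 0.3 and 1.0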
Secondly, if you want to stick with your current normalization, you might want to change your final Activation layer from linear to ReLU: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
The ReLU is a linear activation that converts all negative values to 0; think of it as max(0, x).
So you can replace your last layer with:
model.add(Activation('relu'))
Finally, if you want only 0 or 1 as output and no intermediate values like 0.123, you may have a look at a softmax layer (+ argmax). That turns it into a classification problem.
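A hypothetical sketch of that classification variant (the two-class setup here is only illustrative, not taken from your problem):
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(input_dim=4, output_dim=2))
model.add(Activation('softmax'))            # two class probabilities summing to 1
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# After fitting, take a hard 0/1 decision per sample:
# classes = model.predict(X).argmax(axis=1)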
@philipperemy I have tried the second way you mentioned, but then the output is all 0. Since the expected output data in my problem is not constrained to (0, 1), I was wondering whether the ReLU fails to work because of this? Also, I was told that if I want to do nonlinear regression I should use a linear output layer. Is that right? By the way, if I use the ReLU in the second-to-last layer instead, will that solve my problem?
No, using a linear activation for the final output of a nonlinear regression is not a prerequisite. It depends on where the values of your output data lie. The ReLU outputs values in [0, +infinity), the sigmoid in (0, 1) and the linear activation in (-infinity, +infinity). The linear activation obviously lets negative values through. What is the interval of your expected data?
Changing all your sigmoids to ReLU will also speed up training: ReLU is much cheaper to backpropagate through than the sigmoid. But I don't think you will see a drastic change in the results.
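To make those ranges concrete, a small sketch evaluating the three activations on the same inputs:
import numpy as np
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
relu = np.maximum(0.0, x)             # range [0, +inf): negatives clipped to 0
sigmoid = 1.0 / (1.0 + np.exp(-x))    # range (0, 1)
linear = x                            # range (-inf, +inf): negatives pass through
print(relu, sigmoid, linear)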
@philipperemy My expected data lies in the range (0, +infinity), so as you explained I should use ReLU for the output layer. But then I get all-0 output and the loss does not decrease from epoch to epoch. Is it because the input is constrained to (-1, 1), so that after the first three sigmoid layers the output of the ReLU is almost always 0?
Again, thank you for your patient explanation!
If you always get 0 as output, it means that the values feeding the final ReLU are all negative. I don't think the problem comes from your model or from your input data; you can always test with positive data, but I don't think it will solve your problem.
What optimizer are you using? Maybe it is not well suited to this particular problem.
Or maybe you need to run more epochs. You have more than 2 million weights, so it may take time for the optimizer to find a minimum.
Also try to split your problem into smaller problems: drop all the superfluous layers, get the small model to work, and then add the layers back one by one and see what happens.
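A rough sketch of that stripped-down starting point (the 64-unit width and the 'adam' optimizer below are just illustrative choices to experiment with, not recommendations from this thread):
from keras.models import Sequential
from keras.layers import Dense, Activation
# Smallest reasonable version of the model: one hidden layer, ReLU output.
model = Sequential()
model.add(Dense(input_dim=4, output_dim=64))
model.add(Activation('relu'))
model.add(Dense(input_dim=64, output_dim=1))
model.add(Activation('relu'))
# Try a different optimizer if rmsprop stalls; 'adam' is one common alternative.
model.compile(loss='mean_squared_error', optimizer='adam')
# Once this small model learns something, add the other layers back one at a time.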
@philipperemy I followed your advice. Now the output is not always zero and the loss decreases every epoch; however, there are still lots of negative values in the output. I can't figure out why... Here is my latest NN. I used the MinMaxScaler to scale my input data into the range (0, 1).
`min_max_scaler = preprocessing.MinMaxScaler()
X_train_scale = min_max_scaler.fit_transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)
model = Sequential()
model.add(Dense(input_dim = 4, output_dim = 500))
model.add(Activation('relu'))
model.add(Dense(input_dim = 500, output_dim = 1))
model.add(Activation('relu'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')`
It seems very weird that you still have negative values in your output.
I tried a very simple example with negative and positive values in XX_train and XX_test (before the MinMaxScaler maps them to between 0 and 1).
My expected values were set to -1. I wanted to see whether, despite the ReLU layers, the NN could output negative values. If you execute this code, you will see that all the predicted values are 0: the ReLU layer prevents negative values.
3s - loss: 1.0000 - acc: 1.0000 - val_loss: 1.0000 - val_acc: 1.0000
[[ 0.]
[ 0.]
[ 0.]
[ 0.]
Set the expected values to 1 instead, and you will see all predictions very close to 1 (0.98, 0.99, 1.01, ...). Once again the network figures out this simple function, and values slightly above 1 are consistent with the final ReLU layer, whose range is [0, +infinity).
Epoch 10/10
3s - loss: 2.2876e-04 - acc: 1.0000 - val_loss: 0.0014 - val_acc: 1.0000
[[ 1.01424801]
[ 1.01220787]
[ 1.00581753]
[ 1.01019406]
Source code is here:
from __future__ import print_function
from keras.layers import Dense, Activation
from keras.models import Sequential
from sklearn import preprocessing
import numpy as np

N = 1000
# Training inputs are all negative, test inputs all positive.
XX_train = np.random.rand(N, 4) * (-10)
XX_test = np.random.rand(N, 4) * 10
# Targets are -1: a value the final ReLU can never produce.
YY_train_labels = -np.ones((N, 1))
YY_test_labels = -np.ones((N, 1))

# Fit the scaler on both sets so every input lands in [0, 1].
min_max_scaler = preprocessing.MinMaxScaler()
min_max_scaler.fit(np.concatenate((XX_train, XX_test)))
X_train_scale = min_max_scaler.transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)

model = Sequential()
model.add(Dense(input_dim=4, output_dim=500))
model.add(Activation('relu'))
model.add(Dense(input_dim=500, output_dim=1))
model.add(Activation('relu'))
model.compile(loss='mean_squared_error', optimizer='rmsprop')

model.fit(X_train_scale, YY_train_labels,
          batch_size=1, nb_epoch=10,
          show_accuracy=True, verbose=2,
          validation_data=(X_test_scale, YY_test_labels))

# Every prediction comes out as 0, the closest a ReLU output can get to -1.
print(model.predict(X_train_scale, batch_size=1))
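In the same spirit, here is a quick check you can run on your own fitted model from the earlier comment (hypothetical snippet, assuming model.fit has already been called on your scaled data):
preds = model.predict(X_test_scale, batch_size=1)
print(preds.min())   # with a final ReLU layer this can never be below 0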