I am using Keras for my project, and my ANN's output should have an integer type. How can I constrain the output to be an integer? Here is my code:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
from sklearn import preprocessing
import numpy as np
af = open('X_train.txt', 'r')
X_train = np.loadtxt(af)
af.close()
bf = open('y_train.txt', 'r')
y_train = np.loadtxt(bf)
bf.close()
cf = open('X_test.txt', 'r')
X_test = np.loadtxt(cf)
cf.close()
df = open('y_test.txt', 'r')
y_test = np.loadtxt(df)
df.close()
XX_train = np.array(X_train)
XX_test = np.array(X_test)
min_max_scaler = preprocessing.MinMaxScaler()
X_train_scale = min_max_scaler.fit_transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)
model = Sequential()
model.add(Dense(input_dim = 5, output_dim = 6))
model.add(Activation('softplus'))
model.add(Dense(input_dim = 6, output_dim = 6))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(X_train_scale, y_train,
          nb_epoch=100,
          batch_size=10,
          shuffle=False)  # shuffle expects a boolean; the string 'false' is truthy
score = model.evaluate(X_test_scale, y_test, batch_size=5)
result = model.predict(X_test_scale)
The output of the softmax layer is the probability of each class your sample belongs to, so the values are floats. I guess you want the actual classes; in that case you could add class_result = np.argmax(result, axis=-1) at the end of your code.
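For example, a minimal NumPy sketch of that decoding step (the probability values here are made up for illustration):

```python
import numpy as np

# Hypothetical softmax output: one row of class probabilities per sample.
result = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1]])

# argmax over the last axis picks the index of the largest probability,
# i.e. the predicted integer class for each sample.
class_result = np.argmax(result, axis=-1)
print(class_result)  # [1 0]
```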
Thank you for quzshadow's answer, but it doesn't solve my problem. I want the ANN output, "result" in my source code, to be an integer, not a floating-point number.
For now, my output is
result =
[[ 0.8388778 1.84011269 1.82991123 2.71544957 2.92066956 0.89896947]
[ 1.27816939 2.32402492 1.49215913 0.62804997 0.24261415 -0.1909374 ]
[ 0.80473828 2.26316285 2.08706021 2.73876619 2.93090725 0.89492464]
[ 1.11842132 1.75132322 0.57260424 -0.08942437 -0.14830095 -0.27902615]
[ 0.18007335 0.7143867 0.03221995 -0.18673259 0.12239718 -0.102402 ]]
However, I want the "result" variable to be
result =
[
[1 2 2 3 3 1]
[1 2 1 1 0 0]
......
]
Looking forward to your answer.
result.astype('int') could help if the output is a NumPy array.
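One caveat worth noting: astype('int') truncates toward zero rather than rounding. A small sketch, using a slice of the values from the output above:

```python
import numpy as np

# Two values from the float output shown earlier in the thread.
result = np.array([[0.8388778, 1.84011269],
                   [1.27816939, -0.1909374]])

# astype('int') truncates toward zero; it does not round.
print(result.astype('int'))           # [[0 1] [1 0]]

# If round-to-nearest is what you want, apply np.rint first.
print(np.rint(result).astype('int'))  # [[1 2] [1 0]]
```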
Thanks for your answer again; maybe my question is not clear enough. I want the ANN itself to give me "result" as an integer. In other words, the ANN should know that I want the output as an integer.
Using result.astype('int') means post-processing the ANN's output and converting it to an integer; it does not mean my ANN knows I want the output to be an integer.
The output depends on the last layer of your network, which is the softmax layer in your code. As I mentioned before, the softmax layer outputs the probability of each class a sample belongs to, so the output can never be integers (see the definition of softmax for details).
Also, the ANN itself could NEVER be "clever" enough to "know your expectation on the form of outputs". If you want it to output integers, you would have to change the output layer to one that outputs only integers, but to my knowledge that might be impossible.
An ANN with integer output _can't_ be trained.
What makes you say that? Can you give arguments?
d/dx round(x) = 0
But that's not the point. You can always create a custom output layer that rounds the values (still under the gradient). Also, you can use a binary output for a classification problem.
I have no clue what you're trying to say. Fact is, the gradient of an integer-valued function w.r.t. any parameter is always zero, therefore such functions cannot be optimized with gradient descent.
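That claim is easy to check numerically, independent of any backend. A plain NumPy sketch (the helper name numeric_grad is just for illustration):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# round() is piecewise constant, so its derivative is 0 almost everywhere;
# gradient descent therefore receives no signal through it.
grads = [numeric_grad(np.round, x) for x in (0.2, 1.7, 3.3)]
print(grads)  # [0.0, 0.0, 0.0]
```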
I agree that round(X) takes a finite sampling of your continuous function, and if you compute its derivative it equals 0. But if you apply this transform in the last layer with a linear activation, it works: the error is round(X) - True_X and is propagated correctly. I was able to train a full network with a rounding layer.
Are you using TensorFlow or Theano? I can guarantee that Theano doesn't propagate gradients through round().
Okay I trust you on that. I am only talking about my experience. I could train a network by calling tf.round() in the final layer. It worked smoothly.
Haven't tried with Theano backend though!
Could you please run the following code and post the output:
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.round(X) ** 2)
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Sure
Using tensorflow (0.6.0)
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.round(X))
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Using TensorFlow backend.
Traceback (most recent call last):
[...]
TypeError: unsupported operand type(s) for ** or pow(): 'Tensor' and 'int'
I can replace it with K.pow(..., 2), but that is followed by TypeError: Can not convert a list into a Tensor or Operation.
But don't worry I trust you on that. My TF version is not 0.7+
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Does this work?
But don't worry I trust you on that. My TF version is not 0.7+
I don't have TF and would just like to see the output.
It doesn't compile.
This is a simple XOR with a rounding layer (calling K.round()):
from __future__ import print_function
import numpy as np
from keras.callbacks import EarlyStopping
from keras.layers.core import Dense, Activation
from keras.models import Sequential
from keras.optimizers import SGD
from keras.layers import Layer
import keras.backend as K
np.random.seed(1337) # for reproducibility
class Round(Layer):
    def __init__(self, **kwargs):
        super(Round, self).__init__(**kwargs)

    def get_output(self, train=False):
        X = self.get_input(train)
        return K.round(X)

    def get_config(self):
        config = {"name": self.__class__.__name__}
        base_config = super(Round, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
def build_and_train_mlp_network(X_train, y_train, X_test, y_test):
    nb_epoch = 1000
    batch_size = 4

    model = Sequential()
    model.add(Dense(2, input_shape=(X_train.shape[1],)))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.add(Round())  # returns K.round(X)

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mean_squared_error', optimizer=sgd)  # or binary_crossentropy
    model.fit(X_train,
              y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              verbose=0,
              validation_data=(X_test, y_test))
    return model
if __name__ == "__main__":
    X_test = X_train = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
    y_train = y_test = np.array([0, 1, 1, 0])
    model = build_and_train_mlp_network(X_train, y_train, X_test, y_test)
    print(model.predict(X_test))
Output is (loss = 0, val_loss = 0):
[[ 0.]
[ 1.]
[ 1.]
[ 0.]]
With 1000 epochs and without the Rounding layer, we have:
Using TensorFlow backend.
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
[[ 0.19370104]
[ 0.86001378]
[ 0.85930449]
[ 0.10761544]]
10k epochs lead to
[[ 0.02035594]
[ 0.98353308]
[ 0.98348814]
[ 0.01563455]]
So it seems that calling tf.round() actually works, at least in this case. It also worked in another project.
Last attempt lol:

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], K.gradients(Y, [X]))
print(fn([np.ones((2, 2), dtype=np.float32)]))
My output is
[array([[ 0., 0.],
[ 0., 0.]], dtype=float32)]
@NasenSpray wow, with this:

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], K.gradients(Y, [X]))
print(fn([np.ones((2, 2), dtype=np.float32)]))
I have:
Using TensorFlow backend.
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
[array([[ 2., 2.],
[ 2., 2.]], dtype=float32)]
import keras
print(keras.__version__)
import tensorflow
print(tensorflow.__version__)
0.3.2
0.6.0
I don't have Theano configured on my laptop to check. Note: I'm using the TensorFlow CPU version.
So maybe that can explain why it works in the examples I posted previously. Do you have any idea why we get different results? Definitely a very interesting difference.
I tried it on TF 0.7.1: the gradient call returns None, and it then raises an error because of the None. So maybe it's a quirk of TF 0.6.0 to treat round() as the identity in the backward pass when its gradient is not defined.
What TF version do you use when you get [array([[ 0., 0.], [ 0., 0.]], dtype=float32)]?
This debate aside, back to @zhxiaoq9's original question: one solution, if you want to constrain your model's output to integers, is to look into one-hot encodings, use softmax, and take the largest output as the prediction. One downside is that you have to cap your integers at a maximum value, since you decide the number of classes up front. The upside is that your model really can only predict integers, like you wanted.
There could also be more efficient representations in which you use a typical binary number representation of integers instead of one-hot, and just make sure your model's outputs go from 0.0 to 1.0 without constraining them to sum to 1.0 as softmax does. For example, a sigmoid activation function with a constant threshold for "on" vs. "off" could work: say, any float larger than 0.9 is an "on" bit and anything else is "off". In this case the model isn't aware of integers like you requested, but it should still work because the limitation is implicitly learned from your data (given enough data).
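A sketch of that decoding step (the helper bits_to_int and the 0.9 threshold are illustrative, not a Keras API):

```python
import numpy as np

def bits_to_int(outputs, threshold=0.9):
    # Hypothetical decoder for the scheme above: each sigmoid output is
    # one bit of an unsigned binary integer (most significant bit first).
    bits = (np.asarray(outputs) > threshold).astype(int)
    return int("".join(map(str, bits)), 2)

print(bits_to_int([0.95, 0.02, 0.97]))  # bits 101 -> 5
```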
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
@carlthome this still does not give an integer as the output.
I think the question is: "Instead of getting a vector of probabilities or a one-hot vector, how can we get an integer which denotes the final prediction (the predicted class)?"
I am stuck with a similar problem: I have a large number of classes (5000) to predict in my model, so using one-hot vectors for the labels would need a lot of memory. With 100000 training examples and 5000 classes, a 100000 x 5000 one-hot label matrix gives a MemoryError.
So I want an integer output that represents my prediction instead of a one-hot vector.
Or rather, is there any way such that if my labels were
[1,3,4,2,3]
then I could get
[[0,1,0,0,0], [0,0,0,1,0], [0,0,0,0,1], [0,0,1,0,0], [0,0,0,1,0]]
only when the current batch is being used for training?
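For what it's worth, a minimal NumPy sketch of expanding only the current batch's labels (the helper name batch_one_hot is illustrative):

```python
import numpy as np

def batch_one_hot(batch_labels, num_classes):
    # Expand just the current batch's integer labels to one-hot rows,
    # instead of materialising the full 100000 x 5000 label matrix.
    return np.eye(num_classes, dtype=np.int64)[batch_labels]

print(batch_one_hot(np.array([1, 3, 4, 2, 3]), 5))
```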
@Krishnkant-Swarnkar, you'll want to look into NCE loss and the like.
Thanks @carlthome
I think "sparse_categorical_crossentropy" (Keras) should also do the job.
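A NumPy sketch of why sparse_categorical_crossentropy sidesteps the memory blowup: the integer labels index the prediction matrix directly, so no one-hot matrix is ever built. (The function sparse_cce below is an illustrative re-implementation, not the Keras code.)

```python
import numpy as np

def sparse_cce(y_true, y_pred):
    # Pick out the predicted probability of the true class for each sample
    # and take the negative log; no one-hot expansion anywhere.
    probs = y_pred[np.arange(len(y_true)), y_true]
    return -np.log(probs)

y_true = np.array([1, 3, 4, 2, 3])   # integer labels, as in the question
y_pred = np.full((5, 5000), 1e-9)    # dummy softmax output, 5000 classes
y_pred[np.arange(5), y_true] = 0.9
print(sparse_cce(y_true, y_pred))    # ~0.105 per sample (= -log 0.9)
```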
I am using a Keras NN for a classification problem and the predictions have decimals instead of being integers.