I am using Keras for my project, and my ANN's output should have an integer type. How can I constrain the output to be an integer? Here is my code:
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
from sklearn import preprocessing
import numpy as np
af = open('X_train.txt', 'r')
X_train = np.loadtxt(af)
af.close()
bf = open('y_train.txt', 'r')
y_train = np.loadtxt(bf)
bf.close()
cf = open('X_test.txt', 'r')
X_test = np.loadtxt(cf)
cf.close()
df = open('y_test.txt', 'r')
y_test = np.loadtxt(df)
df.close()
XX_train = np.array(X_train)
XX_test = np.array(X_test)
min_max_scaler = preprocessing.MinMaxScaler()
X_train_scale = min_max_scaler.fit_transform(XX_train)
X_test_scale = min_max_scaler.transform(XX_test)
model = Sequential()
model.add(Dense(input_dim = 5, output_dim = 6))
model.add(Activation('softplus'))
model.add(Dense(input_dim = 6, output_dim = 6))
model.add(Activation('softmax'))
model.compile(loss='mean_squared_error', optimizer='sgd')
model.fit(X_train_scale, y_train,
          nb_epoch=100,
          batch_size=10,
          shuffle=False)  # shuffle expects a boolean; the string 'false' is truthy
score = model.evaluate(X_test_scale, y_test, batch_size=5)
result = model.predict(X_test_scale)
The output of the softmax layer is the probability of each class your sample belongs to, so the values are floats. I guess you want the actual classes; in that case you could add class_result = np.argmax(result, axis=-1) at the end of your code.
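For example, a minimal NumPy sketch of that decoding step (the probability values here are made up for illustration):

```python
import numpy as np

# Hypothetical softmax output: one row of class probabilities per sample.
result = np.array([[0.1, 0.7, 0.2],
                   [0.8, 0.1, 0.1]])

# argmax over the last axis picks the index of the largest probability,
# i.e. the predicted integer class for each sample.
class_result = np.argmax(result, axis=-1)
print(class_result)  # [1 0]
```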
Thank you for quzshadow's answer, but it doesn't solve my problem. I want the ANN output, "result" in my source code, to be an integer, not a floating-point number.
For now, my output is
result =
[[ 0.8388778 1.84011269 1.82991123 2.71544957 2.92066956 0.89896947]
[ 1.27816939 2.32402492 1.49215913 0.62804997 0.24261415 -0.1909374 ]
[ 0.80473828 2.26316285 2.08706021 2.73876619 2.93090725 0.89492464]
[ 1.11842132 1.75132322 0.57260424 -0.08942437 -0.14830095 -0.27902615]
[ 0.18007335 0.7143867 0.03221995 -0.18673259 0.12239718 -0.102402 ]]
However, I want the "result" variable to be
result =
[
[1 2 2 3 3 1]
[1 2 1 1 0 0]
......
]
Looking forward to your answer.
result.astype('int') could help if the output is a NumPy array.
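One caveat worth noting: astype('int') truncates toward zero rather than rounding. A small sketch, using a slice of the values from the output above:

```python
import numpy as np

# Two values from the float output shown earlier in the thread.
result = np.array([[0.8388778, 1.84011269],
                   [1.27816939, -0.1909374]])

# astype('int') truncates toward zero; it does not round.
print(result.astype('int'))           # [[0 1] [1 0]]

# If round-to-nearest is what you want, apply np.rint first.
print(np.rint(result).astype('int'))  # [[1 2] [1 0]]
```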
Thanks for your answer again; maybe my question is not clear enough. I want the ANN itself to give me "result" as an integer. In other words, the ANN should know that I want the output as an integer.
Using result.astype('int') means post-processing the ANN's output and converting it to an integer; it does not mean my ANN knows I want the output to be an integer.
The output depends on the last layer of your network, which is the softmax layer in your code. As I mentioned before, the softmax layer outputs the probability of each class a sample belongs to, so the output can never be integers (see the definition of softmax for details).
Also, the ANN itself could NEVER be "clever" enough to "know your expectation on the form of outputs". If you want it to output integers, you would have to change the output layer to one that outputs only integers, but to my knowledge that might be impossible.
An ANN with integer output _can't_ be trained.
What makes you say that? Can you give arguments?
d/dx round(x) = 0
But that's not the point. You can always create a custom output layer that rounds the values (still under the gradient). Also, you can use a binary output for a classification problem.
I have no clue what you're trying to say. Fact is, the gradient of an integer-valued function w.r.t. any parameter is always zero, therefore such functions cannot be optimized with gradient descent.
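That claim is easy to check numerically, independent of any backend. A plain NumPy sketch (the helper name numeric_grad is just for illustration):

```python
import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # Central finite difference approximation of df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# round() is piecewise constant, so its derivative is 0 almost everywhere;
# gradient descent therefore receives no signal through it.
grads = [numeric_grad(np.round, x) for x in (0.2, 1.7, 3.3)]
print(grads)  # [0.0, 0.0, 0.0]
```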
I agree that round(X) takes a finite sampling of your continuous function, and if you compute its derivative it equals 0. But if you apply this transform in the last layer with a linear activation, it works: the error is round(X) - True_X and is propagated correctly. I was able to train a full network with a rounding layer.
Are you using TensorFlow or Theano? I can guarantee that Theano doesn't propagate gradients through round().
Okay I trust you on that. I am only talking about my experience. I could train a network by calling tf.round() in the final layer. It worked smoothly.
Haven't tried with Theano backend though!
Could you please run the following code and post the output:
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.round(X) ** 2)
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Sure
Using tensorflow (0.6.0)
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.round(X))
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Using TensorFlow backend.
Traceback (most recent call last):
[...]
TypeError: unsupported operand type(s) for ** or pow(): 'Tensor' and 'int'
I can replace it with K.pow(..., 2), but that is followed by TypeError: Can not convert a list into a Tensor or Operation.
But don't worry I trust you on that. My TF version is not 0.7+
import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], [K.gradients(Y, [X])])
print(fn([np.ones((2, 2), dtype=np.float32)]))
Does this work?
But don't worry I trust you on that. My TF version is not 0.7+
I don't have TF and would just like to see the output.
It doesn't compile.
This is a simple XOR with a rounding layer (calling K.round()):
from __future__ import print_function
import numpy as np
from keras.callbacks import EarlyStopping
from keras.layers.core import Dense, Activation
from keras.models import Sequential
from keras.optimizers import SGD
from keras.layers import Layer
import keras.backend as K
np.random.seed(1337) # for reproducibility
class Round(Layer):
    def __init__(self, **kwargs):
        super(Round, self).__init__(**kwargs)

    def get_output(self, train=False):
        X = self.get_input(train)
        return K.round(X)

    def get_config(self):
        config = {"name": self.__class__.__name__}
        base_config = super(Round, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
def build_and_train_mlp_network(X_train, y_train, X_test, y_test):
    nb_epoch = 1000
    batch_size = 4

    model = Sequential()
    model.add(Dense(2, input_shape=(X_train.shape[1],)))
    model.add(Activation('sigmoid'))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.add(Round())  # returns K.round(X)

    sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='mean_squared_error', optimizer=sgd)  # or binary_crossentropy
    model.fit(X_train,
              y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              verbose=0,
              validation_data=(X_test, y_test))
    return model
if __name__ == "__main__":
    X_test = X_train = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
    y_train = y_test = np.array([0, 1, 1, 0])
    model = build_and_train_mlp_network(X_train, y_train, X_test, y_test)
    print(model.predict(X_test))
Output is (loss = 0, val_loss = 0):
[[ 0.]
[ 1.]
[ 1.]
[ 0.]]
With 1000 epochs and without the Rounding layer, we have:
Using TensorFlow backend.
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
[[ 0.19370104]
[ 0.86001378]
[ 0.85930449]
[ 0.10761544]]
10k epochs lead to
[[ 0.02035594]
[ 0.98353308]
[ 0.98348814]
[ 0.01563455]]
So it seems that calling tf.round() actually works, at least in this case. It also worked in another project.
Last attempt lol:

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], K.gradients(Y, [X]))
print(fn([np.ones((2, 2), dtype=np.float32)]))
My output is
[array([[ 0., 0.],
[ 0., 0.]], dtype=float32)]
@NasenSpray wow, with this:

import keras.backend as K
import numpy as np

X = K.placeholder(ndim=2)
Y = K.sum(K.square(K.round(X)))
fn = K.function([X], K.gradients(Y, [X]))
print(fn([np.ones((2, 2), dtype=np.float32)]))
I have:
Using TensorFlow backend.
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
[array([[ 2., 2.],
[ 2., 2.]], dtype=float32)]
import keras
print(keras.__version__)
import tensorflow
print(tensorflow.__version__)
0.3.2
0.6.0
I don't have Theano configured on my laptop to check. Note: I'm using the TensorFlow CPU version.
So maybe that can explain why it works in the examples I posted previously. Do you have any idea why we get different results? Definitely a very interesting difference.
I tried it on TF 0.7.1: the gradient call returns None, and it then raises an error because of the None. So maybe it's a quirk of TF 0.6.0 to treat round() as the identity in the backward pass when its gradient is not defined.
What TF version do you use when you get [array([[ 0., 0.], [ 0., 0.]], dtype=float32)]?
This debate aside, back to @zhxiaoq9's original question: one solution, if you want to constrain your model's output to integers, is to look into one-hot encodings, use softmax, and take the largest output as the prediction. One downside is that you have to cap your integers at a maximum value, since you decide the number of classes up front. The upside is that your model really can only predict integers, like you wanted.
There could also be more efficient representations in which you use a typical binary number representation of integers instead of one-hot, and just make sure your model's outputs go from 0.0 to 1.0 without constraining them to sum to 1.0 as softmax does. For example, a sigmoid activation function with a constant threshold for "on" vs. "off" could work: say, any float larger than 0.9 is an "on" bit and anything else is "off". In this case the model isn't aware of integers like you requested, but it should still work because the limitation is implicitly learned from your data (given enough data).
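A sketch of that decoding step (the helper bits_to_int and the 0.9 threshold are illustrative, not a Keras API):

```python
import numpy as np

def bits_to_int(outputs, threshold=0.9):
    # Hypothetical decoder for the scheme above: each sigmoid output is
    # one bit of an unsigned binary integer (most significant bit first).
    bits = (np.asarray(outputs) > threshold).astype(int)
    return int("".join(map(str, bits)), 2)

print(bits_to_int([0.95, 0.02, 0.97]))  # bits 101 -> 5
```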
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
@carlthome this still does not give an integer as the output.
I think the question is: "Instead of getting a vector of probabilities or a one-hot vector, how can we get an integer which denotes the final prediction (the predicted class)?"
I am stuck with a similar problem: I have a large number of classes (5000) to predict in my model, so using one-hot vectors for the labels would need a lot of memory. With 100000 training examples and 5000 classes, a 100000 x 5000 one-hot label matrix gives a MemoryError.
So I want an integer output that represents my prediction instead of a one-hot vector.
Or rather, is there any way such that if my labels were
[1,3,4,2,3]
then I could get
[[0,1,0,0,0], [0,0,0,1,0], [0,0,0,0,1], [0,0,1,0,0], [0,0,0,1,0]]
only when the current batch is being used for training?
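For what it's worth, a minimal NumPy sketch of expanding only the current batch's labels (the helper name batch_one_hot is illustrative):

```python
import numpy as np

def batch_one_hot(batch_labels, num_classes):
    # Expand just the current batch's integer labels to one-hot rows,
    # instead of materialising the full 100000 x 5000 label matrix.
    return np.eye(num_classes, dtype=np.int64)[batch_labels]

print(batch_one_hot(np.array([1, 3, 4, 2, 3]), 5))
```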
@Krishnkant-Swarnkar, you'll want to look into NCE loss and the like.
Thanks @carlthome
I think "sparse_categorical_crossentropy" (Keras) should also do the job.
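A NumPy sketch of why sparse_categorical_crossentropy sidesteps the memory blowup: the integer labels index the prediction matrix directly, so no one-hot matrix is ever built. (The function sparse_cce below is an illustrative re-implementation, not the Keras code.)

```python
import numpy as np

def sparse_cce(y_true, y_pred):
    # Pick out the predicted probability of the true class for each sample
    # and take the negative log; no one-hot expansion anywhere.
    probs = y_pred[np.arange(len(y_true)), y_true]
    return -np.log(probs)

y_true = np.array([1, 3, 4, 2, 3])   # integer labels, as in the question
y_pred = np.full((5, 5000), 1e-9)    # dummy softmax output, 5000 classes
y_pred[np.arange(5), y_true] = 0.9
print(sparse_cce(y_true, y_pred))    # ~0.105 per sample (= -log 0.9)
```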
I am using a Keras NN for a classification problem and the predictions have decimals instead of being integers.