Keras: Keras accuracy is not increasing over 50%

Created on 19 May 2017 · 14 comments · Source: keras-team/keras

I am trying to build a binary classification algorithm (output is 0 or 1) on a dataset that contains normal and malicious network packets. The dataset shape (after converting IP addresses and hexadecimal values to decimal) is:

[screenshot: preview of the dataset]

Note: The final column is the output.

And the Keras model is:

from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
import numpy
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
seed = 4
numpy.random.seed(seed)

dataset = numpy.loadtxt("NetworkPackets.csv", delimiter=",")
X = dataset[:, 0:11].astype(float)
Y = dataset[:, 11]

model = Sequential()
model.add(Dense(12, input_dim=11, kernel_initializer='normal', activation='relu'))
model.add(Dense(12, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal', activation='relu'))

model.compile(loss='binary_crossentropy', optimizer='Adam', metrics=['accuracy'])
model.fit(X, Y, nb_epoch=100, batch_size=5)

scores = model.evaluate(X, Y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))

However, I have tried different optimizers, activation functions, and numbers of layers, and the accuracy reaches 0.5 at most:

[screenshot: training log with accuracy stuck around 0.5]

I even tried grid search to find the best parameters, but the maximum accuracy is still 0.5. Does anyone know why the output is always like that, and how I can improve it? Thanks in advance!


All 14 comments

  1. You need to take care of the numerical scale of the inputs. Try to normalize every feature dimension into [-1, 1] or [0, 1] (see the sketch after this list).
  2. Some features may be categorical rather than scalar; you may need to study how to deal with that kind of feature.
  3. If your dataset is not large-scale, I would suggest using an xgboost model.
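
For example, a minimal sketch of the [0, 1] case using scikit-learn's MinMaxScaler (X_train and X_test here are placeholders for your training and test feature matrices):

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))      # map each feature into [0, 1]
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)         # reuse the same fitted scaler for the test data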

The data also has to be standardized:
(x - x_mean) / x_std
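
With NumPy, applied per feature to the X matrix from the script above, that is roughly:

X_mean = X.mean(axis=0)                  # per-feature mean
X_std = X.std(axis=0)                    # per-feature standard deviation
X_standardized = (X - X_mean) / X_std    # zero mean, unit variance per feature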

Please ask questions on Stack Overflow. We already have so many issues here, many of them still open.

Your issue is the ReLU activation in the last layer. Use sigmoid!
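
In the script above, that would mean changing the last layer to something like:

model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))  # sigmoid maps the output into [0, 1], matching binary_crossentropy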

@joelthchao Do you mean that the inputs must be normalized before using them in the model? And if so, do you know of a method in Keras for doing that?

@StefanoD I used standardized_X = preprocessing.scale(X) and the result became:

[screenshot: training log with much higher accuracy]

Which is great!
But is that the right approach?
Because when I predict whether a packet is normal or malicious after training, the model gets it wrong :/

@myhussien I tried using that and the result became 0%.

Your test data also has to be standardized before prediction.

@StefanoD When I standardize the data before prediction (see below), the output is all [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]].

[screenshot: prediction code with standardization and its output]

@Ahmid you have to use the same transformer that you fitted with the training data.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(train_packet)   # fit the scaler on the training data only
# <train here>

X_test_scaled = scaler.transform(test_packet)         # apply the same fitted scaler to the test data
preds = loaded_model.predict(X_test_scaled)
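
If the model is saved and reloaded between training and prediction (as loaded_model suggests), the fitted scaler has to be persisted as well; a minimal sketch using joblib, with scaler.pkl as a placeholder file name:

import joblib

joblib.dump(scaler, "scaler.pkl")    # save the fitted scaler after training
# ... later, in the prediction script ...
scaler = joblib.load("scaler.pkl")   # reload the very same scaler
preds = loaded_model.predict(scaler.transform(test_packet))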

@avsolatorio Thank you! Solved my problem 👍


Does the data have to be standardized, normalized, or both?


What's the difference between standardization and normalization?


For standardization we use the formula _(x - mean) / standard_deviation_,
while for normalization we use the formula _(x - xmin) / (xmax - xmin)_.
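
In scikit-learn terms these correspond to StandardScaler and MinMaxScaler, for example:

from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_standardized = StandardScaler().fit_transform(X)   # (x - mean) / standard_deviation
X_normalized = MinMaxScaler().fit_transform(X)       # (x - xmin) / (xmax - xmin)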


I'm not sure there is a big difference, because the goals are similar; see the Wikipedia article on normalization.

And I wouldn't mix different methods if I were you; I don't see the point. But that's just an opinion.
I would just test what works best.

