It seems that predict_generator cannot maintain the data order when using multiprocessing. When feeding several batches of test data into predict_generator, the output array does not correspond to the input batch index, so there is no way to tell which output is the prediction for which input, which makes the function unusable. One possible remedy might be to use a priority queue rather than a normal queue to maintain the order.
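To illustrate the idea only (this is not Keras internals; the tagging of batches with a submission index and the reassemble_in_order helper are hypothetical): if each batch carried its index through the worker queue, a priority queue keyed on that index could restore the original order of the predictions.

```python
# Illustration only (not Keras internals): if each batch carried its submission
# index through the worker queue, a priority queue keyed on that index could
# restore the original order of the predictions.
import heapq

def reassemble_in_order(indexed_predictions):
    # indexed_predictions: (batch_index, prediction) pairs arriving in arbitrary
    # worker-completion order; batch_index values are assumed unique.
    heap = []
    for batch_index, prediction in indexed_predictions:
        heapq.heappush(heap, (batch_index, prediction))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```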
Here is the detailed test code to reproduce the problem.
## mnist_cnn.py in examples
from __future__ import print_function
import numpy as np
np.random.seed(1337) # for reproducibility
from keras.datasets import mnist
from keras.models import Sequential
from keras.models import *
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
batch_size = 128
nb_classes = 10
nb_epoch = 8
# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test))
############# Core test code starts here #####################
def generator_from_array(X_test):
    while 1:
        for i in range(100):
            yield X_test[i:i+1]

print('Predict on batch:')
out = []
for i in range(100):
    out_tmp = model.predict_on_batch(X_test[i:i+1])
    out.append(out_tmp)
print(out[1])
print(out[50])
print(out[-1])
print("Predict generator")
output = model.predict_generator(generator_from_array(X_test), 100, max_q_size=10, nb_worker=4, pickle_safe=True)
print(output.shape)
print(output[1])
print(output[50])
print(output[-1])
And here are the results.
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so.8.0 locally
X_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/8
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 0.405
pciBusID 0000:81:00.0
Total memory: 15.89GiB
Free memory: 15.61GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:81:00.0)
60000/60000 [==============================] - 7s - loss: 0.3829 - acc: 0.8815 - val_loss: 0.0859 - val_acc: 0.9743
Epoch 2/8
60000/60000 [==============================] - 5s - loss: 0.1336 - acc: 0.9603 - val_loss: 0.0606 - val_acc: 0.9806
Epoch 3/8
60000/60000 [==============================] - 5s - loss: 0.1041 - acc: 0.9690 - val_loss: 0.0533 - val_acc: 0.9833
Epoch 4/8
60000/60000 [==============================] - 5s - loss: 0.0861 - acc: 0.9735 - val_loss: 0.0441 - val_acc: 0.9852
Epoch 5/8
60000/60000 [==============================] - 5s - loss: 0.0781 - acc: 0.9763 - val_loss: 0.0409 - val_acc: 0.9861
Epoch 6/8
60000/60000 [==============================] - 5s - loss: 0.0702 - acc: 0.9793 - val_loss: 0.0387 - val_acc: 0.9870
Epoch 7/8
60000/60000 [==============================] - 5s - loss: 0.0626 - acc: 0.9815 - val_loss: 0.0379 - val_acc: 0.9867
Epoch 8/8
60000/60000 [==============================] - 5s - loss: 0.0605 - acc: 0.9817 - val_loss: 0.0352 - val_acc: 0.9891
Predict on batch:
[[ 1.61985781e-07 9.81094581e-06 9.99989748e-01 2.69348943e-09
1.97990360e-10 8.48836210e-11 4.53296529e-08 7.74509276e-11
2.23150167e-07 2.99653670e-11]]
[[ 9.75747753e-06 2.34337261e-09 5.09917042e-09 1.79785129e-08
6.84200643e-08 6.34509252e-06 9.99983668e-01 4.00663530e-11
1.21496996e-07 1.95249289e-10]]
[[ 9.69054281e-10 1.05993847e-09 1.87508320e-09 1.94809417e-07
1.49762297e-06 4.11489260e-08 8.54344595e-10 8.08601499e-07
1.35151751e-07 9.99997377e-01]]
Predict generator
(100, 10)
[ 1.61985781e-07 9.81094581e-06 9.99989748e-01 2.69348943e-09
1.97990360e-10 8.48836210e-11 4.53296529e-08 7.74509276e-11
2.23150167e-07 2.99653670e-11]
[ 9.99998927e-01 6.84537635e-11 4.53024768e-07 9.15579487e-11
1.19156296e-10 1.37983824e-09 6.24313543e-08 5.71949954e-09
1.34752597e-07 4.58147241e-07]
[ 6.04119035e-04 2.68195297e-08 1.23279997e-05 2.34821496e-10
9.99363124e-01 1.72202430e-08 1.96394576e-05 6.58836768e-07
1.14492806e-07 3.96185520e-08]
If you want to compare predictions against known outputs, use evaluate_generator.
@patyork I think evaluate_generator is more useful for validation. But for the purpose of 'real' prediction, it is important to know which exact input samples my predictions come from, right?
That's a use case, of course. If setting nb_worker=1 won't work for you due to slower speed, and just loading all of the inputs and calling predict is too much for your memory, you'd probably be better off writing your own generator + predict/predict_on_batch routine, so that you can queue the inputs however you'd like and save the predictions (along with a reference to the inputs that created them) on the fly, then unload them to preserve memory.
That's a pretty niche/uncommon problem to need to solve (high speed, large dataset, prediction and saving); most likely too niche for inclusion in the Keras core.
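A rough sketch of that kind of routine, under stated assumptions: load_batch, the list of input ids, and the output path are hypothetical placeholders, not an existing Keras API.

```python
import numpy as np

def predict_and_save(model, input_ids, load_batch, batch_size=128,
                     out_path='predictions.npz'):
    # You control the loading and the ordering here, so every prediction stays
    # aligned with the id of the input that produced it.
    ids, preds = [], []
    for start in range(0, len(input_ids), batch_size):
        batch_ids = input_ids[start:start + batch_size]
        batch = load_batch(batch_ids)               # user-supplied loader
        preds.append(model.predict_on_batch(batch))
        ids.extend(batch_ids)
    np.savez(out_path, ids=np.array(ids), predictions=np.concatenate(preds))
```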
@patyork @iammarvelous
Hello there, I have just run into the same issue, and it cost me quite some time in debugging.
I think this should be considered a bug: if the predict_generator() method cannot reconstruct the order, then the predictions are unusable and the method is effectively useless.
I suggest one of the following:
@LamDang @patyork Agreed, they should force workers=1 for predict_generator and evaluate_generator.
I have been having the same problem. I need to aggregate the results of predict_generator but can't with workers > 1; it took me a while to figure that out. The same problem occurs with fit_generator() when using a validation generator: I want to set workers > 1 for training but workers=1 for evaluate_generator to ensure I get a consistent accuracy measure, and I can't do that without rewriting a lot of code.
I suggest always forcing workers=1 for evaluate_generator and predict_generator.
Agreed. This has long worried me. predict_generator appears to be the most efficient way of evaluating a very large collection of images while keeping the card at full utilization, but this order ambiguity invites errors. Instead of fixing the ordering, though, the interface could be changed so that the input is a generator that yields (ID, data) pairs and the output consists of (ID, prediction) pairs.
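A minimal sketch of what that proposal could look like as a user-side wrapper; ident_generator, which is assumed to yield (id, batch) pairs, is hypothetical and not an existing Keras API.

```python
def predict_with_ids(model, ident_generator, steps):
    # Pass the identifier through untouched so each prediction can be traced
    # back to the input that produced it, regardless of queueing upstream.
    for _ in range(steps):
        sample_id, batch = next(ident_generator)
        yield sample_id, model.predict_on_batch(batch)
```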
I agree, I do not see how to use predict_generator.
I just changed the mnist_cnn example to get data from a generator, and even if I set workers=1 I do not get the predicted values in the same order as the inputs.
@fchollet this is a bug, isn't it?
(The same goes for evaluate_generator (#6499): I do not get the same results as with evaluate, and re-running it gives slightly different results...)
Could it be that this is caused by
# the data, **shuffled** and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
In this case https://github.com/rstudio/keras/issues/149 I had a similar problem (although using R Keras), and the reason was that shuffle was set to TRUE when batch-importing the images, so the image order was different for each repetition.
It might also be that I don't get the point of issue #5048 because I am a greenhorn in this topic; if so, please ignore my posting.
I recently ran into the predict_generator inconsistency too. It looks like quite an old issue.
@romainVala Same for me: even if I use workers=1, I don't get the data in the same order. It works when I use workers=0, though. However, that forces the generator to execute on the main thread, which is probably not the best idea for bigger datasets (the actual reason we use a generator).
Does anyone know if this has been fixed? I've seen some alternatives like #6891 with the Dataset API, but are there any workarounds/fixes for predict_generator?
Has there been a fix for this, please? I have the exact same issue: I am trying to create bottlenecks using a model with the top off. I need to store the bottlenecks in separate files on disk, but without knowing which input file produced which output bottleneck, the whole thing ends up a mess. Or does anyone know a better way to create bottleneck files for individual inputs?
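One workaround sketch for the bottleneck case is to run the topless model file by file, so every saved feature is tied to the image that produced it; the paths, target size, scaling, and naming scheme below are assumptions, not from this thread.

```python
import os
import numpy as np
from keras.preprocessing import image

def save_bottlenecks(model, image_paths, out_dir, target_size=(224, 224)):
    for path in image_paths:
        img = image.load_img(path, target_size=target_size)
        x = np.expand_dims(image.img_to_array(img), axis=0) / 255.0
        features = model.predict_on_batch(x)
        out_name = os.path.splitext(os.path.basename(path))[0] + '.npy'
        np.save(os.path.join(out_dir, out_name), features)  # one file per input image
```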
I bumped into this issue when I used the same generator for both evaluate_generator and predict_generator, in that exact order. Re-initialising the generator, or simply calling predict_generator first and evaluate_generator second, solved it for me (although it cost me an hour to figure that out).
I had a similar issue where the predictions got slightly worse each time I called predict_generator. Setting workers=0 somehow helped.
predictions = model.predict_generator(testGenerator, steps=np.ceil(testGenerator.samples / testGenerator.batch_size), verbose=1, workers=0)
When loading your test set using .flow_from_dataframe / .flow_from_directory, make sure to disable the shuffle option (shuffle=False); that will force it to keep the original ordering.
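For example (a sketch only: the directory path, image size, color mode, and batch size are placeholders; with shuffle=False, the i-th prediction lines up with the i-th entry of test_generator.filenames):

```python
import numpy as np
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    'data/test', target_size=(28, 28), color_mode='grayscale',
    batch_size=32, class_mode=None, shuffle=False)  # keep the original file order

predictions = model.predict_generator(
    test_generator,
    steps=int(np.ceil(test_generator.samples / test_generator.batch_size)),
    verbose=1)

# predictions[i] corresponds to test_generator.filenames[i]
for fname, pred in zip(test_generator.filenames, predictions):
    print(fname, pred.argmax())
```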