I'm trying to classify images into 10 classes. To get probabilities for the images, I'm using the model.predict_generator() function in Keras. It returns only the prediction values, not the corresponding sample ID (in this case, the image file name).
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = test_datagen.flow_from_directory(
    '/path/',
    target_size=(256, 256),
    batch_size=32,
    class_mode='categorical')
predictions = model.predict_generator(validation_generator, val_samples=10000)
How do I find the corresponding image name/id of the predictions?
(OR)
In what order does '.flow_from_directory' read the samples?
Do the 'batch_size' and 'val_samples' arguments affect the order of the predictions?
Try validation_generator.class_indices and validation_generator.classes. pprint them and see how they are useful to you.
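As a rough illustration of how those two attributes can be used: class_indices maps each folder name to the column index of the prediction vector, so inverting it gives the class name for each prediction. The sketch below uses made-up values in place of validation_generator.class_indices and the output of model.predict_generator(); the helper names are hypothetical.

```python
def index_to_class(class_indices):
    """Invert a {class_name: index} mapping into {index: class_name}."""
    return {index: name for name, index in class_indices.items()}


def top_class_names(predictions, class_indices):
    """Return the most probable class name for each row of probabilities."""
    names = index_to_class(class_indices)
    result = []
    for row in predictions:
        # Index of the largest probability in this row.
        best = max(range(len(row)), key=lambda i: row[i])
        result.append(names[best])
    return result


# Hypothetical stand-ins for validation_generator.class_indices and
# for the array returned by model.predict_generator().
class_indices = {"cats": 0, "dogs": 1, "horses": 2}
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]]
print(top_class_names(predictions, class_indices))
```

The same functions should also accept a NumPy array of predictions, since they only index into each row.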
@Vivek-B Were you able to resolve the issue? I am also facing a similar problem.
@marcj Do those functions exist in Keras? I tried searching for those, but did not find any matching search results.
@raghuramdr Yes. marcj's suggestion was useful.
Type 'validation_generator.' and then press the Tab key. You'll get all the available attributes and methods.
@Vivek-B Thanks for the answer. How do you find the order in which flow_from_directory takes the images? I am asking because the accuracy computed from predict_generator and the accuracy obtained from a confusion matrix do not match.
@raghuramdr I suppose the order is the same as in the generator.filenames list, because that list is used for iteration in the next() method.
Exactly. As @Ershov-Alexander said, generator.filenames solves this problem!
I wish there were an example in the documentation of how to get test data into a CSV file.
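In the absence of an official example, here is a minimal sketch using only the standard csv module. It assumes the predictions are in the same order as generator.filenames (which, per this thread, requires shuffle=False), and the function name and column names are made up for illustration.

```python
import csv


def write_predictions_csv(path, filenames, predictions):
    """Write one row per image: the file name followed by its probabilities."""
    if len(filenames) != len(predictions):
        raise ValueError("filenames and predictions must have the same length")
    n_classes = len(predictions[0]) if len(predictions) else 0
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Header: filename, p_class_0, p_class_1, ...
        writer.writerow(["filename"] + ["p_class_%d" % i for i in range(n_classes)])
        for name, row in zip(filenames, predictions):
            writer.writerow([name] + [float(p) for p in row])


# Possible usage, assuming `gen` and `preds` come from Keras:
# write_predictions_csv("predictions.csv", gen.filenames, preds)
```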
@pengpaiSH @Vivek-B just to confirm:
Doing
predictions = model.predict_generator(validation_generator, val_samples=total_samples)
will go through the whole dataset once and only once (assuming there are total_samples images in the folder). The 'real class' of each prediction is then given by validation_generator.classes.
I want to use this information to build a confusion matrix
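A small sketch of that confusion matrix in plain Python, assuming the true class indices (as in validation_generator.classes) and the predicted probabilities are in the same order. The function name and the sample values are invented for illustration.

```python
def confusion_matrix(true_classes, predictions, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_idx, row in zip(true_classes, predictions):
        # Predicted class is the index of the largest probability.
        predicted_idx = max(range(len(row)), key=lambda i: row[i])
        matrix[int(true_idx)][predicted_idx] += 1
    return matrix


# Made-up data: three samples, two classes.
true_classes = [0, 1, 1]
predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(confusion_matrix(true_classes, predictions, 2))  # [[1, 0], [1, 1]]
```

If the two sequences are not aligned (for example because the generator was shuffled), the matrix will look close to random even for a good model.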
I have exactly the same problem.
My generator.class_indices are:
{'classXX': 4, 'classXX': 6, 'classXX': 8, 'classXX': 3, 'classXX': 1, 'classXX': 7, 'classXX': 5, 'classXX': 2, 'classXX': 0}
(The classXX names are all different from each other.)
But when I predict a single image (even one from the training or validation set), my predictions do not correspond to the classes in class_indices.
model.predict(x) returns values that do not correlate with the class_indices above.
Something that helped me was to use a new generator each time. Doing
gen = image.ImageDataGenerator(...).flow_from_directory(..., shuffle=False)
preds = model.predict_generator(gen, len(gen.filenames))
works as expected, but if I then use the same gen again for some other task, it seems to shuffle the images (even with shuffle=False).
@hdmetor Oh, thanks for the information!
This helps me a lot.
I always got different results when using evaluate_generator and predict_generator.
Now I create the image data generator again before calling predict_generator, and they finally give the same result (not exactly the same, but it makes more sense).
Guys, I have a question about how validation_generator creates the labels. What is actually happening under the hood?
gen = image.ImageDataGenerator(...).flow_from_directory(..., shuffle=False)
preds = model.predict_generator(gen, len(gen.filenames))
This worked for me. I set up a test data directory with class folders containing the test images. However, if I use model.predict on a single image, I get totally different predictions.
Any ideas?
Just another comment: using Keras 2.0.6 in a Kaggle competition, I see the same issue with the prediction order.
If I use the code below to generate predictions, I get correct predictions the first time I call it, and apparently shuffled predictions on the second call:
test_gen = validgen.flow_from_directory(
    test_data_dir,
    target_size=(img_height, img_width),
    batch_size=1,
    class_mode='binary',
    seed=0,
    shuffle=False)

preds = model_final.predict_generator(test_gen, 1531)
print(preds[0:10])
print(test_gen.filenames[0:10])

pred1 = model_final.predict_generator(test_gen, 1531)
print(pred1[0:10])
print(test_gen.filenames[0:10])

Found 1531 images belonging to 1 classes.
[[ 1.00000000e+00]
[ 1.81016767e-05]
[ 9.99988794e-01]
[ 8.15555453e-03]
[ 3.15029087e-04]
[ 6.08354285e-02]
[ 5.00569877e-04]
[ 6.83458569e-03]
[ 9.88739491e-01]
[ 2.21080336e-04]]
['x/1305.jpg', 'x/570.jpg', 'x/508.jpg', 'x/1076.jpg', 'x/624.jpg', 'x/94.jpg', 'x/1128.jpg', 'x/137.jpg', 'x/795.jpg', 'x/555.jpg']
[[ 1.81016931e-05]
[ 9.99988794e-01]
[ 8.15555453e-03]
[ 3.15029087e-04]
[ 6.08354136e-02]
[ 5.00570342e-04]
[ 6.83457917e-03]
[ 9.88739491e-01]
[ 2.21080118e-04]
[ 1.81115605e-02]]
['x/1305.jpg', 'x/570.jpg', 'x/508.jpg', 'x/1076.jpg', 'x/624.jpg', 'x/94.jpg', 'x/1128.jpg', 'x/137.jpg', 'x/795.jpg', 'x/555.jpg']
@brad0taylor It seems generator.filenames is always the same, but the generator's output does not always follow that order. It's so annoying. Is there an explanation? @fchollet
Another thing I found is that your pred1 array is the preds array shifted by one element.
Could that mean something?
Going by @chenchennn's comment, maybe you should create the test_gen generator again before using it in another predict_generator call.
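One way such a shift could arise, sketched with a toy class rather than Keras itself: if the generator loops forever and keeps its position between calls, and something pulled one extra batch during the first pass (for example a prefetching worker, which is an assumption here and not confirmed for this Keras version), the second pass starts one element late. LoopingBatches and consume are hypothetical names.

```python
class LoopingBatches:
    """Toy stand-in for a non-shuffling generator that cycles forever."""

    def __init__(self, items):
        self.items = list(items)
        self.position = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = self.items[self.position]
        # Wrap around at the end instead of stopping.
        self.position = (self.position + 1) % len(self.items)
        return item

    def reset(self):
        self.position = 0


def consume(generator, steps, prefetched=0):
    """Return `steps` items; `prefetched` extra items are pulled and dropped."""
    taken = [next(generator) for _ in range(steps)]
    for _ in range(prefetched):
        next(generator)
    return taken


gen = LoopingBatches("abcde")
first = consume(gen, 5, prefetched=1)   # one extra item was pulled
second = consume(gen, 5)                # starts one element late
gen.reset()
third = consume(gen, 5)                 # back in the original order
print(first, second, third)
```

Here `second` comes out as the first pass rotated by one element, which resembles the pred1 output above, and creating a new generator or resetting the position restores the original order.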
Anyone found a usable solution to this? At least I know that I am not crazy :D
gen = image.ImageDataGenerator(...).flow_from_directory(..., shuffle=False) helped me. Thanks for posting, @hdmetor.
No problem, @XRarach.
I think there is still no official support for that. Another thing I did in the past (which is NOT efficient) is to grab the file paths from the generator and pass each one through the network. Obviously, that only works for reasonably small datasets.
The listed filenames are fixed.
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L1008
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L1004
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L859
The enumerated filenames are randomly shuffled.
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L1023
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L704
https://github.com/fchollet/keras/blob/22e6bea8c2e23c6bbd6d98b4d3fe8b2e74c33c3d/keras/preprocessing/image.py#L709
About this issue, what about overwriting the "filenames" attribute right after building the batches?
From this:
to something like this
filenames = []
for i, j in enumerate(index_array):
    fname = self.filenames[j]
    filenames.append(fname)
    img = load_img(os.path.join(self.directory, fname),
                   grayscale=grayscale,
                   target_size=self.target_size,
                   interpolation=self.interpolation)
    x = img_to_array(img, data_format=self.data_format)
    x = self.image_data_generator.random_transform(x)
    x = self.image_data_generator.standardize(x)
    batch_x[i] = x
self.filenames = filenames
What do you think?
This thread covers several topics, but I am having a problem similar to what some of you describe.
TL;DR: Simply call datagen.reset() before you call model.predict_generator() to get the same order as datagen.filenames and datagen.classes.
Explanation: I am using flow_from_directory() on my data generator and I need to know which image is associated with each prediction. The problem is that the data generator outputs batches forever, so even if it is not shuffled, it might start at a different position in the dataset each time we use it, which gives seemingly different outputs. You would need to match the number of iterations and batches exactly to the dataset size for it to restart at the beginning of the dataset on each call to predict_generator(). You can see the internal batch counter using datagen.batch_index.
Somebody suggested creating a new data generator before each call to predict_generator(). That should work, but it is a workaround, and there is a better and simpler way: call reset() on the generator. If you have set shuffle=False, it should then start over from the beginning of the dataset and give the exact same output each time, so that the ordering matches datagen.filenames and datagen.classes. This solved the problem for me.
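A minimal sketch of that approach, under a few assumptions: the generator was created with shuffle=False; it exposes reset(), filenames and batch_size as discussed in this thread; and predict_generator takes a number of batches as its second argument (the Keras 2 style, whereas Keras 1 expected a sample count there). The helper name predict_with_filenames is made up.

```python
import math


def predict_with_filenames(model, generator):
    """Pair each prediction with the file it was computed from."""
    # Restart from the first file so the order matches generator.filenames.
    generator.reset()
    num_samples = len(generator.filenames)
    # Enough batches to cover the dataset exactly once (assumes the second
    # argument of predict_generator counts batches, as in Keras 2).
    steps = int(math.ceil(num_samples / float(generator.batch_size)))
    predictions = model.predict_generator(generator, steps)
    # Trim in case the last batch wrapped around and produced extra rows.
    return list(zip(generator.filenames, predictions[:num_samples]))
```

Calling it twice in a row on the same generator should give the same pairing, because each call resets the generator first.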
The last suggestion about updating the list of filenames would be a major side effect and should probably be avoided. If you need something like that and datagen.reset() is not enough for you, then add another variable to the datagen object, but don't overwrite the existing list of filenames.
In the output quoted earlier in this thread, why does it print "Found 1531 images belonging to 1 classes."?
> Try validation_generator.class_indices and validation_generator.classes. pprint them and see how they are useful to you.
Could you please link to an example of a classification problem with more than two classes where these attributes are used?
> @raghuramdr Yes. marcj's suggestion was useful.
> Type 'validation_generator.' and then press the Tab key. You'll get all the available attributes and methods.
Could you please link to an example of a classification problem with more than two classes where these are used, together with the dataset? I am trying to learn classification with more than two classes. I have a dataset with 100 classes, each containing 10 images, and I need to know how I can apply a CNN to it.
> Try validation_generator.class_indices and validation_generator.classes. pprint them and see how they are useful to you.
It worked for me. I wanted to see which class index was assigned to which folder, like this:
print(validation_generator.class_indices)
The following could be done (for 2 classes) with shuffle=False and batch_size=32:
import numpy as np

print("Number of Classes: ", training_set.num_classes)
print(training_set.class_indices)
# Index (in file order) of the first sample of each class.
cats_indx = np.where(training_set.labels == training_set.class_indices['cats'])[0][0]
dogs_indx = np.where(training_set.labels == training_set.class_indices['dogs'])[0][0]
print("First Index of Cat: ", cats_indx)
print("First Index of Dog: ", dogs_indx)
# print("This is the 6th (indx = 5) batch: ", training_set[5][1])
# Find the label the generator actually yields for the first cat:
# locate its batch, then its position within that batch.
cat_batch_num = cats_indx // 32
relative_index_of__first_cat = cats_indx % 32
cat_label = training_set[cat_batch_num][1][relative_index_of__first_cat]
# Same lookup for the first dog.
dog_batch_num = dogs_indx // 32
relative_index_of__first_dog = dogs_indx % 32
dog_label = training_set[dog_batch_num][1][relative_index_of__first_dog]
print("Cats = ", cat_label)
print("Dogs = ", dog_label)
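The batch arithmetic in the snippet above can be isolated into a small helper. This is only a sketch, with a hypothetical function name, and it assumes an unshuffled generator where sample i sits in batch i // batch_size at position i % batch_size.

```python
def locate_in_batches(sample_index, batch_size):
    """Return (batch_number, offset_within_batch) for a sample index."""
    return divmod(sample_index, batch_size)


# Sample 70 with batches of 32 is the 7th item (offset 6) of the 3rd batch.
batch_number, offset = locate_in_batches(70, 32)
print(batch_number, offset)  # 2 6
```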