Keras: Confused about ImageDataGenerator

Created on 7 Apr 2016 · 18 Comments · Source: keras-team/keras

Data augmentation can enlarge a training set when there isn't enough data. I see that ImageDataGenerator can do this, but the documentation is not very detailed. I ran the example cifar10_cnn.py, and I'm still not clear on how to use it. Besides, I don't see any difference between using ImageDataGenerator and not using it. Can anyone explain how to use it? Thank you very much!

dsy412610

All 18 comments

You generate new training samples by randomly distorting your original images, e.g. by applying translation, rotation, etc. Depending on what kind of variations you want, you can set the parameters. The benefit is that you increase your training dataset and cover more variance of the data distribution, which should help to generalize better.
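As a rough illustration of "randomly distorting" an image, here is a minimal NumPy sketch. It is not Keras's implementation: the function name `random_flip_and_shift` and the `max_shift` parameter are made up for this example, and the shift wraps pixels around the edge instead of filling them in, which is a simplification.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=2):
    """Return a randomly distorted version of a (height, width, channels) image.

    Hypothetical helper for illustration only, not part of Keras.
    """
    out = image
    # Mirror the image left-to-right half of the time.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Shift horizontally by a random number of pixels in [-max_shift, max_shift].
    # np.roll wraps pixels around the border and returns a new array.
    dx = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(out, dx, axis=1)

rng = np.random.default_rng(0)
image = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
augmented = random_flip_and_shift(image, rng)
```

Each call can produce a different variant of the same picture, so the model rarely sees exactly the same input twice, while the label of the picture stays the same.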

rpinsler on 7 Apr 2016

Keras has two fit options: model.fit, which takes conventional NumPy arrays as input, and model.fit_generator, which takes Python generators as input. Read up on what generators are if you are not familiar with them. After that, go back to ImageDataGenerator and see that it is just a wrapper class around the _flow_generator generator.
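For readers new to generators, a small plain-Python sketch may help. It only shows the general pattern (an endless generator that builds one batch per request); `load_sample` and `batch_generator` are hypothetical names, and this is not the Keras code itself.

```python
loaded = []  # records which samples have actually been loaded so far

def load_sample(index):
    """Stand-in for reading and preprocessing one image (hypothetical)."""
    loaded.append(index)
    return index * index

def batch_generator(num_samples, batch_size):
    """Yield batches forever; each batch is built only when it is requested."""
    while True:
        for start in range(0, num_samples, batch_size):
            stop = min(start + batch_size, num_samples)
            yield [load_sample(i) for i in range(start, stop)]

gen = batch_generator(num_samples=10, batch_size=4)
# Nothing has been loaded yet: creating the generator does no work.
first_batch = next(gen)
# Only now have samples 0..3 been loaded.
```

The `while True` loop reflects the fact that a training loop keeps asking for batches across epochs, so the generator is written to never run out.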

EderSantana on 8 Apr 2016

Thanks for your answer! But I get a MemoryError when I try to use ImageDataGenerator. My 8 GB of CPU RAM seems to be too small, or my use of it is not efficient. What should I do?

dsy412610 on 8 Apr 2016

I don't know how much data you are trying to load at once. If your batch size is small enough, it should work.
This is what generators are all about: they don't create all the data beforehand, they only produce it when you ask for the next batch. So I'd assume that generating a single batch with your configuration consumes too much memory.

So, what is your batch size, and how many workers are you using?

EderSantana on 8 Apr 2016

I used 13000 pictures of size 227*227 with a batch size of 128. I then set the batch size to 32 and even smaller, but the error is still there. Besides, I used the default nb_workers value of 1.
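A back-of-the-envelope calculation suggests why a smaller batch size might not help here. The sketch below assumes 3 color channels and float32 storage, neither of which is stated in this thread, and `array_bytes` is just a helper written for this estimate.

```python
def array_bytes(num_images, height, width, channels, bytes_per_value):
    """Size in bytes of a dense image array with the given shape."""
    return num_images * height * width * channels * bytes_per_value

# Whole dataset held in memory as one array (assumed: 3 channels).
full_float32 = array_bytes(13000, 227, 227, 3, 4)   # 8,038,524,000 bytes, about 8.0 GB
full_uint8 = array_bytes(13000, 227, 227, 3, 1)     # 2,009,631,000 bytes, about 2.0 GB

# A single batch of 32 images as float32.
batch_float32 = array_bytes(32, 227, 227, 3, 4)     # 19,787,136 bytes, about 20 MB
```

Under these assumptions a single batch is tiny, but the full float32 array alone is about the size of the 8 GB of RAM, before any copies are made. If the whole dataset is loaded into one array before the generator ever runs, the batch size would not change that.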

dsy412610 on 8 Apr 2016

@EderSantana Thanks a lot for your help!

dsy412610 on 9 Apr 2016

sure! let me know if you need anything else. If not, please close the issue!

EderSantana on 10 Apr 2016

I'd like to know how to get the fixed generator method, since I saw it is on the keras_1 branch.

dsy412610 on 10 Apr 2016

if those fixes work for you, I think you will have to modify your code. Here are the diffs:
https://github.com/fchollet/keras/pull/2152/commits/30989dc997afcfe7097692e75ac5ff9e7ab06e55

EderSantana on 10 Apr 2016

Thanks again!

dsy412610 on 10 Apr 2016

I tried to modify my code, but I found that the code in the keras_1 and master branches is quite different. So I only modified fit_generator, and that did not help. Another problem: when I changed datagen.fit(X_train) to datagen.fit(X_train, augment=True), I got the following error:
Traceback (most recent call last):
  File "cifar10_cnn.py", line 108, in <module>
    datagen.fit(X_train, augment=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 288, in fit
    img = self.random_transform(img)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 253, in random_transform
    x = random_shift(x, self.width_shift_range, self.height_shift_range)
  File "/usr/local/lib/python2.7/dist-packages/keras/preprocessing/image.py", line 33, in random_shift
    shift_x = np.random.uniform(-wrg, wrg) * x.shape[2]
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 528, in __getattr__
    raise AttributeError(name)
AttributeError: shape

dsy412610 on 10 Apr 2016

@EderSantana

dsy412610 on 11 Apr 2016

That is weird... maybe the change made x something other than a NumPy array? The traceback says that x has no shape attribute, and the error is raised inside PIL's Image.py, so x looks like a PIL image. A regular NumPy array would have a shape.

Was the original code working with CIFAR-10? Does the problem occur with both the Theano and TensorFlow backends?

If nothing works, try creating a new environment and installing keras-0.3.2. If even that doesn't work, maybe something weird is happening somewhere besides Keras, unless the datagen has shipped with a bug all along.

@fchollet this generator is working on 0.3.3 right?

EderSantana on 11 Apr 2016

Yeah, the original code worked well with CIFAR-10, and I only used Theano. Then I tried modifying fit_generator in models.py and fitting on my own data (size 227*227), but the memory error remained.

dsy412610 on 11 Apr 2016

We just rewrote the ImageDataGenerator in #2446.
Can you check whether your issue is solved with this?

To help debug the problem, can you please show your ImageDataGenerator initialisation and fit snippet?

chsasank on 27 Apr 2016

Yeah, I have solved the problem, but not with the rewritten ImageDataGenerator. I also raised an issue some days ago: #2318.

dsy412610 on 28 Apr 2016

Does ImageDataGenerator change class labels as well? If not, is there a way to change class labels?
In other words, which line in the ImageDataGenerator class makes sure that its output is a tuple (inputs, targets), as required by fit_generator?
Thanks in advance.
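The usual pattern for this kind of generator, shown here as a plain-Python sketch and not as the actual Keras source, is to pick a batch of indices, transform only the images at those indices, and pass the matching labels through untouched. The names `paired_batches` and `flip_horizontal` are hypothetical.

```python
import random

def flip_horizontal(image):
    """Mirror a 2D image stored as a list of rows."""
    return [row[::-1] for row in image]

def paired_batches(images, labels, batch_size, transform, seed=0):
    """Yield (inputs, targets) tuples forever; only the inputs are transformed."""
    rng = random.Random(seed)
    indices = list(range(len(images)))
    while True:
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            inputs = [transform(images[i]) for i in batch]
            targets = [labels[i] for i in batch]  # labels are copied, not changed
            yield inputs, targets

images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
labels = ["cat", "dog", "bird"]
inputs, targets = next(paired_batches(images, labels, batch_size=2,
                                      transform=flip_horizontal))
```

Because the same index list selects both the images and the labels, each augmented input stays paired with its original label. Changing the labels themselves would need a separate step, for example wrapping the generator in another one that rewrites `targets`.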