Keras: ImageDataGenerator from url or filepath list

Created on 26 Apr 2018 · 3 Comments · Source: keras-team/keras

I often read data from a list of URLs or a list of file paths on disk, like this:
https://www.kaggle.com/c/landmark-recognition-challenge/data

In practice, there are many cases where the files are not organized into one directory per class and only the file paths are stored in a database.

How about adding an API such as flow_from_url_list or flow_from_filepath_list to ImageDataGenerator?
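To make the proposal concrete, here is a minimal sketch of what such a generator might look like in plain Python. The function name is the one suggested above and is not an existing Keras API; `load_fn` and `steps_per_epoch` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple


def flow_from_filepath_list(
    filepaths: Sequence[str],
    labels: Sequence[int],
    load_fn: Callable[[str], object],
    batch_size: int = 32,
) -> Iterator[Tuple[List[object], List[int]]]:
    """Yield (images, labels) batches from parallel lists of paths and labels.

    Hypothetical helper, not part of Keras. load_fn is an assumed callable
    that turns one file path into an image (for example, a decoded array).
    """
    if len(filepaths) != len(labels):
        raise ValueError("filepaths and labels must have the same length")
    if not filepaths:
        raise ValueError("filepaths must not be empty")
    while True:  # loop forever; the caller decides how many batches to draw
        for start in range(0, len(filepaths), batch_size):
            stop = start + batch_size
            batch_x = [load_fn(path) for path in filepaths[start:stop]]
            batch_y = list(labels[start:stop])
            yield batch_x, batch_y


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)
```

In real use the batches would most likely need to be stacked into arrays before being passed to a model, and the same logic could be wrapped in a keras.utils.Sequence subclass instead of a bare generator.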

junwoopark92

👍2

Most helpful comment

@vkk800
Thank you for your feedback.

First of all, as you say, training directly from URLs is a big bottleneck, so I did not intend this API for training. A Keras model has a predict_generator API as well as a fit_generator API.

For example, with a simple dog-vs-cat classification model, you might want to take external images and classify each one as a dog or a cat. Should I have to save those external images to my disk just to make predictions?

If what you need is a mapping from each URL to the model's prediction, not the image itself, then there is no reason to write the image to disk. Read it into memory, run the prediction, and release it from memory.
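A rough sketch of that read, predict, discard loop, using only the standard library. The names `fetch_bytes` and `predict_from_urls` are hypothetical, and `predict_fn` stands in for whatever decodes the bytes and calls the model; none of them are Keras functions.

```python
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at url without writing anything to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def predict_from_urls(
    urls: Iterable[str],
    predict_fn: Callable[[bytes], object],
    fetch_fn: Callable[[str], bytes] = fetch_bytes,
) -> Iterator[Tuple[str, object]]:
    """Yield (url, prediction) pairs, holding one image in memory at a time.

    predict_fn is an assumed callable that decodes the image bytes and
    returns the model's output for that image.
    """
    for url in urls:
        raw = fetch_fn(url)
        # Only the URL and the prediction are kept; raw is rebound on the
        # next iteration, so the previous image can be garbage-collected.
        yield url, predict_fn(raw)
```

Passing `fetch_fn` as a parameter keeps the download step replaceable, for example with a cached or authenticated fetcher.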

However, you may feel that this use case is too minor to justify a URL generator.

In that case, would flow_from_filepath_list, an API that takes a list of file paths rather than URLs, be uncontroversial?

junwoopark92 on 27 Apr 2018

👍3

All 3 comments

Do you think it is a good idea to read data from a URL at training time? Typically the delay introduced by downloading from a server is so long that I would imagine it becomes a major bottleneck.

vkk800 on 26 Apr 2018

Minor point: the use case is simple, namely datasets that are too big to store locally. Furthermore, with multithreading and enough workers you can fetch the data fast enough if the network is fast, say between a Google Compute Engine machine and Google Cloud Storage.
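As an illustration of the multithreading idea, here is a minimal sketch that downloads a batch of URLs concurrently with a thread pool. `fetch_all` and `fetch_fn` are hypothetical names, and the worker count is an arbitrary assumption that would need tuning against the actual network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def fetch_all(
    urls: Sequence[str],
    fetch_fn: Callable[[str], bytes],
    workers: int = 8,
) -> List[bytes]:
    """Fetch every URL using a pool of threads and return results in order.

    fetch_fn is an assumed callable that downloads one URL and returns its
    bytes. Downloads are I/O-bound, so threads can overlap the waiting time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(fetch_fn, urls))
```

Keras's fit_generator and predict_generator also accept a workers argument, so a generator that downloads inside each batch can be parallelized there instead.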