Keras: ImageDataGenerator from url or filepath list

Created on 26 Apr 2018 · 3 Comments · Source: keras-team/keras

I often read data from a list of URLs or a list of file paths on disk, like this:
https://www.kaggle.com/c/landmark-recognition-challenge/data

In practice, there are many cases where the files are not organized into one directory per class, and only the file paths are stored in a database.

How about adding an API like flow_from_url_list or flow_from_filepath_list to the ImageDataGenerator?
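As a rough illustration of the request, here is a minimal sketch of what a flow_from_filepath_list helper could look like. It is not part of Keras: the signature, the `load_image` helper, and the `load_fn` and `seed` parameters are all assumptions made for this example, and it does no augmentation.

```python
import numpy as np


def load_image(path, target_size=(224, 224)):
    """Load one image file as a float32 array scaled to [0, 1] (needs Pillow)."""
    from PIL import Image  # lazy import: the batching logic itself only needs NumPy

    with Image.open(path) as img:
        # target_size is (height, width); PIL's resize expects (width, height)
        img = img.convert("RGB").resize((target_size[1], target_size[0]))
        return np.asarray(img, dtype="float32") / 255.0


def flow_from_filepath_list(filepaths, labels, batch_size=32, shuffle=True,
                            load_fn=load_image, seed=None):
    """Hypothetical helper: yield (images, labels) batches forever from a path list."""
    filepaths = list(filepaths)
    labels = np.asarray(labels)
    if len(filepaths) != len(labels):
        raise ValueError("filepaths and labels must have the same length")
    rng = np.random.RandomState(seed)
    while True:  # generators passed to fit_generator are expected to loop indefinitely
        if shuffle:
            order = rng.permutation(len(filepaths))
        else:
            order = np.arange(len(filepaths))
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            images = np.stack([load_fn(filepaths[i]) for i in idx])
            yield images, labels[idx]
```

A generator like this could be handed to fit_generator with steps_per_epoch set to the number of batches per pass, that is, the number of paths divided by batch_size, rounded up.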

All 3 comments

Do you think it is a good idea to read data from a URL at training time? Typically the delay introduced by downloading from some server is so long that I would imagine it becomes a major bottleneck.

@vkk800
Thank you for your feedback.

First of all, as you say, training from URLs is a big bottleneck, so I did not intend that API for training. A Keras model has a predict_generator API as well as a fit_generator API.

For example, with a simple dog-vs-cat classification model, you might want to take external images and classify them into the two classes. Should I have to save those external images to my disk just to make predictions?

If what you want is a mapping from each URL to the model's prediction, and not the image itself, then there is no reason to write the image to disk. Read it into memory, run the prediction, and release it from memory.
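The read, predict, discard flow described above could be sketched roughly as follows. The helper names (`fetch_bytes`, `decode_image`, `predict_urls`) and their parameters are made up for this example and are not Keras APIs; `model` is assumed to be any object with a Keras-style `predict` method that accepts a batch array.

```python
import io
import urllib.request

import numpy as np


def fetch_bytes(url, timeout=10):
    """Download the raw bytes at `url` without writing anything to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def decode_image(raw, target_size=(224, 224)):
    """Decode in-memory image bytes into a float32 array in [0, 1] (needs Pillow)."""
    from PIL import Image  # lazy import so the rest of the sketch only needs NumPy

    with Image.open(io.BytesIO(raw)) as img:
        # target_size is (height, width); PIL's resize expects (width, height)
        img = img.convert("RGB").resize((target_size[1], target_size[0]))
        return np.asarray(img, dtype="float32") / 255.0


def predict_urls(model, urls, fetch_fn=fetch_bytes, decode_fn=decode_image):
    """Return {url: prediction}; each image is held in memory only while it is scored."""
    results = {}
    for url in urls:
        image = decode_fn(fetch_fn(url))
        # predict expects a batch, so add a leading axis of size 1
        results[url] = model.predict(image[np.newaxis, ...])[0]
    return results
```

Only the URL and its prediction are kept; the decoded array becomes garbage as soon as the loop moves on to the next URL.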

However, you may feel that this case is too minor to justify a URL generator.

Would flow_from_filepath_list, an API that takes a list of file paths rather than a list of URLs, be uncontroversial?

Minor point: the use case is simple, namely datasets that are too big to store locally. Furthermore, multithreading with enough workers can deliver the data in time if the network is fast enough, say between a Google Compute Engine instance and Google Cloud Storage.
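To make the multithreading point concrete, here is a small sketch that fetches each batch of URLs with a thread pool, so that network latency overlaps across requests. `iter_batches` and its parameters are hypothetical names for this example; `fetch_fn` stands for whatever function downloads and decodes one item.

```python
from concurrent.futures import ThreadPoolExecutor


def iter_batches(urls, fetch_fn, batch_size=32, workers=8):
    """Yield lists of fetched items, downloading each batch with `workers` threads."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(urls), batch_size):
            chunk = urls[start:start + batch_size]
            # pool.map preserves input order, so results still line up with labels
            yield list(pool.map(fetch_fn, chunk))
```

Threads suit this workload because the time is spent waiting on the network rather than on computation. Keras's fit_generator and predict_generator also take a workers argument, which is presumably what the comment above has in mind.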
