I want to try a Keras model on an AWS EC2 instance, with the dataset stored in AWS S3.
However, I cannot point flow_from_directory directly at an S3 URL. Is there an alternative way to do this?
Thanks!
You need to download the files first. It would be way too slow to read directly from S3.
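For anyone wanting a starting point for the download step, here is a minimal sketch using boto3. The bucket name, prefix, and destination directory in the usage note are placeholders, and the helper names (local_path_for_key, download_prefix) are made up for illustration. It assumes the keys are laid out as prefix/class_name/file, which is the layout flow_from_directory expects on disk.

```python
import os


def local_path_for_key(key, prefix, dest_dir):
    """Map an S3 object key to a local path, keeping the class subfolders."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket, prefix, dest_dir):
    """Copy every object under s3://bucket/prefix into dest_dir."""
    import boto3  # imported lazily so the path helper works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
                continue
            target = local_path_for_key(key, prefix, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, key, target)
```

After something like download_prefix("my-bucket", "train/", "/data/train") has finished, flow_from_directory can be pointed at /data/train as usual.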
Thanks!
We used TensorFlow's feature to form training batches from distributed files and fed these batches to Keras through the generator capability.
How do you do that? Could you provide a code sample?
S3 latency is actually fairly low and if you have multiple processes fetching data it is not a problem (we've done this in other contexts).
We experimented with this over a year ago, and I misremembered slightly: TensorFlow was not doing the S3 fetches directly. We tried presenting the S3 files as a file system (like https://fullstacknotes.com/mount-aws-s3-bucket-to-ubuntu-file-system/). Our data was small enough to fit on an EBS volume, though, so ultimately we went that route to remove a piece from the architecture. Keep in mind that EBS reads also go over the network, just like S3 reads.
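Since a code sample was requested: the following is only a rough sketch of the "multiple workers fetching data" idea, not the setup described above. It uses a thread pool rather than separate processes, and the names (s3_batch_generator, make_s3_fetch) are hypothetical. The fetch function is passed in, so the generator itself does not depend on boto3. Decoding the raw bytes into arrays and attaching labels is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(keys, batch_size):
    """Split a list of keys into consecutive chunks of at most batch_size."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def s3_batch_generator(keys, fetch, batch_size=32, workers=8):
    """Yield one list of fetched objects per batch, looping over keys forever."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            for batch_keys in batched(keys, batch_size):
                # pool.map runs the fetches concurrently and preserves key order
                yield list(pool.map(fetch, batch_keys))


def make_s3_fetch(bucket):
    """Build a fetch(key) function that returns the object's raw bytes."""
    import boto3  # imported lazily; only needed when actually reading from S3

    s3 = boto3.client("s3")

    def fetch(key):
        return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch
```

A generator like this, wrapped so that each batch becomes an (inputs, targets) pair, could then be handed to Keras's generator-based training. Whether it is fast enough will depend on object sizes and the number of workers.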
🙏🏽 jeremy