It would be nice if streaming HDF5 (which is required in out-of-core situations) were implemented in TensorFlow.
This feature request is very broad, and we will likely not work on it in the foreseeable future. To keep the issue tracker focused, I will close this issue.
Well, what I'm actually asking for is something along the lines of a tf.TextLineReader that supports both streaming and random access. The request has come up before, e.g. in #2089. The problem with always closing these feature requests is that people who are looking for easy, new contributions might not see them, although they might be a good first step into the TF code base.
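To make the request concrete, here is a minimal sketch of the access pattern being asked for, done outside TensorFlow with h5py. It is not an existing TensorFlow API. The names `iter_batches` and `stream_hdf5` are hypothetical, and the sketch assumes the HDF5 dataset is a non-scalar array that supports `len()` and slice indexing, as h5py datasets do.

```python
from typing import Iterator, Sequence


def iter_batches(dataset: Sequence, batch_size: int) -> Iterator:
    """Yield consecutive slices of `dataset`, each with at most `batch_size` items.

    Works on any object that supports len() and slicing, which includes
    h5py datasets as well as plain lists.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def stream_hdf5(path: str, key: str, batch_size: int) -> Iterator:
    """Stream batches from one dataset of an HDF5 file (hypothetical helper).

    `path` and `key` are placeholders for a real file and dataset name.
    """
    import h5py  # third-party; imported lazily so iter_batches works without it

    with h5py.File(path, "r") as f:
        # Slicing an h5py dataset reads only the requested rows from disk,
        # so the whole dataset never has to fit in memory.
        yield from iter_batches(f[key], batch_size)
```

Random access follows the same idea: indexing an open h5py dataset with a single index or slice reads just that part. A generator like this could probably be fed into an input pipeline, but that wiring is left out here.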
+1. For reference, the only file formats supported in https://www.tensorflow.org/api_guides/python/reading_data are CSV, binary, and TFRecord. But HDF5 is a pretty common format. For big datasets, it is not possible to load a whole .hdf5 dataset at once the way this example does: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/hdf5_classification.py. Instead, we use a small HDF5 file for each sample.
The only feasible way to deal with this is to convert the HDF5 files to TFRecord or binary files first.