Keras: FeatureRequest: Save/Load model to/from iostream

Created on 9 Feb 2018 · 7 Comments · Source: keras-team/keras

Request

Change the model.save()/load() API so that models can be saved to and loaded from iostream/string objects, not only files.

Issue in detail

model.save/load() operate on the file level. Trying to save the model in parts, i.e. splitting up model architecture and model weights, does not help either: model.to_json/to_yaml/from_json/from_yaml() operate on the string level, while model.save_weights/load_weights() again operate on the file level (compare https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model). This is very inconsistent, and there currently seems to be no way to nicely pack up a model and ship it (in session) somewhere else without having to involve the file system. Shouldn't there be a simpler way to completely represent the model in a self-contained, compact Python object which can be shipped directly, comparable to the mechanism of pickle.dumps(obj) -> bytes; pickle.loads(bytes) -> obj? The snippet below illustrates the asymmetry.
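
A minimal sketch of the asymmetry (model is assumed to be any compiled keras.models.Model):

# architecture: returned as an in-memory string, no file involved
arch_json = model.to_json()

# weights and full model: only a file path is accepted
model.save_weights('weights.h5')
model.save('model.h5')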

Some more info

Hi, and first of all thank you for providing this great library. I have currently reached a standstill in my development process with Keras because of some limitations in the model-persistence API.

I start out with a trained Keras model. Even though it is not recommended, I naively tried to pickle it, which failed because of a non-picklable thread lock (I am actually not sure whether this is an unreleased resource in Keras; the model had finished training, so I did not expect this behaviour. Maybe a separate issue?).
Going back to the documentation https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model did not help much either, as the standard way of model saving requires me, in one way or another, to save my model as an entity to a file. This is problematic, as my model is part of a bigger object which, by itself, would be easy to serialize and ship with the pickle module if it were not for Keras.
My idea to work around this by writing a customized serialization function for the wrapper will not work either, as I cannot dump my Keras model within the session as a plain Python object, only to a file. And packing up a file resource with the rest of the serialization information is cumbersome.
So why does Keras not provide a simple string or io-stream dump method for model architecture and weights together, making its models serializable for all purposes? One partial workaround is sketched below.
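
A partial workaround is to combine the string-level architecture API with get_weights()/set_weights(), both of which yield picklable objects. A minimal sketch (the helper names dumps_model/loads_model are made up for illustration); note that, unlike model.save(), this drops the optimizer state and compile configuration:

import pickle

from keras.models import model_from_json


def dumps_model(model):
    # Architecture as a JSON string, weights as a list of numpy
    # arrays -- both picklable, unlike the model object itself.
    return pickle.dumps({'arch': model.to_json(),
                         'weights': model.get_weights()})


def loads_model(data):
    state = pickle.loads(data)
    model = model_from_json(state['arch'])
    model.set_weights(state['weights'])
    return model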

All 7 comments

Hi @mzoll, I opened #9789 to resolve this issue -- hopefully the maintainers are able to merge it soon!

Hi guys, a question about saving a model to a stream: I'm currently struggling with saving my model to Google Cloud Storage. Would it be possible to save the model into a FileIO object?

Hi @garymabin, unfortunately that would be difficult to do, since Keras uses the h5py library, whose h5py.File object is implemented in C and does not support being overlaid on top of a Python file-like object.

However, #9789 got merged, so in the next release of Keras you'll be able to save to a stream (a rough sketch of the intended usage follows): https://github.com/keras-team/keras/pull/9789/files#diff-dc98997af8382fa820745ca633451088R139
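
A hedged sketch of what that usage might look like; it assumes model.save() will accept an open h5py.File after #9789, combined with h5py's in-memory 'core' driver:

import h5py

# Assumption: after #9789, model.save() accepts an open h5py.File.
# The 'core' driver with backing_store=False keeps everything in memory.
with h5py.File('name-is-ignored.h5', mode='w', driver='core',
               backing_store=False) as f:
    model.save(f)  # 'model' is a trained keras.models.Model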

@obi1kenobi Thanks for the explanation!

Hi @garymabin, @obi1kenobi: the PR above should make saving/loading from GCS seamless, but it still requires writing/reading to/from the local file system under the hood.

As per @obi1kenobi's pointer, you can get the binary data out of an in-memory h5py.File this way (a minimal sketch of that direction follows the snippet below). However, I found no way to set binary data on an h5py.File in memory, which is needed for loading. It can apparently be done with tables; see the last answer here: https://stackoverflow.com/questions/16654251/can-h5py-load-a-file-from-a-byte-array-in-memory

import urllib.request
import tables

url = 'https://s3.amazonaws.com/<your bucket>/data.hdf5'
response = urllib.request.urlopen(url)
# Open the downloaded bytes as an in-memory HDF5 file via PyTables'
# H5FD_CORE driver, without writing anything to disk.
h5file = tables.open_file("data-sample.h5", driver="H5FD_CORE",
                          driver_core_image=response.read(),
                          driver_core_backing_store=0)
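
For the "getting the binary data out" direction mentioned above, a minimal h5py sketch; it assumes your h5py version exposes the low-level FileID.get_file_image() wrapper around H5Fget_file_image:

import h5py

# In-memory-only HDF5 file: 'core' driver, no disk backing.
with h5py.File('in-memory.h5', mode='w', driver='core',
               backing_store=False) as f:
    f['dataset'] = list(range(10))
    f.flush()
    binary_data = f.id.get_file_image()  # raw bytes of the HDF5 file image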

Here it says that opening an h5py file on BytesIO() will be supported in h5py 2.9: http://docs.h5py.org/en/latest/high/file.html#python-file-like-objects

"""Create an HDF5 file in memory and retrieve the raw bytes

This could be used, for instance, in a server producing small HDF5
files on demand.
"""
import io
import h5py

bio = io.BytesIO()
with h5py.File(bio) as f:
    f['dataset'] = range(10)

data = bio.getvalue() # data is a regular Python bytes object.
print("Total size:", len(data))
print("First bytes:", data[:10])

But it is still not clear whether this supports initializing with existing binary data, or does it? A sketch of what that could look like follows.
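
For reference, here is what initializing from existing binary data could look like if h5py 2.9's file-like support allows it (a sketch, not verified against a released h5py):

import io
import h5py

# existing_binary_data: bytes of a complete HDF5 file, e.g. from a
# network response or a previous bio.getvalue() call.
bio = io.BytesIO(existing_binary_data)
with h5py.File(bio, 'r') as f:
    print(list(f.keys()))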

Hi @andhus,

If you don't mind somewhat hacky code, after a lot of hacking around and digging in the source code, @mikeyshulman and I got the following to work:

import contextlib
import h5py

# Low-level file access property list: in-memory 'core' driver with no
# disk backing, initialised directly from the existing HDF5 bytes.
file_access_property_list = h5py.h5p.create(h5py.h5p.FILE_ACCESS)
file_access_property_list.set_fapl_core(backing_store=False)
file_access_property_list.set_file_image(existing_binary_data)  # fill in your data

file_id_args = {
    'fapl': file_access_property_list,
    'flags': h5py.h5f.ACC_RDONLY,
    'name': b'this should never matter',
}

h5_file_args = {
    'backing_store': False,
    'driver': 'core',
    'mode': 'r',
}

# Open a low-level FileID on the in-memory image, then wrap it in a
# high-level h5py.File and hand it to your loading code.
with contextlib.closing(h5py.h5f.open(**file_id_args)) as file_id:
    with h5py.File(file_id, **h5_file_args) as h5_file:
        your_function_that_loads_from_h5_file(h5_file)  # implement this function

This allows you to create an in-memory-only h5py.File object, because of the combination of 'driver': 'core' and 'backing_store': False. Here are the docs with more detail:

http://docs.h5py.org/en/latest/faq.html#what-file-drivers-are-available
http://docs.h5py.org/en/latest/high/file.html#file-drivers

@obi1kenobi Perfect, thanks! If you don't mind I'll incorporate this in the mentioned PR.
