Keras: Saving extra objects in .hdf5 files

Created on 10 Aug 2018  ·  4 comments  ·  Source: keras-team/keras

Time and again throughout my deep learning career, I've had to implement ways of saving extra information along with the model file. This information can be really simple, like some metadata about the model (when it was trained, dataset version, output label names, etc.), or a bit more complex, like vocabulary info for NLP models.

My latest attempt at making this task simple yielded this so-called ModelExtension framework:
(also includes an example that saves character vocabulary info with the model)
https://gist.github.com/fredtcaroli/0a0e570bbe0f8abe8f8a848fe479e758

This sure helps a lot... Now I don't have to jump through several hoops to get model extensions, I can simply:

vocab = CharactersVocabulary()
vocab.fit(...)
model = ExtendedModel(inp, outp, extensions={'vocab': vocab})
model.fit(...)
model.save('file.hdf5')

loaded_model = load_model('file.hdf5')
loaded_model.extensions['vocab']  # CharactersVocabulary instance
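For context, here is a minimal sketch of what an extension object like CharactersVocabulary might look like. The `to_config`/`from_config` names are assumptions mirroring Keras's own serialization style; the gist's actual interface may differ. The point is that anything that can round-trip through a string can be stashed alongside the model:

```python
import json

class CharactersVocabulary:
    """Hypothetical extension: maps characters to integer indices."""

    def __init__(self, chars=None):
        self.char_to_idx = {c: i for i, c in enumerate(chars or [])}

    def fit(self, texts):
        # Build the vocabulary from all characters seen in the texts
        chars = sorted({c for t in texts for c in t})
        self.char_to_idx = {c: i for i, c in enumerate(chars)}

    def to_config(self):
        # Serialize as a JSON list of characters, ordered by index
        return json.dumps(sorted(self.char_to_idx, key=self.char_to_idx.get))

    @classmethod
    def from_config(cls, config):
        return cls(chars=json.loads(config))

# Round-trip demo: serialize, then restore an identical vocabulary
vocab = CharactersVocabulary()
vocab.fit(['hello', 'world'])
restored = CharactersVocabulary.from_config(vocab.to_config())
assert restored.char_to_idx == vocab.char_to_idx
```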

This still isn't perfect and has some drawbacks, so what I'd really like is an official way of doing just that: saving extra objects with the model.
I think it would save everyone a whole lot of time and it would be a great addition!

So let me know what you guys think or if there's a really simple way of doing it that I haven't come across


All 4 comments

Not sure if it can help, but since the model is usually saved into an HDF5 file, you can use the PyTables library to add metadata.
I did some quick tests, and I think it does not interfere with the model itself:

import tables as tb

fname = './model.h5'
# Load model and add some metadata
with tb.open_file(fname, 'a') as h5_mod:
    node = h5_mod.get_node('/')
    node._v_attrs['test_meta'] = 'My test attribute {}'.format(3.1415)

# Load model, now with metadata
with tb.open_file(fname, 'r') as h5_mod:
    node = h5_mod.get_node('/')
    # Check if it worked
    print('test_meta:', node._v_attrs['test_meta'])

@iipr thanks! Your snippet is working great for me

@alex9311 After my previous comment I started using it more often, and I find it quite useful for reproducibility. I save things like the number of epochs used to train the model, the optimiser, the learning rate...

Btw, if you need to check these attributes without writing code, you can open the .h5 file with ViTables.

Yes, exactly! It seems to be a great way to store all the metadata related to a model in one place.
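The same attribute trick also works with h5py, in case PyTables isn't available. A minimal sketch (the file name, attribute key, and metadata values here are just examples; in practice the file would come from `model.save(fname)`):

```python
import json
import h5py

fname = './model.h5'  # example path; normally produced by model.save(fname)

# Append training metadata as a root attribute; JSON packs it into one string
with h5py.File(fname, 'a') as f:
    f.attrs['train_meta'] = json.dumps({'epochs': 20, 'lr': 1e-3})

# Read it back
with h5py.File(fname, 'r') as f:
    meta = json.loads(f.attrs['train_meta'])

print(meta)  # {'epochs': 20, 'lr': 0.001}
```

Storing the metadata as a single JSON string keeps arbitrary nested structures in one HDF5 attribute, at the cost of not being directly browsable field-by-field in tools like ViTables.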

