Hello! I'm writing on behalf of the ml5 project. In a meeting today with @cvalenzuela and other contributors we discussed some questions around the saving and loading features for ml5. This is discussed in more detail at https://github.com/ml5js/ml5-library/issues/174.
We are wrapping the tf.js save() method in our ImageClassifier class for use with the FeatureExtractor example. model.json and model.weights.bin work "out of the box" with ml5; however, our examples include additional functionality. For example, we allow the end user to specify a string label for a class rather than a numeric index. To retain this information when saving, we would like to be able to customize model.json with ml5-specific properties.
What do you think about adding an optional argument to save() that allows for customization, e.g.:

```js
const ml5prop = {
  labels: ['cat', 'dog']
};
model.save(destination, ml5prop);
```
This could be good for other use cases where tf.js users have other meta-data they would like to include.
Let us know your thoughts. We are happy to work on a pull request for this. Or perhaps you have a better idea for how we should be handling saving/loading on our end?
@cvalenzuela let me know if I missed anything or am not describing this properly!
I think this is a reasonable feature request.
Are the types of metadata that you want to include fairly structured, slowly changing, & generally useful? If so I'd prefer to establish standards for a specific set of fields. The label index to string mapping is certainly something that should be standardized.
I fear that if we allow arbitrary extensions to model.json, the result will be a profusion of mutually-incompatible file formats. For that matter I think Python Keras will currently error out on nonstandard fields.
So to me this is an ecosystem-wide question of getting the file formats right, not something we should just do as a local convenience. That said, if there is consensus that model.json can have a 'custom_metadata' field that loaders can know to ignore, that would be at least non-breaking (but would still not provide the benefit of standardization).
Another workaround you could do right away would be to wrap model.save() in another function, e.g. saveModelWithMetadata(model, ml5prop) that writes a separate metadata.json file. That might require updates to IOHandlers (basically to allow writing and reading additional files, analogous to the assets/ directory in TF SavedModel).
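As a rough sketch of that workaround (the function name and file layout here are purely illustrative, not an existing tf.js or ml5 API), a wrapper could save the model through the normal tf.js path and then trigger a second browser download for the metadata:

```javascript
// Illustrative sketch only: saveModelWithMetadata is a hypothetical helper,
// not part of tf.js. It saves the model normally, then downloads a separate
// metadata.json file alongside model.json and model.weights.bin.
function serializeMetadata(meta) {
  // Pure helper: turn the metadata object into pretty-printed JSON.
  return JSON.stringify(meta, null, 2);
}

async function saveModelWithMetadata(model, meta, destination) {
  // Save model.json + model.weights.bin via the usual tf.js route,
  // e.g. destination = 'downloads://my-model'.
  await model.save(destination);

  // Then force a browser download of metadata.json (browser-only code).
  const blob = new Blob([serializeMetadata(meta)], { type: 'application/json' });
  const a = document.createElement('a');
  a.href = URL.createObjectURL(blob);
  a.download = 'metadata.json';
  a.click();
}
```

A matching loader would fetch metadata.json from next to model.json and merge it back in after tf.loadModel(), keeping model.json itself untouched.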
Perhaps the idea of saveModelWithMetadata() might work better for compatibility formats across the ecosystem.
But building on top of that, maybe something like model.saveWeights() in conjunction with model.getSpecs() could also be implemented? model.saveWeights() would fetch just the weights file, and model.getSpecs() would return a JSON object with the specs that we could modify internally and then trigger for download. I have had issues when downloading multiple files with model.save() in Chrome (https://bugs.chromium.org/p/chromium/issues/detail?id=822542), so having options to download the files individually could potentially help?
@davidsoergel right now the model.json is already a "custom" format, given that it marries the weights manifest JSON and the Keras model JSON. Adding another valid JSON field seems okay because it's specific to this environment anyway.
@nsthorat as you know, I think we should not have diverged from keras.json in the first place. But this would go much further. 1) model.json was one new format. The proposal here would create arbitrarily many new formats because each user / library developer would store metadata in an idiosyncratic form in the new custom field. 2) Many useful kinds of metadata are not specific to any particular environment. The given example, mapping class IDs to string labels, is important in every environment--so it would be sad if tool A writes it in a form that tool B can't read.
So it seems to me that this approach encourages balkanization, where we should be encouraging interoperability. Let's channel this motivation to store metadata (with which I wholeheartedly agree!) into a standard representation. In particular, I think it would be great to work with the tf.Metadata team (https://github.com/tensorflow/metadata, @mzinkevi) to make sure they publish enough docs / specs / JS bindings that we can integrate it here.
@cvalenzuela and I were discussing this a bit more on Friday. In case this is helpful, we would be very happy to do our own file saving and loading in ml5 and not require any changes to model.save() in tf.js. The issue right now is that the only way for us to get the data that ends up in model.json is by saving a file. If there were a method like model.getJSON() that returned what ends up in the saved file, we could make our own modifications and save our own file. Thanks so much for this discussion!
@shiffman that seems reasonable too! You can use a custom save handler to get the results (which contain all the files): https://js.tensorflow.org/api/0.13.0/#tf.Model.save
You can also use a custom load handler on the other side as well, with tf.loadModel.
Let me know if you run into any trouble implementing that.
ha you just beat me to it :)
In particular: you can use withSaveHandler (https://github.com/tensorflow/tfjs-core/blob/master/src/io/passthrough.ts#L96), where you can provide a callback that receives the to-be-saved objects as input.
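To make that concrete, here is a small sketch of capturing the artifacts with a passthrough handler. It assumes a tf.js model and the tf.io namespace are available; the helper name captureArtifacts is just illustrative, and the returned object shape follows tf.js's SaveResult convention:

```javascript
// Sketch, assuming a tf.js model `model` and the tf.io namespace `tfio`.
// withSaveHandler routes model.save() into our callback instead of writing
// files, so we can read (and later modify) the artifacts directly.
async function captureArtifacts(model, tfio) {
  let captured;
  await model.save(tfio.withSaveHandler(async (artifacts) => {
    // artifacts contains modelTopology, weightSpecs, and weightData.
    captured = artifacts;
    // The handler is expected to resolve to a SaveResult.
    return {
      modelArtifactsInfo: { dateSaved: new Date(), modelTopologyType: 'JSON' }
    };
  }));
  return captured; // e.g. edit modelTopology here, then save it yourself
}
```

On the loading side, the analogous trick is a custom load handler passed to tf.loadModel, as mentioned above.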
@nsthorat @davidsoergel oh, I hadn't noticed that when looking over the docs! Thank you! @cvalenzuela, will that work for our needs? If so, we can close this issue (unless it's a relevant discussion for other aspects of the project too).
Looks like withSaveHandler() is not mentioned in the generated API docs. Let's leave this issue open until that is fixed, at least.
nice! withSaveHandler() seems to be what we need. I'll try that instead of model.save()
I'm using withSaveHandler() in the following way:

```js
model.save(tf.io.withSaveHandler((obj2save) => { ... }));
```

and the callback value, obj2save, being returned is:

```js
{
  modelTopology: {…},
  weightData: ArrayBuffer(5018800),
  weightSpecs: Array(3)
}
```
What's the best way to save the weightData into a .bin file? What's the difference between weightSpecs generated with withSaveHandler and weightsManifest that you get with .save('downloads://')?
weightSpecs are a subset of weightsManifest that don't contain the paths. weightSpecs are of type WeightsManifestEntry; here is the corresponding entry in the weights manifest: https://github.com/tensorflow/tfjs-core/blob/master/src/io/types.ts#L59
You can just kick off a download using a Blob, URL.createObjectURL(), and a forced link click (search for how to trigger a file download from JS).
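A minimal sketch of that approach (browser-only; the helper names are illustrative, not a tf.js API): wrap the weightData ArrayBuffer in a Blob and click a temporary link:

```javascript
// Illustrative browser-only helpers, not part of tf.js.
function makeWeightsBlob(buffer) {
  // Wrap the raw ArrayBuffer so it can be downloaded as a binary file.
  return new Blob([buffer], { type: 'application/octet-stream' });
}

function downloadWeights(buffer, filename) {
  // Force a browser download of the weights, e.g. as 'model.weights.bin'.
  const url = URL.createObjectURL(makeWeightsBlob(buffer));
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url); // release the object URL once the click fires
}
```

Usage inside the save handler would look like `downloadWeights(obj2save.weightData, 'model.weights.bin');`, with the modified model.json downloaded the same way as a JSON Blob.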
Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!