Can a trained model be loaded from bytes that are already in memory?
Something like:
    f = 'classification.ftz'
    model_bytes = open(f, 'rb').read()
    m = FastText(model_bytes)
I tried passing an istream.
In fasttext_pybind.cc:
    .def(
        "loadModelIstream",
        [](fasttext::FastText& m, std::istream& in) { m.loadModel(in); })
In FastText.py:
    def load_model_istream(istream):
        """Load a model from the given input stream and return a model object."""
        return _FastText(istream)
However, calling this function consumed almost all of my memory.
Could anyone help? Any information would be appreciated.
Hi @ericxsun,
I ran into the same issue and got it working.
The first thing to point out about your solution is that pybind only supports std::string and char* when Python bytes are passed (Pybind: Pass bytes to C++). With that in mind, I made the following additions:
- a loadModel method that accepts char*
- a conversion of those bytes to an istream
- a call to the existing loadModel with that istream
Add a loadModel overload that expects a fixed-size byte buffer to fasttext.h:
    void loadModel(char* modelBytes, size_t size);
In fasttext.cc, a couple of things are needed (please also see #286):
    // includes
    #include <streambuf>
    #include <istream>

    // under fasttext namespace

    // Read-only stream buffer over an existing block of memory (no copy).
    struct membuf : std::streambuf {
      membuf(char const* base, size_t size) {
        char* p(const_cast<char*>(base));
        // The get area spans the whole buffer: [p, p + size).
        this->setg(p, p, p + size);
      }
    };

    // istream that reads from the membuf above.
    struct imemstream : virtual membuf, std::istream {
      imemstream(char const* base, size_t size)
          : membuf(base, size),
            std::istream(static_cast<std::streambuf*>(this)) {}
    };

    void FastText::loadModel(char* modelBytes, size_t size) {
      imemstream ifs(modelBytes, size);
      // Same header check as the file-based loader.
      if (!checkModel(ifs)) {
        std::cerr << "Invalid file format" << std::endl;
        exit(EXIT_FAILURE);
      }
      // Delegate to the existing istream overload.
      loadModel(ifs);
    }
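For readers more at home in Python, the same idea (treat a byte buffer as a stream and validate the header before parsing) can be sketched with io.BytesIO. This is only an illustration, not part of the patch. The magic number, the maximum version and the little-endian int32 layout are assumptions based on my reading of fasttext.cc; please verify them against your fastText version. The name check_model_bytes is hypothetical.

```python
import io
import struct

# Assumed values; verify against fasttext.cc in your checkout.
FASTTEXT_MAGIC = 793712314
FASTTEXT_MAX_VERSION = 12


def check_model_bytes(model_bytes):
    """Return True if the buffer starts with a plausible fastText header."""
    stream = io.BytesIO(model_bytes)  # in-memory stream, no file involved
    header = stream.read(8)           # two int32 values: magic, then version
    if len(header) < 8:
        return False
    # Assumption: little-endian int32, as written on x86 machines.
    magic, version = struct.unpack("<ii", header)
    return magic == FASTTEXT_MAGIC and version <= FASTTEXT_MAX_VERSION
```

A check like this can reject a truncated or unrelated blob before it ever reaches the C++ loader, which exits the process on an invalid header.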
In fasttext_pybind.cc:
    .def(
        "loadModel",
        [](fasttext::FastText& m, char* modelBytes, size_t size) {
          m.loadModel(modelBytes, size);
        })
In FastText.py, you can also accommodate the change in _FastText.__init__ instead of writing a separate load_model_istream method:
    def __init__(self, model=None):
        self.f = fasttext.fasttext()
        if model is not None:
            if isinstance(model, bytes):
                # In-memory model: pass the buffer and its length.
                self.f.loadModel(model, len(model))
            else:
                # Otherwise treat it as a file path.
                self.f.loadModel(model)
As a quick test, you can download the language identification model and try it out:
    >>> import fastText as ft
    >>>
    >>> with open('lid.176.ftz', 'rb') as rbf:
    ...     lid_bytes = rbf.read()
    ...
    >>> lid_model = ft.load_model(lid_bytes)
    >>> lid_model.predict("Could we load the trained model from loaded bytes in memory?", k=3)
    (('__label__en', '__label__da', '__label__de'), array([0.84858966, 0.0172254 , 0.01400802]))
    >>>
For the conversion to istream, please see the original SO answer.
I'll be happy to submit a PR if this can be reviewed.
Hope it helps, cheers!
@ericxsun are you trying this solution in order to share the same model image between multiple fasttext instances?
I have a similar case: I need to keep the unsupervised model open for both the nearest neighbor and the analogies tasks, so I have to keep a pipe (socket) open to each process, and of course the model is loaded twice:
    ./fasttext nn mymodel
    ./fasttext analogies model
This results in two fasttext processes running at the same time, each with exactly the same binary model loaded...
@loretoparisi not exactly the same case. I just want to load a model that is already held in memory, rather than from a file with a given filename.
@suamin Thanks so much. I tested it in my case and it really saved my life. Thank you very much.
@ericxsun I see, thanks. One question: what is the advantage of loading the model directly from memory rather than from a file? I assume there is a reason why fasttext only does it one way...
@ericxsun Thanks, glad it worked.
@loretoparisi it's helpful when your fastText model is stored as a binary field in a DB or, more generally, anywhere other than a local file. Also, the fasttext namespace already has overloaded loadModel implementations (#L82), and loading from a file calls the istream overload (#L190) under the hood. Reading from memory follows the same pattern, so IMO it's useful and could be added to fastText.
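To make the DB case concrete, here is a minimal sketch using SQLite from the Python standard library. The table name models, its columns, and the helper names are all hypothetical; any store with a binary column would work the same way.

```python
import sqlite3


def save_model_bytes(conn, name, model_bytes):
    """Store a serialized model as a BLOB (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS models (name TEXT PRIMARY KEY, data BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO models (name, data) VALUES (?, ?)",
        (name, sqlite3.Binary(model_bytes)),
    )
    conn.commit()


def fetch_model_bytes(conn, name):
    """Return the stored model as a bytes object, or raise KeyError."""
    row = conn.execute(
        "SELECT data FROM models WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return bytes(row[0])
```

The bytes returned by fetch_model_bytes are what you would hand to a bytes-aware loader, with no intermediate file on disk.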
Has anyone considered working on this again? As @suamin mentioned, I am running into this same issue, as my model is loaded from a DB.
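On an install without a bytes-aware loader, one possible fallback is to spill the bytes to a temporary file and use the ordinary path-based loader. This is a sketch, not a substitute for true in-memory loading: it still touches the disk. The helper name load_model_from_bytes is hypothetical, and the loader argument is assumed to be any callable that takes a file path, such as fasttext.load_model.

```python
import os
import tempfile


def load_model_from_bytes(model_bytes, loader):
    """Write model_bytes to a temp file, then call loader(path).

    `loader` is assumed to be a callable taking a file path,
    for example fasttext.load_model (not imported here).
    """
    # delete=False so the file can be reopened by name on every platform.
    with tempfile.NamedTemporaryFile(suffix=".ftz", delete=False) as tmp:
        tmp.write(model_bytes)
        tmp_path = tmp.name
    try:
        return loader(tmp_path)
    finally:
        # Clean up whether or not the loader succeeded.
        os.remove(tmp_path)
```

Usage would look like load_model_from_bytes(blob, fasttext.load_model), assuming the loader has finished reading the file by the time it returns.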