We currently support reading FastText models from Facebook's format. The gensim.models._fasttext_bin module does this.
This enables people to use gensim with a model that was trained using Facebook's binaries.
Sometimes, people want things to work the other way: they start with gensim, train a model, and then want to save it to Facebook's format.
For this ticket, you will implement a save(model, fout) function that accepts a FastText object and saves it to a file stream in a Facebook-compatible format. It will essentially reverse the effects of the load function.
@menshikh-iv Just wanted to double-check with you: there is no other way to do this, right?
@mpenkov yes. BTW good ticket idea, but I guess it's a bit complicated for hacktoberfest.
Meh, I don't think it's _that_ hard, but if nobody does it, then I will :)
@mpenkov I'm not sure what the Facebook-compatible format is, but could this be as simple as serializing the object to a byte stream and writing it to a file? Like this:
import pickle
# Serialize the model to a binary file; the context manager closes the file even on error
with open('newFile.txt', 'wb') as f:
    pickle.dump(model, f)
No, I don't think it's _that_ simple.
You'll have to reverse-engineer the Facebook-compatible format from the source (either the source that loads their model, or elsewhere).
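To give a rough idea of what "Facebook-compatible" means in practice, here is a minimal sketch of writing and reading back just the file header. It assumes the file starts with two little-endian int32 values, a magic number and a format version, and that their values are 793712314 and 12; please verify both against the fastText source before relying on them. The helper names write_header and read_header are hypothetical and not part of gensim.

```python
import io
import struct

# Assumed constants; check them against the fastText source.
FASTTEXT_MAGIC = 793712314
FASTTEXT_VERSION = 12


def write_header(fout, magic=FASTTEXT_MAGIC, version=FASTTEXT_VERSION):
    """Write the (assumed) header: two little-endian signed 32-bit ints."""
    fout.write(struct.pack("<ii", magic, version))


def read_header(fin):
    """Read back the two header fields written by write_header."""
    return struct.unpack("<ii", fin.read(8))


# Round trip through an in-memory binary stream.
buf = io.BytesIO()
write_header(buf)
buf.seek(0)
magic, version = read_header(buf)
```

Everything after the header (training arguments, dictionary, input and output matrices) would need the same treatment, field by field, which is why a pickle dump cannot stand in for it.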
@mpenkov ok I see, I was hoping this was not the case. I will try to get a clear idea of the Facebook format and write something flexible
@jdaitawa Please avoid hijacking tickets with unrelated questions. If you have a question, ask on the mailing list (https://groups.google.com/forum/#!forum/gensim).
Hi, @piskvorky @mpenkov,
I believe PR #2712 addresses this.
Radim, thank you for the first round of the review. I already included the improvements following your comments.
If you have any more remarks please let me know.
Best regards,
Michał
PS. As far as CI goes:
1) The dying AppVeyor builds seem to be related to PR #2706
2) Yesterday, I struggled with Travis a bit. It turned out that the cause of the pain was
os.environ.get("FT_HOME", None)
which was surprisingly returning an empty string. This was not obvious, and it differed from what I saw on my local machine. I thought I might share this in case you have not bumped into it before...
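For anyone hitting the same thing: os.environ.get("FT_HOME", None) only falls back to None when the variable is missing, not when it is set to an empty string. A minimal sketch of a check that treats both cases the same is below. The helper name get_ft_home is hypothetical, not something in the gensim test suite.

```python
import os


def get_ft_home(environ=os.environ):
    """Return the FT_HOME value, or None if it is unset or empty."""
    # `or None` maps an empty string to None, the same result as a missing variable.
    return environ.get("FT_HOME") or None


# The three cases: unset, set but empty, set to a path.
unset = get_ft_home({})
empty = get_ft_home({"FT_HOME": ""})
present = get_ft_home({"FT_HOME": "/opt/fastText"})
```

A test can then skip itself with a plain `if get_ft_home() is None` check, regardless of how the CI environment defines the variable.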
About FT_HOME: if I remember correctly, this env variable should point to the Facebook fastText binary (needed for wrapper testing and for checking compatibility between our implementation and the wrapper), and you need to set it manually before running tests locally. BTW this never runs in our CI, only manually (if needed).