Fasttext: C++ BPE for FastText

Created on 23 Jan 2019 · 1 Comment · Source: facebookresearch/fastText

Facebook has recently released LASER - https://github.com/facebookresearch/LASER, a library for language-agnostic sentence representations that internally uses a relatively new C++ implementation of byte-pair encoding (BPE) - https://github.com/glample/fastBPE

It could make sense to support this package within FastText as a general approach to subword segmentation, so that it handles multiple languages.

In my case, handling languages like Hindi can be tricky when segmenting/tokenizing text before training a new FT model, say for language identification. A generic subword approach could solve this.
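For readers unfamiliar with BPE, the sketch below shows the core merge-learning loop in plain Python: start from single characters and repeatedly merge the most frequent adjacent symbol pair. It is a minimal illustration under simplifying assumptions (whitespace-split input, a separate "</w>" end-of-word symbol, ties broken by first occurrence). It is not fastBPE's code or API, and `learn_bpe` is a hypothetical name.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merge operations from whitespace-split words.

    Hypothetical helper for illustration only; not fastBPE's API.
    """
    # Each word becomes a tuple of symbols (characters + end-of-word marker),
    # counted by frequency.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of `best` with one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["low", "low", "lower", "newest", "newest"], 2))
```

Because merges are learned purely from symbol co-occurrence counts, the same loop runs unchanged on Devanagari input such as `learn_bpe(["नमस्ते", "नमस्ते"], 1)`. Note that this sketch starts from Unicode code points, so a consonant and its combining vowel sign begin as separate symbols until a merge joins them, and it still relies on whitespace to find word boundaries.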

Feature request

Most helpful comment

[UPDATE]
fastBPE now supports Python (through Cython), so it could be used directly alongside the FastText Python wrapper - https://github.com/glample/fastBPE/issues/12
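Rather than guess at the exact signatures of fastBPE's Python binding, the plain-Python sketch below shows what applying already-learned merges to text looks like; the segmented output could then be written to a file and used as fastText training input. The merge list is hypothetical, `apply_bpe` and `segment_sentence` are hypothetical names, and the "@@ " continuation marker follows the subword-nmt convention, which is assumed (not confirmed here) to match fastBPE's output format.

```python
def apply_bpe(word, merges):
    """Split one word into subword units using ranked merges (hypothetical helper)."""
    # Earlier-learned merges get lower ranks and are applied first.
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word) + ["</w>"]
    while len(symbols) > 1:
        # Collect adjacent pairs that have a learned merge, with their rank.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    # The last symbol always ends with the end-of-word marker; strip it.
    if symbols[-1] == "</w>":
        symbols.pop()
    else:
        symbols[-1] = symbols[-1][: -len("</w>")]
    return symbols


def segment_sentence(sentence, merges):
    """Join subwords with "@@ " so the original words can be recovered."""
    return " ".join("@@ ".join(apply_bpe(w, merges)) for w in sentence.split())


# Hypothetical merges, e.g. as produced by a BPE learner.
merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("er", "</w>")]
print(segment_sentence("lower low", merges))  # prints "low@@ er low"
```

Removing every "@@ " from the output restores the original text, which is why this marker style is convenient when the segmentation is only a preprocessing step.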
