Is there a distributed implementation of fastText (e.g., on Spark) for learning word vectors from very large text corpora? Word2Vec has a Spark implementation: http://spark.apache.org/docs/latest/ml-features.html#word2vec. Since sub-word information (summing the vectors of character n-grams) is fastText's defining difference, would it be straightforward to build a Spark-based implementation using the Word2Vec code as a base?
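To make the sub-word step concrete, here is a minimal sketch of composing a word vector by summing character n-gram vectors. It is only an illustration under stated assumptions: `char_ngrams` and `subword_vector` are hypothetical helper names, the table is random rather than trained, CRC32 stands in for fastText's own hash function, and the 1000-row table is far smaller than a real bucket table. It is not fastText's or Spark's actual code.

```python
import random
import zlib


def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, padded with '<' and '>' boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(padded) - n + 1)
    ]


def subword_vector(word, table, min_n=3, max_n=6):
    """Sum the table rows selected by hashing each n-gram into a bucket.

    CRC32 is a stand-in hash chosen for this sketch, not fastText's hash.
    """
    total = [0.0] * len(table[0])
    for gram in char_ngrams(word, min_n, max_n):
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        total = [t + r for t, r in zip(total, row)]
    return total


# Hypothetical embedding table: 1000 buckets, 8 dimensions, random values.
random.seed(0)
table = [[random.gauss(0, 1) for _ in range(8)] for _ in range(1000)]
vec = subword_vector("where", table)
```

The point for a distributed version is that every word touches several rows of a shared n-gram table per update, rather than the single row per word that a Word2Vec-style implementation updates.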
I'll add this as a future feature we might consider implementing. For now it's not on our list of priorities, but it might very well be soon.
I firmly believe this (with Spark) would be very helpful for us in training on very large Chinese text.
@cpuhrsch Any plan on this?
Hello,
Is it still in your scope to implement fastText with Spark?
Any updates on implementing fastText with Spark?