Do I need to keep the '.' in the original input text for fastText to work better? Thanks.
Hi @yuhengd,
Keeping or removing punctuation in the input text should not have a big influence on the performance of the model. What is important is to "separate" the punctuation from the words (e.g. transforming "The cat sat on the mat." into "The cat sat on the mat ."). This pre-processing step is known as tokenization and can be performed using various open source tools, such as the Stanford Tokenizer (https://nlp.stanford.edu/software/tokenizer.html). You can also try lowercasing the input text.
Best,
Edouard.
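
For readers who want to try the idea quickly, here is a minimal sketch of the pre-processing described above: it puts spaces around common punctuation marks and optionally lowercases the text. The helper name `separate_punctuation` and the punctuation set are assumptions made for illustration. It is a plain regex approximation, not the Stanford Tokenizer and not part of fastText, so it will also split things like "3.14" or "U.S.".

```python
import re

# Hypothetical helper for illustration; not part of fastText or the Stanford Tokenizer.
# The set of punctuation marks below is an assumption and can be adjusted.
_PUNCT = re.compile(r"([.,!?;:()\"])")


def separate_punctuation(text, lowercase=True):
    """Put spaces around punctuation marks and optionally lowercase the text."""
    spaced = _PUNCT.sub(r" \1 ", text)
    # split() + join() collapses the extra whitespace introduced above.
    result = " ".join(spaced.split())
    return result.lower() if lowercase else result


if __name__ == "__main__":
    print(separate_punctuation("The cat sat on the mat."))
    # prints: the cat sat on the mat .
```

Whatever pre-processing is chosen, the same function should be applied to the training file and to any text passed to the model at prediction time, so both see identical tokens.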