fastText: Do I need to keep periods in the input text?

Created on 11 Oct 2017 · 1 Comment · Source: facebookresearch/fastText

Do I need to keep the '.' in the original input text for fastText to work better? Thanks.

Most helpful comment

Hi @yuhengd,

Keeping or removing punctuation in the input text should not have a big influence on the performance of the model. What is important is to "separate" the punctuation from the words (e.g. transforming "The cat sat on the mat." into "The cat sat on the mat ."). This pre-processing step is known as tokenization and can be performed using various open source tools, such as the Stanford Tokenizer (https://nlp.stanford.edu/software/tokenizer.html). You can also try lowercasing the input text.

Best,
Edouard.
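The tokenization and lowercasing described in the comment above can be approximated with a short script. This is a minimal sketch, not the Stanford Tokenizer and not part of fastText: the `tokenize` function and its `lowercase` parameter are hypothetical names chosen for illustration, and the single regular expression is an assumed simplification that splits every punctuation mark off as its own token.

```python
import re

# Matches either a run of word characters (letters, digits, underscore)
# or a single character that is neither a word character nor whitespace,
# i.e. a punctuation mark or other symbol.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str, lowercase: bool = True) -> str:
    """Separate punctuation from words and return a space-joined string.

    Hypothetical helper: a rough regex approximation of tokenization.
    It does not handle contractions, URLs, or abbreviations such as "e.g.".
    """
    if lowercase:
        text = text.lower()
    return " ".join(_TOKEN_RE.findall(text))


if __name__ == "__main__":
    print(tokenize("The cat sat on the mat.", lowercase=False))
    # The cat sat on the mat .
    print(tokenize("The cat sat on the mat."))
    # the cat sat on the mat .
```

Because the regex treats every non-word character as a separate token, "don't" becomes "don ' t". A dedicated tokenizer such as the Stanford one handles cases like this more carefully, which is why it is the better choice for real pre-processing.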
