fastText: Do I need to keep periods in the input text?

Created on 11 Oct 2017 · 1 Comment · Source: facebookresearch/fastText

Do I need to keep the '.' in the original input text for fastText to work better? Thanks.

yuhengd

👍1

Most helpful comment

Hi @yuhengd,

Keeping or removing punctuation in the input text should not have a big influence on the performance of the model. What matters is separating the punctuation from the words (e.g. transforming "The cat sat on the mat." into "The cat sat on the mat ."). This pre-processing step is known as tokenization and can be performed with various open source tools, such as the Stanford Tokenizer (https://nlp.stanford.edu/software/tokenizer.html). You can also try lowercasing the input text.

Best,
Edouard.

EdouardGrave on 26 Oct 2017

👍5
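
For readers who want to try this without installing a full tokenizer, here is a minimal Python sketch of the pre-processing described in the answer above. It is not part of fastText or of the Stanford Tokenizer: the function name `preprocess` and the punctuation set are assumptions made for illustration, and the regex only handles a handful of ASCII punctuation marks.

```python
import re

# Assumption: a small, hand-picked set of ASCII punctuation marks.
# A real tokenizer (e.g. the Stanford Tokenizer) covers far more cases.
_PUNCT = re.compile(r"([.,!?;:()\"])")


def preprocess(line: str, lowercase: bool = True) -> str:
    """Put spaces around punctuation and optionally lowercase the line.

    `preprocess` is a hypothetical helper name, not a fastText function.
    """
    # Surround each punctuation mark with spaces so it becomes its own token.
    spaced = _PUNCT.sub(r" \1 ", line)
    # Collapse the runs of whitespace introduced by the substitution.
    collapsed = " ".join(spaced.split())
    return collapsed.lower() if lowercase else collapsed


print(preprocess("The cat sat on the mat."))
# the cat sat on the mat .
```

Note that this naive approach also splits abbreviations and decimals (for example "U.S." or "3.14"), which is one reason a dedicated tokenizer is usually the better choice for real data.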

