Flair: Change the tokenizer inside the model.

Created on 21 Sep 2019 · 2 Comments · Source: flairNLP/flair

I found a way to change the tokenizer at the test stage:

# your text of many sentences
text = "This is a sentence. This is another sentence. I love Berlin."

# use a library to split into sentences
from segtok.segmenter import split_single
from flair.data import Sentence, segtok_tokenizer
from flair.models import SequenceTagger

sentences = [Sentence(sent, use_tokenizer=segtok_tokenizer) for sent in split_single(text)]

# predict tags for list of sentences
tagger: SequenceTagger = SequenceTagger.load('ner')
tagger.predict(sentences)

Are we able to use customized tokenizers in the training stage?
Thank you.

question

All 2 comments

To train an NER model you are supposed to provide a TextCorpus, so you are free to use whichever tokenizer you like. Is there something special you need?

I see. Thank you.
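
As a rough sketch of the suggestion above: because training data is supplied already split into tokens, a custom tokenizer can be applied while the corpus files are being prepared, before training starts. The example below is only an illustration under assumptions that are not part of the thread: `custom_tokenize` is a hypothetical regex tokenizer, the `tags` lookup is a toy stand-in for real annotations, and the output is the one-token-per-line column format with a blank line between sentences.

```python
import re
from typing import Dict, List


def custom_tokenize(text: str) -> List[str]:
    """Hypothetical tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def to_column_format(texts: List[str], tags: Dict[str, str]) -> str:
    """Render sentences as 'token tag' lines, with a blank line between sentences.

    `tags` is a toy token-to-label lookup used only for illustration;
    tokens without an entry get the outside label 'O'.
    """
    blocks = []
    for text in texts:
        lines = [f"{token} {tags.get(token, 'O')}" for token in custom_tokenize(text)]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


texts = ["This is a sentence.", "I love Berlin."]
column_text = to_column_format(texts, {"Berlin": "S-LOC"})
print(column_text)

# Assumption: the text would then be written to a file such as train.txt
# and loaded with the library's column-format corpus reader for training.
```

Since the tokens are fixed when the file is written, the same tokenizer should also be used at prediction time so that training and test inputs are split consistently.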
