I notice that in the paper, the authors say it requires a week to train the model, but I cannot figure out why. As far as I can tell, it is trained on only a limited character embedding and a single-layer LSTM. Could anyone explain? Thanks very much.
Hello @RoderickGu, how long training takes depends on your corpus size, your model size (i.e. the number of hidden states) and your resources (i.e. how long you are willing to train). For the paper we trained our models on the 1 billion word corpus with a hidden state size of 2048, and trained for one week. To redo these experiments, you can check out the tutorial here.
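For reference, a minimal sketch along the lines of that tutorial (the corpus path is a placeholder, and the hyperparameters just mirror the setup described above):

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# load the default character dictionary
dictionary = Dictionary.load('chars')

# train a forward language model (set to False for a backward model)
is_forward_lm = True

# placeholder path; the folder should contain the train/valid/test splits
corpus = TextCorpus('/path/to/your/corpus', dictionary, is_forward_lm,
                    character_level=True)

# hidden_size=2048 with a single LSTM layer matches the paper setup above
language_model = LanguageModel(dictionary, is_forward_lm,
                               hidden_size=2048, nlayers=1)

# train the language model
trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/language_model',
              sequence_length=250,
              mini_batch_size=100)
```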
There have been some experiments by community members on training for shorter or longer periods, and it seems the difference is not so big. There have also been some speed improvements contributed to Flair. So it is probably possible to train good models in a few days now, though we haven't done much experimentation in this direction ourselves.
Hello @alanakbik, in the tutorial there doesn't seem to be any explicit GPU usage. How do I use a GPU to train my own embedding?
@songtaoshi Flair auto-detects whether you have a GPU available. If there is a GPU, it will automatically run training there.
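If you want to check which device Flair picked, or override it (for example to select a specific GPU), something like this should work:

```python
import torch
import flair

# flair.device is set at import time: 'cuda' if a GPU is available, else 'cpu'
print(flair.device)

# override manually to pin training to a specific GPU
flair.device = torch.device('cuda:0')
```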
Thanks @alanakbik! Are there any tutorials on how to tag the text in a txt file and write the predictions to another txt file?
@songtaoshi there is a tutorial on tagging your text that includes info on how to tag a list of sentences. You would just need to add a way to read sentences from your text file beforehand and write out the results however you want them.
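As a rough sketch (assuming one sentence per line in the input file, and using the 'ner' model and the file names only as examples):

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load a pre-trained tagger ('ner' is just an example model)
tagger = SequenceTagger.load('ner')

# read one sentence per line from the input file
with open('input.txt') as f_in:
    sentences = [Sentence(line.strip()) for line in f_in if line.strip()]

# predict tags for all sentences in one call
tagger.predict(sentences)

# write the tagged sentences back out, one per line
with open('output.txt', 'w') as f_out:
    for sentence in sentences:
        f_out.write(sentence.to_tagged_string() + '\n')
```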
@alanakbik If I have trained my own embedding, when I read the txt file into Sentences, will I need to use my own dictionary to do the tokenization?
@alanakbik Thanks for your response! I have another question about Flair embeddings. If I do not pretrain and instead use a much smaller hidden dimension such as 50, but concatenate it with a 200-dimensional word embedding, will it achieve good performance? Because pretraining takes a lot of effort, I wonder whether it would be useful to randomly initialize the parameters. Thanks
@songtaoshi Yes, the vocabulary depends on your corpus if you want to use your own pretraining data.
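For illustration, a minimal sketch of building a character dictionary from your own corpus (the file names are placeholders):

```python
from flair.data import Dictionary

# collect every character that occurs in the corpus
char_dictionary = Dictionary()
with open('corpus.txt') as f:
    for line in f:
        for character in line:
            char_dictionary.add_item(character)

# save the dictionary so it can be loaded again for language model training
char_dictionary.save('custom_char_dictionary')
```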
@RoderickGu if I understand correctly, you would like to use uninitialized Flair embeddings with 50 hidden dimensions that you fine-tune in a downstream task - is this correct? If so, this would be something like the approach in the paper by Liu et al., and it works pretty well.
FlairEmbeddings currently require pre-training and cannot be fine-tuned on a task (we took this feature out because we thought no one was using it). We will add the fine-tuning feature back in an upcoming PR, so this will be possible again with Flair. In the meantime, you could use CharacterEmbeddings instead, which are similar but limited to word-level features.
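A sketch of combining CharacterEmbeddings with pre-trained word embeddings, along the lines of the concatenation setup discussed above ('glove' is just an example choice):

```python
from flair.embeddings import CharacterEmbeddings, StackedEmbeddings, WordEmbeddings

# task-trained character features, randomly initialized and fine-tuned downstream
char_embeddings = CharacterEmbeddings()

# pre-trained word embeddings ('glove' is just an example)
word_embeddings = WordEmbeddings('glove')

# concatenate both embedding types for use in a downstream tagger
stacked_embeddings = StackedEmbeddings([word_embeddings, char_embeddings])
```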
@alanakbik I think I have read this paper before, but did not pay much attention to it. I think that is exactly the paper I need to strengthen my current model. Thanks for mentioning it!