Flair: CUDA out of memory

Created on 15 Feb 2019  ·  12 comments  ·  Source: flairNLP/flair

I have been experimenting with the ELMo PubMed embedding for NER tagging, but I keep running into a CUDA out of memory issue. I set embeddings_in_memory to False and also tried various batch sizes: 16, 8, 6. None of them succeeded. I am using a p2.xlarge instance with 60 GB RAM and a 12 GB GPU.

In total I use three embeddings: ELMo PubMed, CharacterEmbeddings, and a custom 100d FastText embedding.

My training data is about 70 MB; the validation and test data are about 6 MB each.

Is my GPU simply too small for this combination of data and embeddings?

Any help is highly appreciated!
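For reference, a minimal sketch of the setup described above (flair 0.4-style API); the data folder, column format, FastText file name, and hidden size are placeholders, not the actual values:

from flair.data_fetcher import NLPTaskDataFetcher
from flair.embeddings import (ELMoEmbeddings, CharacterEmbeddings,
                              WordEmbeddings, StackedEmbeddings)
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# load a column-formatted NER corpus and build the tag dictionary
corpus = NLPTaskDataFetcher.load_column_corpus(
    'data/', {0: 'text', 1: 'ner'},
    train_file='train.txt', dev_file='dev.txt', test_file='test.txt')
tag_dictionary = corpus.make_tag_dictionary(tag_type='ner')

# the three embeddings mentioned above
embeddings = StackedEmbeddings([
    ELMoEmbeddings('pubmed'),                       # ELMo PubMed
    CharacterEmbeddings(),                          # task-trained character embeddings
    WordEmbeddings('custom-fasttext-100d.gensim'),  # hypothetical custom 100d FastText file
])

tagger = SequenceTagger(hidden_size=256, embeddings=embeddings,
                        tag_dictionary=tag_dictionary, tag_type='ner',
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('ner-model', mini_batch_size=8, embeddings_in_memory=False)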

The error message is similar to the following:

File "/home/ubuntu/hyang/ner/flair/embeddings.py", line 382, in _add_embeddings_internal
    embeddings = self.ee.embed_batch(sentence_words)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 244, in embed_batch
    embeddings, mask = self.batch_to_embeddings(batch)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/allennlp/commands/elmo.py", line 186, in batch_to_embeddings
    bilm_output = self.elmo_bilm(character_ids)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 605, in forward
    token_embedding = self._token_embedder(inputs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 374, in forward
    convolved = conv(character_embedding)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 187, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 9.10 GiB (GPU 0; 11.17 GiB total capacity; 1.34 GiB already allocated; 6.99 GiB free; 2.53 GiB cached)


All 12 comments

@gccome Is your dataset publicly available? For an initial experiment, you could try using the ELMo embeddings only.

@stefan-it No, the data is not publicly available. Thanks for the suggestion; I will run with ELMo only and report the results here.

@stefan-it @alanakbik I tried various combinations of embeddings, including only a self-trained forward Flair embedding (~20 MB) and only the PubMed ELMo embedding. None of them worked; all threw the CUDA out of memory error. I used batch size 8, which is already quite small. I am now trying batch size 4.

At this point the trainer reads all training data into memory, so my understanding is that GPU usage should stay roughly constant. Instead, I noticed that GPU usage is about 4 GB in the first couple of iterations, climbs to 10 GB as training goes on, and then runs out of memory after another couple of iterations. Is this normal? Why does GPU usage keep increasing?
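One quick way to check where the memory goes is to log PyTorch's allocator statistics once per mini-batch; a minimal sketch, assuming a CUDA-enabled PyTorch build:

import torch

def log_gpu_memory(tag: str = ''):
    # currently allocated vs. peak allocated tensor memory, in GiB
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated() / 1024 ** 3
        peak = torch.cuda.max_memory_allocated() / 1024 ** 3
        print(f'{tag} allocated: {allocated:.2f} GiB, peak: {peak:.2f} GiB')

# a steadily rising "allocated" value (not just a high "peak") suggests
# tensors are being retained across batches rather than freed
log_gpu_memory('after batch')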

Do you mind sharing the machine configuration (GPU, CPU, RAM, etc.) that you use when experimenting with Flair? Have you run into such issues?

Also, is the Flair team by any chance working on a batch generator? That would be a great enhancement for Flair!

Many thanks!

Indeed, I have been getting the same issue since the weekend. I'm running an NER experiment using just word embeddings and I get the "CUDA out of memory" error. I even tried reducing the hidden_size to 16 and the mini_batch_size to 1, and I still have the same issue:

from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=16,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type,
                                        use_crf=True)

# initialize trainer
from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus)

# start training
results = trainer.train('NER_Results',
                        learning_rate=0.1,
                        mini_batch_size=1,
                        max_epochs=150,
                        checkpoint=True,
                        embeddings_in_memory=False)
print(results)

Actually, in my case the bottleneck is the data loader (NLPTaskDataFetcher.load_column_corpus). My dataset is about 150K words, which is why I get the CUDA out of memory issue.
@alanakbik Is there any workaround I can use to overcome this issue?

Any progress on this? Thanks!

Hi @gccome, that is strange; GPU usage should remain roughly constant after the first epoch. Does it complete a full epoch for you, or does it throw the error within the first epoch? Also, which version of Flair are you using?

@Yugioh1984 Yes, this is a known issue that occurs if the data set is too large to be loaded into memory. We are planning to address this and other problems by using a DataLoader (see #426), but this feature will not make it into 0.4.1; we hope to include it in 0.5 though!

Hi @alanakbik, no, it didn't complete the first epoch for most of my model runs (usually those with multiple embedding combinations). For the runs that did complete a full epoch, there was no CUDA issue.
I have been using the latest version on the master branch.

That is very strange; we sometimes use p2.xlarge instances and have not yet run into CUDA issues.

Is it possible that you have very long sentences in your data set? These cause problems for models like ELMo because the full sentence is pushed through the language model, which can cost a lot of GPU memory.

To address this, at least for FlairEmbeddings, we added a new truncated forward pass method (#387). Could you try initializing FlairEmbeddings like this:

embeddings = FlairEmbeddings('news-forward', chars_per_chunk=128)

Could you try this out and see if it works?

Also, could you share the length of the longest sentence in characters in your dataset?
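In case it helps, a minimal way to measure this, assuming the corpus is loaded as in the sketch near the top of this thread:

# longest sentence (in characters) in the training split of a flair 0.4-style corpus
longest = max(len(sentence.to_plain_string()) for sentence in corpus.train)
print(f'longest training sentence: {longest:,} characters')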

@alanakbik The longest sentence has a length of 44,177 characters. I am using release 0.4.1 from the repo now and experimenting with FlairEmbeddings + batch size 8; so far it has been running okay. Most of the CUDA issues I had previously came from the ELMo embedding. Is there a way to also limit the chunk size for ELMo?

Wow, that is a long sentence! I am not sure whether a similar option exists for ELMo in the AllenNLP implementation (you currently cannot set such an option for ELMoEmbeddings in Flair), but in principle the exact same idea could be applied there.

I agree. Maybe I should remove sentences that are too long; those are usually noise in my data caused by poor OCR. Thanks @alanakbik! BTW, are you going to incorporate Transformer-XL into Flair? I see there is a PR.
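A minimal sketch of that filtering idea, assuming a flair 0.4-style TaggedCorpus and an arbitrary 1,000-character cutoff:

from flair.data import TaggedCorpus

MAX_CHARS = 1000  # hypothetical cutoff; tune to your data

def drop_long_sentences(sentences, max_chars=MAX_CHARS):
    # keep only sentences whose plain-text length is within the cutoff
    return [s for s in sentences if len(s.to_plain_string()) <= max_chars]

filtered_corpus = TaggedCorpus(drop_long_sentences(corpus.train),
                               drop_long_sentences(corpus.dev),
                               drop_long_sentences(corpus.test))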
