Transformers: cuda out of memory

Created on 26 Jul 2019 · 6 Comments · Source: huggingface/transformers

```python
import csv

import torch
from pytorch_transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
PAD_ID = tokenizer.convert_tokens_to_ids(["[PAD]"])[0]  # 0 for bert-base-uncased

# Each CSV row: column 1 = sentence, column 2 = integer class label.
data = []
label = []
with open('Training.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        data.append("[CLS] " + row[1] + " [SEP]")
        label.append(int(row[2]))

def tokenize_data(sentences):  # for numericalizing the text
    return [tokenizer.encode(s) for s in sentences]

def make_batches(seqs):  # for making all the sentences the same length
    max_len = max(len(s) for s in seqs)
    # Pad with [PAD] rather than [SEP] (id 102), and mark real tokens with 1
    # in an attention mask so the model ignores the padding positions.
    ids = [s + [PAD_ID] * (max_len - len(s)) for s in seqs]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return ids, mask

model = model.cuda()  # move to the GPU before creating the optimizer
model = torch.nn.DataParallel(model)
model.train()
optim = torch.optim.Adam(model.parameters(), lr=2e-05, betas=(0.9, 0.98), eps=1e-9)

batch_size = 20
for i in range(0, len(data), batch_size):
    print(i)
    # No sort here: sorting only the sentences misaligns them with
    # label[i:i+batch_size], and padding to the longest one doesn't need it.
    ids, mask = make_batches(tokenize_data(data[i:i + batch_size]))
    inp = torch.tensor(ids).cuda()
    attention_mask = torch.tensor(mask).cuda()
    target = torch.tensor(label[i:i + batch_size]).cuda()
    logits = model(inp, attention_mask=attention_mask)[0]  # (batch, num_labels)
    loss = torch.nn.functional.cross_entropy(logits, target)
    print(loss.item())
    optim.zero_grad()
    loss.backward()
    optim.step()

print("success")
```

So the above is my code, and whenever I run it, it gives me this error:

```
Traceback (most recent call last):
  File "classification_using_bert.py", line 49, in <module>
    loss.backward()
  File "/home/zlabs-nlp/miniconda3/envs/ravienv/lib/python3.7/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zlabs-nlp/miniconda3/envs/ravienv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 10.92 GiB total capacity; 6.34 GiB already allocated; 28.50 MiB free; 392.76 MiB cached)
```

Can anyone tell me what the mistake is?
Thanks in advance!

wontfix


Ravikiran2611

Most helpful comment

Try implementing gradient accumulation during training instead of updating the parameters on every iteration. Please check this nice, easy-to-follow tutorial by @thomwolf here. I used this technique with GPT-2 small on a dataset of ~350k with a single GPU, and it worked completely fine.

sajidrahman on 26 Jul 2019

👍2
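A minimal sketch of the gradient accumulation suggested above, assuming a model whose call returns logits directly (for the issue's BertForSequenceClassification, the logits would be `model(x)[0]`). `accumulate_gradients` and `train_step` are hypothetical helper names, not part of pytorch_transformers. The large batch is split into micro-batches; each `backward()` frees that micro-batch's activations while the gradients keep adding up in `.grad`, so peak activation memory is roughly that of one micro-batch.

```python
import torch
import torch.nn.functional as F

def accumulate_gradients(model, inputs, targets, micro_batch_size):
    """Backpropagate over `inputs` in micro-batches, summing gradients in .grad.

    Each micro-batch's mean loss is weighted by its share of the full batch,
    so the accumulated gradient equals that of the mean loss over all inputs.
    """
    total = len(inputs)
    for start in range(0, total, micro_batch_size):
        x = inputs[start:start + micro_batch_size]
        y = targets[start:start + micro_batch_size]
        loss = F.cross_entropy(model(x), y) * (len(x) / total)
        loss.backward()  # frees this micro-batch's graph; grads accumulate

def train_step(model, optimizer, inputs, targets, micro_batch_size):
    """One parameter update for a batch that is too large to run at once."""
    optimizer.zero_grad()
    accumulate_gradients(model, inputs, targets, micro_batch_size)
    optimizer.step()
```

In the issue's loop, this would mean keeping 20 examples per update while passing, say, `micro_batch_size=4`, so activations for only 4 sequences are held at a time; whether 4 fits still depends on sequence length and the GPU.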

All 6 comments


Thanks @sajidrahman, I will go through it.

Ravikiran2611 on 7 Aug 2019

Edit: there is now a gradient_accumulation_steps parameter. Can it be adjusted to achieve gradient accumulation?

DanyalAndriano on 6 Oct 2019
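The comment doesn't say where that parameter lives, so here is only a hypothetical sketch of the step-counter pattern such a setting usually controls, not the library's own code: gradients from several ordinary batches are summed, and the optimizer steps once every `gradient_accumulation_steps` batches, for an effective batch size of batch size × `gradient_accumulation_steps`. `train_epoch` is an illustrative name, and the model is assumed to return logits directly.

```python
import torch
import torch.nn.functional as F

def train_epoch(model, optimizer, batches, gradient_accumulation_steps):
    """Loop over a list of (x, y) batches, calling optimizer.step() once every
    `gradient_accumulation_steps` batches. Returns the number of updates."""
    model.train()
    optimizer.zero_grad()
    updates = 0
    for step, (x, y) in enumerate(batches):
        loss = F.cross_entropy(model(x), y)
        # Scale so a full group's summed gradient is the average over its batches.
        (loss / gradient_accumulation_steps).backward()
        if (step + 1) % gradient_accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
            updates += 1
    # Apply any gradients left over from a final, incomplete group.
    if len(batches) % gradient_accumulation_steps != 0:
        optimizer.step()
        optimizer.zero_grad()
        updates += 1
    return updates
```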

The problem is the batch size of 20. For many models, batch sizes larger than 4 do not fit on most single GPUs; check this: https://github.com/huggingface/transformers/issues/2016#issuecomment-561093186 . In some cases you cannot fit even a batch of 1 in memory. As @sajidrahman mentioned, gradient accumulation is a good place to start.
Can the issue be closed if everything is clear?

iedmrc on 4 Dec 2019
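Since the batch size that fits depends on the model, sequence length and GPU, one option is to probe for it. A minimal sketch, assuming a hypothetical `run_batch(batch_size)` callable that runs one forward/backward pass at that size; it relies on PyTorch reporting OOM as a `RuntimeError` whose message contains "out of memory" (newer versions raise `torch.cuda.OutOfMemoryError`, which subclasses `RuntimeError`). `find_max_batch_size` is an illustrative name, not a library function.

```python
import torch

def find_max_batch_size(run_batch, start=32):
    """Halve the batch size until `run_batch(batch_size)` completes without a
    CUDA out-of-memory error. Returns the first size that fits, or 0 if none."""
    batch_size = start
    while batch_size >= 1:
        try:
            run_batch(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise  # unrelated error: don't hide it
            torch.cuda.empty_cache()  # release cached blocks before retrying
            batch_size //= 2
    return 0
```

Tensors from the failed attempt should not be kept alive (for example in variables outside `run_batch`), or their memory may not be released before the retry.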

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.