`import csv

import numpy as np
import torch
from pytorch_transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

data = []
label = []
with open('Training.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        data.append("[CLS] " + row[1] + " [SEP]")
        label.append(int(row[2]))

def tokenize_data(data):  # for numericalizing the text
    for sub in range(len(data)):
        data_tokenized = tokenizer.encode(data[sub])
        data[sub] = data_tokenized
    return data

def make_batches(data):  # for making all the sentences into same length
    max_len = len(data[-1])
    for i in range(len(data)):
        if len(data[i]) < max_len:
            iter = max_len - len(data[i])
            for j in range(iter):
                data[i].append(102)
    return data

optim = torch.optim.Adam(model.parameters(), lr=2e-05, betas=(0.9, 0.98), eps=1e-9)
model = model.cuda()
model.train()
batch_size = 20
for i in range(0, len(data), batch_size):
    print(i)
    if True:
        batch = data[i:i+batch_size]
        batch = tokenize_data(batch)
        batch.sort(key=lambda x: len(x))
        batch = make_batches(batch)
        batch = torch.tensor(batch)
        target = torch.tensor(label[i:i+batch_size])
        inp = batch.cuda()
        target = target.cuda()
        output = model(inp)
        loss = torch.nn.functional.cross_entropy(output[0].view(-1, output[0].size()[-1]), target.contiguous().view(-1))
        print(loss)
        optim.zero_grad()
        model.zero_grad()
        loss.backward()
        optim.step()
print("success")
So the above is my code, and whenever I run it, it gives me the following error:
Traceback (most recent call last):
  File "classification_using_bert.py", line 49, in <module>
    loss.backward()
  File "/home/zlabs-nlp/miniconda3/envs/ravienv/lib/python3.7/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/zlabs-nlp/miniconda3/envs/ravienv/lib/python3.7/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 10.92 GiB total capacity; 6.34 GiB already allocated; 28.50 MiB free; 392.76 MiB cached)`
Can anyone tell me what the mistake is?
Thanks in advance!
Try implementing gradient accumulation during training instead of updating the parameters on every iteration. Please check the nice, easy-to-follow tutorial by @thomwolf here. I used this technique with GPT-2 small, on a dataset of ~350k, with a single GPU, and it worked completely fine.
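For reference, here is a minimal sketch of what gradient accumulation can look like in plain PyTorch. It is not the code from the tutorial mentioned above; the helper name `train_with_accumulation`, the tiny `torch.nn.Linear` model, and the random data are all hypothetical stand-ins chosen so the example runs on CPU. It also assumes the number of micro-batches is a multiple of `accumulation_steps`.

```python
import torch


def train_with_accumulation(model, optim, inputs, targets,
                            micro_batch_size, accumulation_steps):
    """Run one pass over the data, stepping the optimizer only every
    `accumulation_steps` micro-batches. Returns the number of updates."""
    model.train()
    optim.zero_grad()
    updates = 0
    starts = range(0, len(inputs), micro_batch_size)
    for step, start in enumerate(starts, start=1):
        x = inputs[start:start + micro_batch_size]
        y = targets[start:start + micro_batch_size]
        loss = torch.nn.functional.cross_entropy(model(x), y)
        # Divide by the number of accumulated micro-batches so the summed
        # gradients equal the gradient of the mean loss over the large batch.
        (loss / accumulation_steps).backward()
        if step % accumulation_steps == 0:
            optim.step()       # one parameter update per effective batch
            optim.zero_grad()  # reset the accumulated gradients
            updates += 1
    return updates


# Toy usage (assumption: random data standing in for tokenized sentences).
torch.manual_seed(0)
toy_model = torch.nn.Linear(8, 2)
toy_optim = torch.optim.SGD(toy_model.parameters(), lr=0.1)
toy_inputs = torch.randn(20, 8)
toy_targets = torch.randint(0, 2, (20,))

# 5 micro-batches of 4 behave like one batch of 20 for the update.
n_updates = train_with_accumulation(toy_model, toy_optim, toy_inputs,
                                    toy_targets, micro_batch_size=4,
                                    accumulation_steps=5)
print(n_updates)
```

Only one micro-batch of activations is held in memory at a time, which is why this can help with out-of-memory errors while keeping the effective batch size unchanged.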
Thanks @sajidrahman, I will go through it.
Edit: There is now a gradient_accumulation_steps parameter. Can it be adjusted to achieve gradient accumulation?
The problem is the batch size of 20. For many models, batch sizes larger than 4 do not fit on most single GPUs. Check this: https://github.com/huggingface/transformers/issues/2016#issuecomment-561093186 . In some cases you cannot fit even a batch size of 1 in memory. As @sajidrahman mentioned, gradient accumulation is a good place to start.
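To make the arithmetic concrete, the small sketch below works out how many accumulation steps are needed to reach a target batch size from a smaller per-step batch. The helper `plan_accumulation` is hypothetical, and the sizes used are simply the figures quoted in this thread, not guarantees of what fits on any particular GPU.

```python
import math


def plan_accumulation(target_batch_size, micro_batch_size):
    """Return (micro_batch_size, accumulation_steps) such that
    micro_batch_size * accumulation_steps >= target_batch_size."""
    if target_batch_size < 1 or micro_batch_size < 1:
        raise ValueError("batch sizes must be positive")
    steps = math.ceil(target_batch_size / micro_batch_size)
    return micro_batch_size, steps


# Batch size 20 from the question, run 4 examples at a time.
micro, steps = plan_accumulation(20, 4)
print(micro, steps, micro * steps)
```

With these numbers, each forward and backward pass sees 4 examples, and the optimizer steps once every 5 passes, so the update still reflects 20 examples.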
The issue can be closed if everything is clear?
In my case too, the problem was the batch size: 8 did not fit, and it worked after I changed it to 2.