File "C:\Users\temp\Aida\aida\agents\bertbot\Bert\bert_intent_classifier_pytorch.py", line 298, in process
logits = self.model(prediction_inputs, token_type_ids=None, attention_mask=prediction_masks)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 897, in forward
head_mask=head_mask)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 624, in forward
embedding_output = self.embeddings(input_ids, position_ids=position_ids, token_type_ids=token_type_ids)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\transformers\modeling_bert.py", line 167, in forward
words_embeddings = self.word_embeddings(input_ids)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "C:\Users\temp\Anaconda3\envs\fresh\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have scalar type Long; but got torch.cuda.IntTensor instead (while checking arguments for embedding)
Hi everyone, when I run the line

```python
outputs = model(input_ids=b_input_ids, attention_mask=b_input_mask, labels=b_labels)
```

with the model defined as

```python
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=numlabels)
```

it returns the error above. However, this only happens on my Windows computer; when I run the exact same code with the same Python version and libraries on Linux, it works perfectly fine. I have the most recent versions of PyTorch (1.4) and transformers installed.
Any help would be greatly appreciated.
Using the latest versions of PyTorch and transformers
Model I am using (Bert, XLNet ...): BertForSequenceClassification
Language I am using the model on (English, Chinese ...): English
It's weird that there is a discrepancy between Windows and Linux. Are you defining some of your variables on the GPU? Does it fail if everything stays on the CPU? Could you also try casting your variables `b_input_ids`, `b_input_mask` and `b_labels` to `torch.long`?
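Something like this should do it (a minimal sketch, using the variable names from your snippet):

```python
b_input_ids = b_input_ids.long()    # embedding indices must be torch.int64 (Long)
b_input_mask = b_input_mask.long()
b_labels = b_labels.long()          # the classification loss also expects Long targets
```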
I often prototype on Windows and push to Linux for final processing, and I've never had this issue. Can you post a minimal working example that I can copy-paste to test?
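For reference, this class of failure can be reproduced without any project-specific data. A minimal sketch, assuming the PyTorch 1.4 from the traceback (where embedding indices had to be `torch.int64`; newer releases also accept `int32`):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(30522, 768)                            # BERT-base vocab and hidden sizes
idx = torch.randint(0, 30522, (2, 8), dtype=torch.int32)  # int32 indices, as produced on Windows
print(emb(idx.long()).shape)                              # torch.Size([2, 8, 768]) -- works
emb(idx)  # RuntimeError on PyTorch 1.4: expected scalar type Long, got torch.IntTensor
```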
OK, update: I got the error to go away, but to do it I had to apply some janky fixes that I don't think should be necessary:
```python
import torch
from tqdm import trange
from transformers import BertForSequenceClassification, AdamW, get_linear_schedule_with_warmup

# numlabels and train_dataloader are defined earlier (not shown)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=numlabels)
model.cuda()  # note: this assumes a GPU is present, even though device falls back to CPU
# model = nn.DataParallel(model)

# Hyperparameters our training loop needs
lr = 2e-5
max_grad_norm = 1.0
num_training_steps = 1000
num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps)  # 0.1

# In transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # to reproduce BertAdam-specific behavior, set correct_bias=False
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=num_warmup_steps,
                                            num_training_steps=num_training_steps)  # PyTorch scheduler

# Store our loss for plotting
train_loss_set = []
# Number of training epochs (the authors recommend between 2 and 4)
epochs = 5  # 5: 0.96

# trange is a tqdm wrapper around the normal Python range
for _ in trange(epochs, desc="Epoch"):
    # Set our model to training mode (as opposed to evaluation mode)
    model.train()
    # Tracking variables
    tr_loss = 0
    nb_tr_examples, nb_tr_steps = 0, 0
    # Train the data for one epoch
    for step, batch in enumerate(train_dataloader):
        # Add batch to GPU
        batch = tuple(t.to(device) for t in batch)
        # Unpack the inputs from our dataloader
        b_input_ids, b_input_mask, b_labels = batch
        ############### Bug fix code ####################
        b_input_ids = b_input_ids.type(torch.LongTensor)
        b_input_mask = b_input_mask.type(torch.LongTensor)
        b_labels = b_labels.type(torch.LongTensor)
        b_input_ids = b_input_ids.to(device)
        b_input_mask = b_input_mask.to(device)
        b_labels = b_labels.to(device)
        #################################################
        # Clear out the gradients (by default they accumulate)
        optimizer.zero_grad()
        # Forward pass
        outputs = model(input_ids=b_input_ids, attention_mask=b_input_mask, labels=b_labels)
        loss, logits = outputs[:2]
        loss.backward()
        # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        scheduler.step()
```
Very strange. (I posted the code I thought would be useful; let me know if you need to see more.)
You're doing `.to(device)` twice for your data (once in the tuple comprehension, once again after the cast). It is hard to reproduce this because we don't have your data, so we don't know how you encode it. What are example contents of `batch`, so that we can reproduce your issue?
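Note that `.type(torch.LongTensor)` also moves the tensor back to the CPU, which is why the extra `.to(device)` calls are needed at all. A sketch of collapsing the cast and the transfer into a single step, using the same variable names as the loop above:

```python
# Cast to int64 and move to the target device in one call,
# instead of GPU -> cast back on CPU -> GPU again
batch = tuple(t.to(device=device, dtype=torch.long) for t in batch)
b_input_ids, b_input_mask, b_labels = batch
```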
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Had a similar issue; the following fix from Stack Overflow worked:

```python
b_input_ids = torch.tensor(b_input_ids).to(torch.int64)
```

https://stackoverflow.com/questions/56360644/pytorch-runtimeerror-expected-tensor-for-argument-1-indices-to-have-scalar-t
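A plausible root cause for the Windows/Linux discrepancy (an assumption, since the thread never shows how the batches are built): with NumPy 1.x, the default integer type follows the platform's C `long`, which is 32-bit on Windows and 64-bit on most Linux builds, so tensors built from plain NumPy integer arrays come out as `torch.int32` only on Windows:

```python
import numpy as np
import torch

ids = np.array([101, 2023, 102])       # int32 on Windows (NumPy 1.x), int64 on most Linux builds
print(torch.tensor(ids).dtype)         # torch.int32 on Windows -> rejected by the embedding lookup
print(torch.tensor(ids).long().dtype)  # torch.int64 -> safe to pass as input_ids
```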
I'm having the same issue. The funny thing is that the whole model worked for training, but when running inference on the test data the error showed up.
I'm facing exactly the same issue. I am using an Amazon SageMaker notebook instance.