Fairseq: RoBERTa model problem

Created on 5 Aug 2019 · 13 Comments · Source: pytorch/fairseq

Hi,

I was trying to fine-tune the RoBERTa model on my own task using the implementation from this fantastic repo. However, I ran into one significant problem.

I trained the model on Google Colab and saved it with

```
torch.save(model.state_dict(), f"/content/drive/My Drive/roberta/models/state_fold{i}")
```

and then loaded it with

```
model.load_state_dict(torch.load(path, map_location='cpu'))
```

on my local machine, where the method extract_features then returned the same output regardless of the input.
I have been using a workaround: freeze all of RoBERTa's parameters during training and reload RoBERTa with

```
self.roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')
```

after loading the state_dict. This fixes the issue, but it is still not satisfying, since I can then only fine-tune the classification heads rather than the whole model.
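A minimal sketch of that workaround (A_Class_That_Contains_RoBERTa and path are hypothetical names from this thread, not fairseq API):

```
import torch

model = A_Class_That_Contains_RoBERTa()                      # hypothetical wrapper class
model.load_state_dict(torch.load(path, map_location='cpu'))  # restore fine-tuned weights

# Swap in a freshly downloaded encoder, discarding the fine-tuned encoder weights
model.roberta = torch.hub.load('pytorch/fairseq', 'roberta.base')

# Freeze the encoder so that training only updates the classification head
for p in model.roberta.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad])
```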

All 13 comments

hmm, that's weird. Can you please make sure that model.load_state_dict is loading the state dict correctly? Also, please make sure that when you evaluate the model after loading, you call model.eval() to disable dropout.

Also, if the above doesn't help, can you please share a more complete code snippet to help me understand the issue?
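One quick way to check both points (a sketch, not from the thread; with strict=False, load_state_dict returns the keys it could not match):

```
state = torch.load(path, map_location='cpu')

# Both lists should be empty if the checkpoint matched the model exactly
incompatible = model.load_state_dict(state, strict=False)
print('missing keys:', incompatible.missing_keys)
print('unexpected keys:', incompatible.unexpected_keys)

model.eval()  # disable dropout before comparing outputs
```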

  1. I think I did it right, as I just copied what is suggested at pytorch.org for saving and reloading models, as you can see in my code above.
  2. I thought at first that it might be a problem with .eval(), but it turns out this happens regardless of whether I call that method.

hmm, can you please share minimal code that reproduces this error? It would be very helpful for debugging this issue.

In Google Colab:

```
torch.save(model.state_dict(), f"/content/drive/My Drive/roberta/models/state_fold{i}")
```

After that I download the file to my local machine and load it:

```
model = A_Class_That_Contains_RoBERTa()
model.load_state_dict(torch.load(path, map_location='cpu'))
model.eval()
sentences = ["This bug is kind of funny", "Roberta is great!"]
inputs = [model.roberta.encode(sent) for sent in sentences]
inputs = pad_sequences(inputs)  # pad the sequences to the same length with 1s at the end
model.roberta.extract_features(inputs)  # returns the same values for every input
```

Also, when I import collate_tokens with from fairseq.data.data_utils import collate_tokens, it reports a problem (that is why I used a customized version of it), and I am not sure yet whether that is my fault or not. But I have ruled it out as the cause of the issue above, since I tried the examples and some random LongTensors and it still returns the same output.
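For reference, collate_tokens is normally called roughly like this (a sketch; the exact signature varies between fairseq versions, and 1 is RoBERTa's padding index):

```
from fairseq.data.data_utils import collate_tokens

inputs = [model.roberta.encode(sent) for sent in sentences]
batch = collate_tokens(inputs, pad_idx=1)  # right-pads each sequence with 1s
```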

Hmm, I'm not sure I completely understand. In the example you shared above, can you confirm the output of the following:

```
inputs = [model.roberta.encode(sent) for sent in sentences]
inputs = pad_sequences(inputs)
print(inputs.size())  # this should be 2 x 8
f1 = model.roberta.extract_features(inputs[0])
f2 = model.roberta.extract_features(inputs[1])
fbatch = model.roberta.extract_features(inputs)
```

Do all of the above produce the expected outputs?
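To make "expected" concrete, one could compare the per-sentence and batched features (a sketch; assumes extract_features returns a batch-first tensor of per-token features):

```
# Rows of the batched output should agree with the single-sentence outputs
print(torch.allclose(f1[0], fbatch[0], atol=1e-5))
print(torch.allclose(f2[0], fbatch[1], atol=1e-5))
```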

I can confirm that the size of inputs is correct, but here is the thing:

I tried again just now, and I don't know whether it is the way I train the model or something else, but something has changed. Here is what I have now:

  1. If I freeze the weights, the issue is gone, regardless of whether I reload the model after load_state_dict.
  2. However, if I train the whole model, the results look like this:
```
tensor([[[-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         ...,
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902]],

        [[-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         ...,
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902],
         [-0.0274,  1.9014, -1.5367,  ...,  0.0214, -0.1681,  0.1902]]],
       grad_fn=<TransposeBackward0>)
```

I am guessing it is the hyperparameters I used for training the model? I am using Adam with the default settings.

Usually this kind of error is the result of bad broadcasting, where an input dimension is of size 1 and gets expanded somehow. This is really hard to debug without a more minimal example. Can you share the code for the pad_sequences function?
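As a toy illustration of that failure mode (not from the thread, just PyTorch broadcasting semantics):

```
import torch

a = torch.randn(2, 1, 768)  # stray size-1 dimension
b = torch.randn(2, 8, 768)
c = a + b                   # a is silently expanded: c.shape == (2, 8, 768),
                            # and a contributes the same row at all 8 positions
print(c.shape)
```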

Sure

```
import torch

def pad_sequences(tensor_list, max_len=70):
    # Right-pad (or truncate) 1-D LongTensors to a common length,
    # using 1 (RoBERTa's padding index) as the fill value.
    max_len = min(max_len, max(len(tensor) for tensor in tensor_list))
    padded = []
    for tensor in tensor_list:
        ones = torch.ones(max_len).type(torch.long)
        if len(tensor) < max_len:
            ones[0:len(tensor)] = tensor
        else:
            ones = tensor[0:max_len]
        padded.append(ones)
    return torch.stack(padded)
```
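For what it's worth, the padding part of this (truncation aside) can also be done with PyTorch's built-in pad_sequence, again using 1 as RoBERTa's padding index:

```
from torch.nn.utils.rnn import pad_sequence

batch = pad_sequence(tensor_list, batch_first=True, padding_value=1)
```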

Hmm, that seems fine. In general I'm not able to reproduce this -- even with torch.save, torch.load, the exact sentences you tried, etc. What version of PyTorch are you using?

I have been using PyTorch 1.1.0 on Windows 10. Sorry I did not reply sooner. I'll try to see if I can reproduce this on other platforms.

I want to pre-train a fairseq model for Georgian. Does fairseq have support for creating sentence embeddings in custom languages?

@3NFBAGDU Good question! A README was recently added for pre-training RoBERTa:

https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.pretraining.md

But one problem could be that a previously built dictionary is downloaded and used; see this line:

```
wget -O gpt2_bpe/dict.txt https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
```

You can't use that dictionary for a non-English language.

Maybe @myleott could give a hint on how to create such a dictionary for another corpus/language.

Hmm, this is quite off topic, but: that BPE code technically supports any language, since it is byte-level. However, most of the codes are English words, so for other languages it would essentially be doing character-level modeling. We don't have code released for creating your own BPE in this format, since the dictionary is borrowed from GPT-2. We are currently working on a multilingual version, but there is no expected date yet.
