Although I've read the documentation for the BertForMaskedLM class, I still cannot understand how to properly calculate the loss for my problem.
Let's suppose that my target sentence is:
"_I will be writing when you arrive._"
I want to calculate loss for all words except 'arrive'.
The documentation says:
> masked_lm_labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None): Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
The way I understood it, I should pass to the _masked_lm_labels_ argument a tensor that contains the following indices:
tensor([[ 101, 1045, 2097, 2022, 3015, 2043, -100, 7180, 1012, 101]])
It returns an error:
RuntimeError: Assertion 'cur_target >= 0 && cur_target < n_classes' failed.
Can you help me and point out what is wrong in my thinking?
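For reference, this is roughly the call I'm making (a minimal sketch; I'm assuming the transformers 2.x API, where BertForMaskedLM accepts masked_lm_labels and returns the loss as the first element of the output tuple):

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Token IDs for the sentence, with [CLS]/[SEP] added by the tokenizer
input_ids = tokenizer.encode("I will be writing when you arrive.", return_tensors="pt")

# Labels: same IDs, except the position of "arrive" is set to -100 so it is ignored
labels = input_ids.clone()
labels[labels == tokenizer.convert_tokens_to_ids("arrive")] = -100

# In transformers 2.x the masked LM loss is the first element of the output tuple
loss, prediction_scores = model(input_ids, masked_lm_labels=labels)[:2]
print(loss)
```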
Have a look at the mask_tokens method in run_language_modeling.py. This takes in the input_ids, performs masking on them and returns the masked input_ids and corresponding masked_lm_labels.
@Drpulti I am also getting the same error as you, and I believe it is because -100 exists in the masked_lm_labels returned by mask_tokens.
These are fed to the forward method of BertForMaskedLM (or whatever pre-trained model you are using), and ultimately to CrossEntropyLoss, which throws an error for labels < 0.
The docstring says that tokens with indices set to -100 are ignored (masked), but I don't see the logic where masked_lm_labels == -100 are ignored. You can even see a comment saying that -100 marks the ignored tokens, but again, where is the code that does this? I figure that both of us might be missing the step that properly handles these -100 values.
I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss).
I think that in your case, you might have a version mismatch between pytorch and transformers. Try upgrading to the latest of both and check if the error is still there.
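A quick way to check this on your setup (just a sketch with dummy logits, nothing transformers-specific): CrossEntropyLoss uses ignore_index=-100 by default, so targets equal to -100 simply don't contribute to the loss.

```python
import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index defaults to -100

logits = torch.randn(4, 10)                 # 4 token positions, vocabulary of 10
targets = torch.tensor([3, -100, 7, -100])  # two positions should be ignored

# The loss is averaged over the non-ignored positions only
loss = loss_fct(logits, targets)

# Same value as computing the loss on just the kept positions
kept = targets != -100
assert torch.allclose(loss, loss_fct(logits[kept], targets[kept]))
```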
When my label contains -100, I get the following error at runtime: “IndexError: Target -100 is out of bounds.”
Could you be a bit more specific as to where the error is coming from? Maybe a stack trace would be nice. Also, please upgrade your pytorch and transformers packages. I'm running transformers 2.5.0 and pytorch 1.4.0 and don't get any such issue.
@Genius1237 In fact, I think I don't really know what masked_lm_labels means. I want to know what it expresses and how we can get it.
@tom1125 I'm not sure I understand. Are you saying that you want to know how masked_lm_labels is computed and how it's used in computing the loss?
@Genius1237 Yes, and I want to know how to get it. Thanks.
An input sentence is a sequence of sub-word tokens, represented by their IDs. This is what input_ids would represent (before masking). The mask_tokens method takes this in and chooses 15% of the tokens for a "corruption" process. In this "corruption" process, 80% of the chosen tokens become [MASK], 10% get replaced with a random word and 10% are left untouched.
The goal of the BERT model is to take in the "corrupted" input_ids and predict the correct token at each position. The correct tokens, masked_lm_labels, are also produced by the mask_tokens method. The values of this tensor would ideally be a clone of the "uncorrupted" input_ids, but since the loss is computed over only the "corrupted" tokens, the value of masked_lm_labels for the 85% of tokens that aren't chosen for "corruption" is set to -100 so that they get ignored by CrossEntropyLoss.
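If it helps, here's a stripped-down sketch of that logic (not the exact mask_tokens code from run_language_modeling.py; it skips the special-tokens and padding handling and just shows the 15% selection and the 80/10/10 corruption):

```python
import torch

def simple_mask_tokens(input_ids, tokenizer, mlm_probability=0.15):
    """Simplified sketch of the masking logic: returns (corrupted inputs, labels)."""
    inputs = input_ids.clone()
    labels = input_ids.clone()

    # Choose ~15% of the tokens for the "corruption" process
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()

    # Loss is only computed on the chosen tokens; everything else is ignored
    labels[~masked_indices] = -100

    # 80% of the chosen tokens -> [MASK]
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

    # 10% of the chosen tokens -> a random word
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    )
    random_words = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    inputs[indices_random] = random_words[indices_random]

    # The remaining 10% of the chosen tokens are left untouched
    return inputs, labels
```

The masked_lm_labels you pass to the model are exactly these labels: the original token IDs at the corrupted positions and -100 everywhere else, so CrossEntropyLoss only scores the corrupted positions.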
@Genius1237 Thank you very much, it really helps me.
> I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss). I think that in your case, you might be having some mismatch between pytorch and transformers versions. Try upgrading to the latest of both and check if the error is still there.
You are right! Thanks. I will try updating both packages.
> I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss). I think that in your case, you might be having some mismatch between pytorch and transformers versions. Try upgrading to the latest of both and check if the error is still there.
You are right, upgrade helped to resolve the issue. I'm closing the thread.