Although I've read the documentation for the BertForMaskedLM class, I still cannot understand how to properly calculate the loss for my problem.
Let's suppose that my target sentence is:
"_I will be writing when you arrive._"
I want to calculate loss for all words except 'arrive'.
The documentation says:
> masked_lm_labels (torch.LongTensor of shape (batch_size, sequence_length), optional, defaults to None): Labels for computing the masked language modeling loss. Indices should be in [-100, 0, ..., config.vocab_size] (see input_ids docstring). Tokens with indices set to -100 are ignored (masked); the loss is only computed for the tokens with labels in [0, ..., config.vocab_size].
The way I understood it, I should pass to the _masked_lm_labels_ argument a tensor that contains the following indices:
tensor([[ 101, 1045, 2097, 2022, 3015, 2043, -100, 7180, 1012, 101]])
It returns an error:
RuntimeError: Assertion 'cur_target >= 0 && cur_target < n_classes' failed.
Can you help me and point out what is wrong in my thinking?
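For reference, this is roughly the call I'm making (a minimal sketch; I'm assuming the transformers 2.x API, where BertForMaskedLM accepts masked_lm_labels and returns the loss as the first element of the output tuple):

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Token IDs for the sentence, with [CLS]/[SEP] added by the tokenizer
input_ids = tokenizer.encode("I will be writing when you arrive.", return_tensors="pt")

# Labels: same IDs, except the position of "arrive" is set to -100 so it is ignored
labels = input_ids.clone()
labels[labels == tokenizer.convert_tokens_to_ids("arrive")] = -100

# In transformers 2.x the masked LM loss is the first element of the output tuple
loss, prediction_scores = model(input_ids, masked_lm_labels=labels)[:2]
print(loss)
```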
Have a look at the mask_tokens method in run_language_modeling.py. This takes in the input_ids, performs masking on them and returns the masked input_ids and corresponding masked_lm_labels.
@Drpulti I am also getting the same error as you, and I believe it is because -100 exists in the masked_lm_labels returned by mask_tokens.
These are fed to the forward method of BertForMaskedLM (or whatever pre-trained model you are using), and ultimately to CrossEntropyLoss, which throws an error for labels < 0.
The docstring says that tokens with indices set to -100 are ignored (masked), but I don't see the logic where masked_lm_labels == -100 are ignored. You can even see a comment saying that -100 marks the ignored tokens, but again, where is the code that does this? I figure that both of us might be missing the step that properly handles these -100 values.
I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss).
I think that in your case, you might have a version mismatch between pytorch and transformers. Try upgrading to the latest of both and check if the error is still there.
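A quick way to check this on your setup (just a sketch with dummy logits, nothing transformers-specific): CrossEntropyLoss uses ignore_index=-100 by default, so targets equal to -100 simply don't contribute to the loss.

```python
import torch
import torch.nn as nn

loss_fct = nn.CrossEntropyLoss()  # ignore_index defaults to -100

logits = torch.randn(4, 10)                 # 4 token positions, vocabulary of 10
targets = torch.tensor([3, -100, 7, -100])  # two positions should be ignored

# The loss is averaged over the non-ignored positions only
loss = loss_fct(logits, targets)

# Same value as computing the loss on just the kept positions
kept = targets != -100
assert torch.allclose(loss, loss_fct(logits[kept], targets[kept]))
```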
When my label contains -100, I get the following error at runtime: “IndexError: Target -100 is out of bounds.”
Could you be a bit more specific as to where the error is coming from? Maybe a stack trace would be nice. Also, please upgrade your pytorch and transformers packages. I'm running transformers 2.5.0 and pytorch 1.4.0 and don't get any such issue.
@Genius1237 In fact, I think I don't really know what masked_lm_labels means. I want to know what it expresses and how we can get it.
@tom1125 I'm not sure I understand. Are you saying that you want to know how masked_lm_labels is computed and how it's used in computing the loss?
@Genius1237 Yes, and I want to know how to get it. Thanks.
An input sentence is a sequence of sub-word tokens, represented by their IDs. This is what input_ids would represent (before masking). The mask_tokens method takes this in and chooses 15% of the tokens for a "corruption" process. In this "corruption" process, 80% of the chosen tokens become [MASK], 10% get replaced with a random word and 10% are left untouched.
The goal of the BERT model is to take in the "corrupted" input_ids and predict the correct token at each position. The correct tokens, masked_lm_labels, are also produced by the mask_tokens method. The values of this tensor would ideally be a clone of the "uncorrupted" input_ids, but since the loss is computed over only the "corrupted" tokens, the value of masked_lm_labels for the 85% of tokens that aren't chosen for "corruption" is set to -100 so that they get ignored by CrossEntropyLoss.
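If it helps, here's a stripped-down sketch of that logic (not the exact mask_tokens code from run_language_modeling.py; it skips the special-tokens and padding handling and just shows the 15% selection and the 80/10/10 corruption):

```python
import torch

def simple_mask_tokens(input_ids, tokenizer, mlm_probability=0.15):
    """Simplified sketch of the masking logic: returns (corrupted inputs, labels)."""
    inputs = input_ids.clone()
    labels = input_ids.clone()

    # Choose ~15% of the tokens for the "corruption" process
    probability_matrix = torch.full(labels.shape, mlm_probability)
    masked_indices = torch.bernoulli(probability_matrix).bool()

    # Loss is only computed on the chosen tokens; everything else is ignored
    labels[~masked_indices] = -100

    # 80% of the chosen tokens -> [MASK]
    indices_replaced = torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

    # 10% of the chosen tokens -> a random word
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool() & masked_indices & ~indices_replaced
    )
    random_words = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    inputs[indices_random] = random_words[indices_random]

    # The remaining 10% of the chosen tokens are left untouched
    return inputs, labels
```

The masked_lm_labels you pass to the model are exactly these labels: the original token IDs at the corrupted positions and -100 everywhere else, so CrossEntropyLoss only scores the corrupted positions.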
@Genius1237 Thank you very much, it really helps me.
> I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss). I think that in your case, you might be having some mismatch between pytorch and transformers versions. Try upgrading to the latest of both and check if the error is still there.
You are right! Thanks. I will try updating both packages.
> I believe that the -100 part is handled by CrossEntropyLoss (https://pytorch.org/docs/stable/_modules/torch/nn/functional.html#nll_loss). I think that in your case, you might be having some mismatch between pytorch and transformers versions. Try upgrading to the latest of both and check if the error is still there.
You are right, upgrade helped to resolve the issue. I'm closing the thread.