Transformers: Unclear documentation for index masking

Created on 6 Jan 2020 · 5 comments · Source: huggingface/transformers

🐛 Bug

Model I am using (Bert, XLNet....): CamemBERT, but this probably applies to all MLMs.

Language I am using the model on (English, Chinese....): French

The problem arises when using:

  • [x] my own modified scripts, but I suspect that https://github.com/huggingface/transformers/blob/master/examples/run_lm_finetuning.py is also impacted.

Basically, the masking procedure raises an assertion error device-side when I try to run something akin to:

model(inputs, masked_lm_labels=labels)

I pinpointed the error to the fact that marking values to be ignored in the labels with -100, as done here in the run_lm_finetuning.py script, is probably deprecated. The documentation is unclear on the subject, as it says:

masked_lm_labels: (optional) torch.LongTensor of shape (batch_size, sequence_length):

Labels for computing the masked language modeling loss. 
Indices should be in [-1, 0, ..., config.vocab_size] (see input_ids docstring) 
Tokens with indices set to -100 are ignored (masked), the loss is only computed 
for the tokens with labels in [0, ..., config.vocab_size]

As you can see, the information is contradictory: on one hand it says indices should be in [-1, ..., config.vocab_size], but on the other hand it says, like the script, that tokens with the value -100 are ignored. I tried it, and using the value -1 does indeed work.
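For context, here is a minimal sketch of why the exact value matters, assuming the MLM head computes its loss with a plain torch.nn.CrossEntropyLoss (which ignore_index it is built with depends on the installed transformers version):

import torch

vocab_size = 32005  # roughly CamemBERT's vocabulary size, purely illustrative
logits = torch.randn(4, vocab_size)        # (num_tokens, vocab_size) scores
labels = torch.tensor([12, 57, -100, 3])   # -100 marks a position to ignore

# If the loss was built with ignore_index=-100, the -100 label is skipped:
print(torch.nn.CrossEntropyLoss(ignore_index=-100)(logits, labels))  # works

# If it was built with ignore_index=-1 (older releases), -100 is treated as a
# class index and trips the "cur_target >= 0 && cur_target < n_classes"
# assertion (or an IndexError on newer PyTorch):
# torch.nn.CrossEntropyLoss(ignore_index=-1)(logits, labels)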

The task I am working on is:

  • [x] my own task or dataset: I am fine-tuning the pretrained CamemBERT model on an MLM task before reusing it for sentence classification.

To Reproduce

Steps to reproduce the behavior:

import torch
from transformers import CamembertForMaskedLM

model = CamembertForMaskedLM.from_pretrained(
    "camembert-base", cache_dir="models/pretrained_camembert"
)
# 30 single-token "sequences", every input id set to 4
inputs = torch.full((30, 1), 4).to(torch.long)
labels = inputs.clone()
# mark one position as "ignore" with -100, as done in run_lm_finetuning.py
labels[10] = -100
model(inputs, masked_lm_labels=labels)

This gives:

RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed.  at /pytorch/aten/src/THNN/generic/ClassNLLCriterion.c:97

If you run it on a GPU, a similar error is raised.

Expected behavior

Should return a loss.

Environment

  • OS: Ubuntu 18.04
  • Python version: 3.6.9
  • PyTorch version: 1.3.1
  • PyTorch Transformers version (or branch): 2.2.1
  • Using GPU? Tried both CPU and GPU; neither works.
  • Distributed or parallel setup?
  • Any other relevant information: The issue can be worked around by replacing -100 with -1 (a short sketch of this workaround follows below). As I said, I think at some point you switched to using -1 instead of -100 but did not fully propagate the change to the docs and examples.
wontfix
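For anyone stuck on an affected release, a minimal workaround sketch (it just reuses the reproduction snippet above and remaps the ignore marker before the forward pass; treat it as illustrative rather than the recommended API):

import torch
from transformers import CamembertForMaskedLM

model = CamembertForMaskedLM.from_pretrained("camembert-base")
inputs = torch.full((30, 1), 4).to(torch.long)
labels = inputs.clone()
labels[10] = -100

# On releases whose MLM loss still uses ignore_index=-1 (e.g. 2.2.1),
# remap the -100 ignore markers to -1 before calling the model.
labels[labels == -100] = -1
outputs = model(inputs, masked_lm_labels=labels)
loss = outputs[0]  # first element is the masked LM loss when labels are passed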


All 5 comments

Okay, my bad, it seems this was actually intentional: this commit was merged and integrated in either version 2.2.2 or 2.3, which is what causes the error on my version. It seems the current proper way to do this is indeed to specify -100 as the ignore index.

The doc is unclear though: the sentence "Indices should be in [-1, 0, ..., config.vocab_size]" should read "Indices should be in [-100, 0, ..., config.vocab_size]".

Anyway, cheers. I opened PRs with the documentation fix everywhere it's needed if you want to have a look, but regardless, feel free to close this issue.

@LysandreJik merged the PR for the doc, however I just realized that I incorrectly assumed the commit was part of 2.3 or 2.2.2, based on the merge date of the uniformisation commit. It is currently only in the master branch and not in any tagged version, which means anyone who hits the above bug should switch to -1 until that is the case. Here is the error I got when training on GPU, by the way:

/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:106: 
void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, 
Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float,
 Acctype = float]: block: [0,0,0], thread: [31,0,0] 
Assertion `t >= 0 && t < n_classes` failed.
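Since the fix is only on master, a quick way to check which release is actually installed (and therefore whether your copy expects -100 or still -1, per the comment above) is simply:

# Tagged releases at the time of this thread still expect -1; only master
# (unreleased code) already expects -100.
import transformers
print(transformers.__version__)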

Thanks for figuring this out!

This was a hair-pulling bug, due to the fact that the conda package from the pytorch channel has the updated version while the PyPI package with the same release tag does not... I was wondering why index masking for BERT labels was having such issues between the conda version 1.3.1 and the pip version 1.3.1 (they're labeled as the same version D:).

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Hello, thanks for sharing.
I also want to fine-tune the pretrained CamemBERT model on an MLM task, for later extraction of sentence embeddings and then clustering. I am a bit confused about how to use the Trainer to fine-tune.
Should I create the masked_lm_labels myself, with indices in [-100, 0, ..., config.vocab_size]? But how would I know which word is masked?
Could you share the piece of code, if it's not too much trouble? Thank you in advance.
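In case it helps, here is a minimal sketch of how the masked labels are usually built, closely following the mask_tokens helper in examples/run_lm_finetuning.py (the 15%/80%/10% numbers come from that script; consider this an illustrative sketch rather than the exact current Trainer API):

import torch
from transformers import CamembertTokenizer

def mask_tokens(inputs, tokenizer, mlm_probability=0.15):
    """Mask random tokens for MLM; labels are -100 except at masked positions."""
    labels = inputs.clone()

    # Sample which positions get masked, never selecting special tokens.
    probability_matrix = torch.full(labels.shape, mlm_probability)
    special_tokens_mask = [
        tokenizer.get_special_tokens_mask(val, already_has_special_tokens=True)
        for val in labels.tolist()
    ]
    probability_matrix.masked_fill_(
        torch.tensor(special_tokens_mask, dtype=torch.bool), value=0.0
    )
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # loss is only computed on masked positions

    # 80% of masked positions: replace the input token with <mask>.
    indices_replaced = (
        torch.bernoulli(torch.full(labels.shape, 0.8)).bool() & masked_indices
    )
    inputs[indices_replaced] = tokenizer.convert_tokens_to_ids(tokenizer.mask_token)

    # 10%: replace with a random token; the remaining 10% keep the original token.
    indices_random = (
        torch.bernoulli(torch.full(labels.shape, 0.5)).bool()
        & masked_indices & ~indices_replaced
    )
    random_words = torch.randint(len(tokenizer), labels.shape, dtype=torch.long)
    inputs[indices_random] = random_words[indices_random]

    return inputs, labels

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
batch = torch.tensor([tokenizer.encode("J'aime le camembert.")])
inputs, labels = mask_tokens(batch, tokenizer)
# model(inputs, masked_lm_labels=labels) then returns the MLM loss as outputs[0]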

