Model I am using (Bert, XLNet....): Bert
Language I am using the model on (English, Chinese....): English
The problem arises when using: the official example script run_lm_finetuning.py
The task I am working on is: language model fine-tuning
Steps to reproduce the behavior:
Run run_lm_finetuning.py with tokens added to the vocabulary:
```python
new_vocab_list = ['token_1', 'token_2', 'token_3']
tokenizer.add_tokens(new_vocab_list)
logger.info("vocabulary size after adding: " + str(len(tokenizer)))
model.resize_token_embeddings(len(tokenizer))
logger.info("size of model.cls.predictions.bias: " + str(len(model.cls.predictions.bias)))
```
I have found the problem: in the BERT model, the class "BertLMPredictionHead" has two separate attributes, "decoder" and "bias". When new tokens are added, the call model.resize_token_embeddings(len(tokenizer)) only resizes "decoder" (and its own bias, if it has one; that bias is distinct from "BertLMPredictionHead.bias"). The attribute "BertLMPredictionHead.bias" itself is never resized, and this is what causes the error.
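For context, BertLMPredictionHead in modeling_bert.py looks roughly like this (paraphrased from the version I am running, so details may differ):

```python
class BertLMPredictionHead(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.transform = BertPredictionHeadTransform(config)
        # The decoder is tied to the input embeddings and has no bias of its own;
        # the output-only bias is kept as a separate, standalone parameter.
        self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.bias = nn.Parameter(torch.zeros(config.vocab_size))

    def forward(self, hidden_states):
        hidden_states = self.transform(hidden_states)
        # This addition fails when decoder was resized but self.bias was not.
        hidden_states = self.decoder(hidden_states) + self.bias
        return hidden_states
```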
I have added the bias-resizing code to my copy of "modeling_bert.py", and if you want, I can open a pull request against your code. However, if I have misunderstood something, please let me know.
Thank you very much for your code base.
Hi, I've pushed a fix that was just merged in master. Could you please try and install from source:
```
pip install git+https://github.com/huggingface/transformers
```
and tell me if you face the same error?
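If you want to double-check that the source build is the one being imported, something along these lines should work:

```python
import transformers

print(transformers.__version__)  # should show the dev version after the source install
print(transformers.__file__)     # should point at the freshly installed package
```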
Having followed your reply from here (https://github.com/huggingface/transformers/issues/2513#issuecomment-574406370), it now works :)
I needed to update run_lm_finetuning.py to the latest GitHub branch - thanks :)
Hi @LysandreJik. Thank you for the update, but the error has not been solved, I'm afraid. The following is the error returned:
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
output.reraise()
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
output = module(*input, **kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/transformers/modeling_bert.py", line 889, in forward
prediction_scores = self.cls(sequence_output)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/transformers/modeling_bert.py", line 461, in forward
prediction_scores = self.predictions(sequence_output)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/sdcc/u/hvu/.conda/envs/torch/lib/python3.6/site-packages/transformers/modeling_bert.py", line 451, in forward
hidden_states = self.decoder(hidden_states) + self.bias
RuntimeError: The size of tensor a (31119) must match the size of tensor b (31116) at non-singleton dimension 2
I have solved the problem myself by adding the following code to the method `_tie_or_clone_weights(self, output_embeddings, input_embeddings)` in _modeling_utils.py_:
```python
# Resize the prediction-head bias as well if the model has a "cls" head,
# zero-padding it up to the new vocabulary size
if hasattr(self, "cls"):
    self.cls.predictions.bias.data = torch.nn.functional.pad(
        self.cls.predictions.bias.data,
        (0, self.config.vocab_size - self.cls.predictions.bias.shape[0]),
        "constant",
        0,
    )
```
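With that patch in place, a quick check along these lines (reusing the names from my snippet above) should confirm the sizes now agree after resizing:

```python
# Both the tied decoder weight and the standalone bias should match the tokenizer
assert model.cls.predictions.decoder.weight.shape[0] == len(tokenizer)
assert model.cls.predictions.bias.shape[0] == len(tokenizer)
```

Zero-padding seems reasonable here, since the new tokens start with no learned output bias, analogous to the freshly initialized embedding rows.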
@HuyVu0508 Try updating this file:
https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bert.py
It should be somewhere like "/opt/conda/lib/python3.6/site-packages/transformers/modeling_bert.py".
Looks like this is probably a duplicate of #1730
Also, there is a temporary solution posted here:
https://github.com/huggingface/transformers/issues/1730#issuecomment-550081307
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.