Model I am using (Bert, XLNet ...):
Camembert
Language I am using the model on (English, Chinese ...):
French
The problem arises when using: my own script, adapted from the official usage example (see below).
The task I am working on is: question answering in French.
Steps to reproduce the behavior:
Initialization:

from transformers import CamembertModel, CamembertTokenizer

bert = CamembertModel.from_pretrained("camembert-base")
bert_tok = CamembertTokenizer.from_pretrained("camembert-base")
Inference: as in https://huggingface.co/transformers/usage.html#question-answering
# question and context are plain strings, as in the usage example linked above
inputs = bert_tok.encode_plus(question, context, add_special_tokens=True, return_tensors="pt")
input_ids = inputs["input_ids"].tolist()[0]
text_tokens = bert_tok.convert_ids_to_tokens(input_ids)
answer_start_scores, answer_end_scores = bert(**inputs)
It works when I remove the context argument (the text_pair argument; see the sketch below), but I need it to do question answering with other models, and it leads to the same error with pipelines.
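For reference, a minimal sketch of the single-sequence call that does run (same bert_tok and bert as above; only the text_pair argument is dropped):

inputs = bert_tok.encode_plus(question, add_special_tokens=True, return_tensors="pt")
last_hidden, pool = bert(**inputs)  # no IndexError without the context argument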
IndexError Traceback (most recent call last)
<ipython-input-9-73762e6cf69b> in <module>
2 for utterances in file.readlines():
3 input_tensor = bert_tok.batch_encode_plus([utterances], pad_to_max_length=True, return_tensors="pt")
----> 4 last_hidden, pool = bert(input_tensor["input_ids"], input_tensor["attention_mask"])
5
6
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)
~/.local/lib/python3.8/site-packages/transformers/modeling_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask)
780 head_mask = [None] * self.config.num_hidden_layers
781
--> 782 embedding_output = self.embeddings(
783 input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
784 )
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)
~/.local/lib/python3.8/site-packages/transformers/modeling_roberta.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds)
62 position_ids = self.create_position_ids_from_inputs_embeds(inputs_embeds)
63
---> 64 return super().forward(
65 input_ids, token_type_ids=token_type_ids, position_ids=position_ids, inputs_embeds=inputs_embeds
66 )
~/.local/lib/python3.8/site-packages/transformers/modeling_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds)
172 if inputs_embeds is None:
173 inputs_embeds = self.word_embeddings(input_ids)
--> 174 position_embeddings = self.position_embeddings(position_ids)
175 token_type_embeddings = self.token_type_embeddings(token_type_ids)
176
~/.local/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)
~/.local/lib/python3.8/site-packages/torch/nn/modules/sparse.py in forward(self, input)
110
111 def forward(self, input):
--> 112 return F.embedding(
113 input, self.weight, self.padding_idx, self.max_norm,
114 self.norm_type, self.scale_grad_by_freq, self.sparse)
~/.local/lib/python3.8/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1722         # remove once script supports set_grad_enabled
   1723         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1724     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1725
   1726
IndexError: index out of range in self
Expected behavior: run inference without any error.
transformers version: 2.8.0
I am running into the same error in my own script. Interestingly, it only appears on CPU... Did you find a solution?
No. I want to build a French Q&A pipeline and, surprisingly, everything works great with the Hugging Face pipeline: I can plug the code into a local server and make requests against it.
But when I try to use the same code in a Docker environment to ship it, it fails with this error (only in French with CamemBERT; classic BERT works fine).
I also get the error locally if I don't use the Hugging Face pipeline but write my own inference instead (as described above).
I can confirm it works locally on GPU (and even in Docker), but it is still stuck on CPU.
I actually figured out my error. I was adding special tokens to the tokenizer (like a begin-of-sequence token) but did not resize the model's token embeddings via:
model.resize_token_embeddings(len(self.tokenizer))
Just in case someone else is not reading the documentation carefully enough :see_no_evil:
Considering that, the error message did actually make sense.
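For anyone hitting the same IndexError, a minimal self-contained sketch of that fix (standalone names here, not the commenter's exact script):

from transformers import CamembertModel, CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertModel.from_pretrained("camembert-base")

# Adding tokens grows the tokenizer vocabulary, so the new ids can exceed
# the size of the model's embedding matrix and the embedding lookup fails.
tokenizer.add_tokens(["<my-new-token>"])

# Resizing the embedding matrix to the new vocabulary size fixes it.
model.resize_token_embeddings(len(tokenizer))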
Hi @Ierezell, there is indeed an issue, which I'm patching in #4289. Please be aware that you're using CamembertModel, which cannot be used for question answering; please use CamembertForQuestionAnswering instead.
It's patched now; please install from source and there should be no error anymore!
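For reference, a minimal sketch of that suggestion. The checkpoint is the QA-fine-tuned model mentioned elsewhere in this thread, the question/context strings are illustrative, and the decoding follows the usage example linked above:

import torch
from transformers import CamembertForQuestionAnswering, CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("illuin/camembert-base-fquad")
model = CamembertForQuestionAnswering.from_pretrained("illuin/camembert-base-fquad")

question = "Où se trouve la tour Eiffel ?"
context = "La tour Eiffel se trouve à Paris, en France."

inputs = tokenizer.encode_plus(question, context, add_special_tokens=True, return_tensors="pt")
# Unlike the base model, the QA head returns start/end logits over the tokens.
start_scores, end_scores = model(**inputs)

input_ids = inputs["input_ids"].tolist()[0]
answer_start = torch.argmax(start_scores)
answer_end = torch.argmax(end_scores) + 1
answer = tokenizer.convert_tokens_to_string(
    tokenizer.convert_ids_to_tokens(input_ids[answer_start:answer_end])
)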
Hi @LysandreJik, I'm aware that I used a non-QA model, but it was to try the base model supported by Hugging Face.
I tried as well with illuin/camembert-base-fquad (and the large one) and with fmikaelian/camembert-base-fquad.
I will install the latest version and try it.
Thanks a lot for the fast support!
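For context, a minimal sketch of the pipeline setup that works locally (same fine-tuned checkpoint as in the comment above):

from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="illuin/camembert-base-fquad",
    tokenizer="illuin/camembert-base-fquad",
)

result = qa(question="Où se trouve la tour Eiffel ?",
            context="La tour Eiffel se trouve à Paris, en France.")
print(result["answer"], result["score"])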
I tried your fix, but it led to a KeyError:
File "/home/pedro/.local/lib/python3.8/site-packages/transformers/pipelines.py", line 1156, in __call__
answers += [
File "/home/pedro/.local/lib/python3.8/site-packages/transformers/pipelines.py", line 1159, in <listcomp>
"start": np.where(char_to_word == feature.token_to_orig_map[s])[0][0].item(),
KeyError: 0
Could you provide a reproducible script? I can't reproduce.
My problem here was most likely linked to #4674; everything seems to work now, thanks a lot!