Transformers: Bug in Question Answering pipeline when question is weird (unanswerable)

Created on 7 Jul 2020 · 5 comments · Source: huggingface/transformers

🐛 Bug

Information

Model I am using (Bert, XLNet ...): TFDistilBertForQuestionAnswering

Language I am using the model on (English, Chinese ...): English

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • [ ] my own modified scripts: (give details below)

The task I am working on is:

  • [x] an official GLUE/SQuAD task: (give the name)
  • [ ] my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  • Code
from transformers import pipeline
qanlp = pipeline("question-answering", framework="tf")  # even PyTorch gives the error
qanlp(context="I am a company", question="When is the bill due?", handle_impossible_answer=True) # happens even without handle_impossible_answer
  • Error
KeyError                                  Traceback (most recent call last)
<ipython-input-4-4da7a3b5ca0e> in <module>()
----> 1 qanlp(context="I am a company", question="When is the bill due?", handle_impossible_answer=True)

/usr/local/lib/python3.6/dist-packages/transformers/pipelines.py in __call__(self, *args, **kwargs)
   1314                         ),
   1315                     }
-> 1316                     for s, e, score in zip(starts, ends, scores)
   1317                 ]
   1318 

/usr/local/lib/python3.6/dist-packages/transformers/pipelines.py in <listcomp>(.0)
   1314                         ),
   1315                     }
-> 1316                     for s, e, score in zip(starts, ends, scores)
   1317                 ]
   1318 

KeyError: 0

Expected behavior

The pipeline should either return a (possibly wrong) answer or a blank answer, without raising an error.
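
For reference, the question-answering pipeline returns a dict with score, start, end, and answer keys, so a non-crashing result for an unanswerable question might look roughly like the sketch below (the returned values are hypothetical):

from transformers import pipeline

qanlp = pipeline("question-answering", framework="tf")
result = qanlp(context="I am a company", question="When is the bill due?",
               handle_impossible_answer=True)

# With handle_impossible_answer=True an unanswerable question should yield an
# empty answer rather than a crash.
print(result)
# e.g. {'score': 0.02, 'start': 0, 'end': 0, 'answer': ''}  (values hypothetical)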

Environment info

  • transformers version: 3.0.2
  • Platform: Linux-4.19.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.5.1+cu101 (False)
  • Tensorflow version (GPU?): 2.2.0 (False)
  • Using GPU in script?: Tried with and without GPU in Google Colab
  • Using distributed or parallel set-up in script?: No

All 5 comments

I believe https://github.com/huggingface/transformers/pull/5542 is trying to fix exactly this.

Yes, I see. The failing pipeline test on that pull request shows the same error.
However, I think the fix lies somewhere in the SQuAD feature creation, because feature.token_to_orig_map is what raises the KeyError. I took a diff of pipelines.py between v2.11.0 and v3.0.2, and here's what changed for the QuestionAnsweringPipeline:

 class QuestionAnsweringArgumentHandler(ArgumentHandler):
@@ -1165,12 +1253,12 @@ class QuestionAnsweringPipeline(Pipeline):
         examples = self._args_parser(*args, **kwargs)
         features_list = [
             squad_convert_examples_to_features(
-                [example],
-                self.tokenizer,
-                kwargs["max_seq_len"],
-                kwargs["doc_stride"],
-                kwargs["max_question_len"],
-                False,
+                examples=[example],
+                tokenizer=self.tokenizer,
+                max_seq_length=kwargs["max_seq_len"],
+                doc_stride=kwargs["doc_stride"],
+                max_query_length=kwargs["max_question_len"],
+                is_training=False,
                 tqdm_enabled=False,
             )
             for example in examples
@@ -1184,33 +1272,34 @@ class QuestionAnsweringPipeline(Pipeline):
             with self.device_placement():
                 if self.framework == "tf":
                     fw_args = {k: tf.constant(v) for (k, v) in fw_args.items()}
-                    start, end = self.model(fw_args)
+                    start, end = self.model(fw_args)[:2]
                     start, end = start.numpy(), end.numpy()
                 else:
                     with torch.no_grad():
                         # Retrieve the score for the context tokens only (removing question tokens)
                         fw_args = {k: torch.tensor(v, device=self.device) for (k, v) in fw_args.items()}
-                        start, end = self.model(**fw_args)
+                        start, end = self.model(**fw_args)[:2]
                         start, end = start.cpu().numpy(), end.cpu().numpy()

             min_null_score = 1000000  # large and positive
             answers = []
             for (feature, start_, end_) in zip(features, start, end):
-                # Normalize logits and spans to retrieve the answer
-                start_ = np.exp(start_) / np.sum(np.exp(start_))
-                end_ = np.exp(end_) / np.sum(np.exp(end_))
-
                 # Mask padding and question
                 start_, end_ = (
                     start_ * np.abs(np.array(feature.p_mask) - 1),
                     end_ * np.abs(np.array(feature.p_mask) - 1),
                 )

+                # Mask CLS
+                start_[0] = end_[0] = 0
+
+                # Normalize logits and spans to retrieve the answer
+                start_ = np.exp(start_ - np.log(np.sum(np.exp(start_), axis=-1, keepdims=True)))
+                end_ = np.exp(end_ - np.log(np.sum(np.exp(end_), axis=-1, keepdims=True)))
+
                 if kwargs["handle_impossible_answer"]:
                     min_null_score = min(min_null_score, (start_[0] * end_[0]).item())

-                start_[0] = end_[0] = 0
-
                 starts, ends, scores = self.decode(start_, end_, kwargs["topk"], kwargs["max_answer_len"])
                 char_to_word = np.array(example.char_to_word_offset)

Not much logic seems to have changed in this file except the position of the # Mask CLS block, and the pull request that fixes its position does not resolve this bug yet.
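
For what it's worth, here is a minimal sketch (not the actual pipeline code) of how the KeyError: 0 can arise under that assumption: feature.token_to_orig_map only has entries for context tokens, so a decoded span that lands on the CLS position has no mapping back to an original word offset.

# Hypothetical illustration of the failure mode, not the real pipeline internals.
# token_to_orig_map maps token indices of the context back to word indices;
# special tokens and question tokens (including CLS at index 0) have no entry.
token_to_orig_map = {8: 0, 9: 1, 10: 2, 11: 3}  # context assumed to start at token 8

s = 0  # decode() selects the CLS position when no real span scores higher
answer_word = token_to_orig_map[s]  # -> KeyError: 0, matching the traceback above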

@mfuntowicz, maybe you have some insights on this as you're working on the PR? :)

@pavanchhatpar Can you try on the master branch? I just pushed a change that should fix the case when questions are not answerable; see b716a864f869ddc78c3c8eb00729fc9546c74ee4.

Let us know if it resolves the issue 🙏

@mfuntowicz I tried with the master branch. Seems to work well now. Thanks for the fix!
