BERT output is not deterministic.
I expect the output values to be deterministic when I feed in the same input, but with my BERT model the values keep changing. Oddly, the outputs alternate between two values: one value comes out, then another, then the first again, and so on.
How can I make the output deterministic?
Let me show snippets of my code. I load the model as below.
```
tokenizer = BertTokenizer.from_pretrained(self.bert_type, do_lower_case=self.do_lower_case, cache_dir=self.bert_cache_path)
pretrain_bert = BertModel.from_pretrained(self.bert_type, cache_dir=self.bert_cache_path)
bert_config = pretrain_bert.config
```
I get the output like this:
```
all_encoder_layer, pooled_output = self.model_bert(all_input_ids, all_segment_ids, all_input_mask)
# all_encoder_layer: BERT outputs from all layers.
# pooled_output: output of the [CLS] vector.
```
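To make the comparison between runs concrete, the outputs of two forward passes can be compared element-wise. Here is a small framework-agnostic sketch; `allclose` is a hand-rolled helper for nested lists, not the torch function:

```python
import math

def allclose(a, b, tol=1e-6):
    """Element-wise comparison of two equally-shaped nested lists of floats."""
    if isinstance(a, list):
        return len(a) == len(b) and all(allclose(x, y, tol) for x, y in zip(a, b))
    return math.isclose(a, b, abs_tol=tol)

# Leading values from the two alternating outputs observed above.
run1 = [[-0.33997, 0.26870, -0.28109]]
run2 = [[ 0.07434, -0.00349, -0.00496]]
print(allclose(run1, run1))  # True: a run matches itself
print(allclose(run1, run2))  # False: the two alternating outputs differ
```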
pooled_output

```
tensor([[-3.3997e-01, 2.6870e-01, -2.8109e-01, -2.0018e-01, -8.6849e-02,
tensor([[ 7.4340e-02, -3.4894e-03, -4.9583e-03, 6.0806e-02, 8.5685e-02,
tensor([[-3.3997e-01, 2.6870e-01, -2.8109e-01, -2.0018e-01, -8.6849e-02,
tensor([[ 7.4340e-02, -3.4894e-03, -4.9583e-03, 6.0806e-02, 8.5685e-02,
```
For all_encoder_layer the situation is the same: the outputs alternate between the same two values. I also extract word-embedding features from BERT, and again the same thing happens.
wemb_n

```
tensor([[[ 0.1623, 0.4293, 0.1031, ..., -0.0434, -0.5156, -1.0220],
tensor([[[ 0.0389, 0.5050, 0.1327, ..., 0.3232, 0.2232, -0.5383],
tensor([[[ 0.1623, 0.4293, 0.1031, ..., -0.0434, -0.5156, -1.0220],
tensor([[[ 0.0389, 0.5050, 0.1327, ..., 0.3232, 0.2232, -0.5383],
```
As with all the other issues about BERT not being deterministic (#403, #679, #432, #475, #265, #278), it's likely because you didn't set the model in eval mode to deactivate the Dropout modules: `model.eval()`
I will try to emphasize this more in the examples of the readme because this issue keeps being raised.
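The effect of eval mode can be illustrated with a toy stand-in for a model that applies dropout. The class below is hypothetical (it is not the real `BertModel`), but it mirrors the train/eval switch in question:

```python
import random

class ToyDropoutModel:
    """Hypothetical stand-in: a 'model' whose forward pass applies dropout."""
    def __init__(self, p=0.5):
        self.p = p
        self.training = True  # like a freshly loaded model: dropout is active

    def eval(self):
        self.training = False
        return self

    def forward(self, x):
        if self.training:
            # dropout: randomly zero activations and rescale the survivors
            return [0.0 if random.random() < self.p else v / (1 - self.p)
                    for v in x]
        return list(x)  # eval mode: identity, fully deterministic

model = ToyDropoutModel()
x = [i / 10 for i in range(1, 33)]
print(model.forward(x) == model.forward(x))  # almost surely False: dropout randomizes each call
model.eval()
print(model.forward(x) == model.forward(x))  # True: deterministic in eval mode
```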
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
```
Epoch 1/6
loss: 2.0674 - bert_loss: 1.0283 - bert_1_loss: 1.0390 - bert_accuracy: 0.6604 - bert_1_accuracy: 0.6650
Epoch 2/6
loss: 1.7190 - bert_loss: 0.8604 - bert_1_loss: 0.8586 - bert_accuracy: 0.7000 - bert_1_accuracy: 0.7081
Epoch 3/6
loss: 1.5244 - bert_loss: 0.7715 - bert_1_loss: 0.7528 - bert_accuracy: 0.7250 - bert_1_accuracy: 0.7424
Epoch 4/6
loss: 1.3203 - bert_loss: 0.6765 - bert_1_loss: 0.6438 - bert_accuracy: 0.7585 - bert_1_accuracy: 0.7741
Epoch 5/6
loss: 1.1102 - bert_loss: 0.5698 - bert_1_loss: 0.5404 - bert_accuracy: 0.7936 - bert_1_accuracy: 0.8082 - val_loss: 0.7052 - val_bert_loss: 0.3709 - val_bert_1_loss: 0.3343 - val_bert_accuracy: 0.8687 - val_bert_1_accuracy: 0.8803
Epoch 6/6
ETA: 0s - loss: 0.9269 - bert_loss: 0.4823 - bert_1_loss: 0.4446 - bert_accuracy: 0.8287 - bert_1_accuracy: 0.8452
```
I have the same problem in TensorFlow. I configured the model so that dropout is applied only during the training phase (training=True), but I still get different outputs on every prediction. As you can see, performance improves during training, so I guess the problem is in the prediction step.
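One way to check whether some other source of randomness is still active at prediction time is to seed the RNG immediately before each call: if the outputs then repeat exactly, a stray random op is the culprit. A generic sketch using Python's `random` module (real frameworks have their own seeds to set as well; `predict` here is a hypothetical stand-in for a forward pass):

```python
import random

def predict(x):
    # stand-in for a forward pass that (wrongly) draws randomness at inference
    return [v + random.gauss(0, 0.01) for v in x]

x = [0.1, 0.2, 0.3]
random.seed(0)
a = predict(x)
random.seed(0)
b = predict(x)
print(a == b)  # True: with identical seeds the "random" outputs repeat exactly
```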