Transformers: Fine-tuning for evaluation

Created on 29 Jan 2019 · 9 comments · Source: huggingface/transformers

Hi!
1) Please help me figure out: what would be the optimal batch size for evaluating a nextSentencePrediction model, performance-wise? Is it the same as the one used during pre-training (128)?
2) If I'm building a high-performance evaluation backend on CUDA, would it be a good idea to use several threads with a BERT model in each, or is it better to use one thread with proper batching?

All 9 comments

  1. For evaluation I would advise the maximum batch size that your GPU allows. You will be able to use it more efficiently that way.

  2. I think you will be better off using a single thread.

Thanks! How can I figure out the optimal batch size? I want to try a Tesla K80.

You increase it gradually, and when the program crashes, it is too big ^^.
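
You can automate that probe with something like the following (a rough sketch, not from this thread; it assumes a hypothetical model that accepts a (batch, seq_len) tensor of token IDs on the GPU):

import torch

def find_max_batch_size(model, seq_len=128, start=8, device="cuda"):
    # Double the batch size until the forward pass runs out of memory,
    # then return the last size that fit.
    model.to(device).eval()
    batch_size = start
    while True:
        try:
            dummy = torch.randint(0, 30000, (batch_size, seq_len), device=device)
            with torch.no_grad():
                model(dummy)
            batch_size *= 2
        except RuntimeError:  # typically "CUDA out of memory"
            torch.cuda.empty_cache()
            return max(start, batch_size // 2)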

Thanks!

Guys, sorry to reopen this issue, but it might be helpful and on topic for evaluation.
I want to load a batch of data into the model for evaluation. The batch has 16 sentences of different lengths.
Code:

# indexed_tokens: 16 lists of token IDs, one per sentence
tokens_tensor = torch.tensor(indexed_tokens)
segments_tensors = torch.tensor(segments_ids)
predictions = model(tokens_tensor, segments_tensors)

indexed_tokens is an array of 16 arrays of token IDs.
I got this error:

ValueError: expected sequence of length 121 at dim 1 (got 23)

When I create a tensor from a single element,

tokens_tensor = torch.tensor([indexed_tokens[0]])

it works.

What am I doing wrong?
Thanks!

Could you create a minimal program that reproduces your problem (with the code you are using to generate indexed_tokens)?

  1. The input tensor must have the same length for all rows. My sentences had various lengths; that's why PyTorch raised the exception.
  2. If you add zeros to the end of the input arrays to make all rows equal, evaluation is slower than processing one sentence at a time. Batching did not improve speed.

Hi @Alexadar, you have to batch your examples and pad them indeed. No other way, I'm afraid.
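
Something like this is the usual padding recipe (a minimal sketch, assuming indexed_tokens is a list of 16 token-ID lists of varying length; the attention mask tells BERT to ignore the zero padding):

import torch

max_len = max(len(ids) for ids in indexed_tokens)
padded, mask = [], []
for ids in indexed_tokens:
    pad = max_len - len(ids)
    padded.append(ids + [0] * pad)           # 0 is BERT's [PAD] token ID
    mask.append([1] * len(ids) + [0] * pad)  # 1 = real token, 0 = padding

tokens_tensor = torch.tensor(padded)
attention_mask = torch.tensor(mask)
# segments_ids would need the same padding before being passed as well
with torch.no_grad():
    predictions = model(tokens_tensor, attention_mask=attention_mask)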

Sorry, I missed your request for an example.
Yes, padding is the only way to batch. In my GPU test it was slower than processing sentences one by one.
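
Worth noting (an aside, not something tested in this thread): much of that padding overhead comes from padding every sentence up to the longest one in the batch. Sorting by length first, so each batch pads to a similar length, usually recovers most of the batching speedup:

order = sorted(range(len(indexed_tokens)), key=lambda i: len(indexed_tokens[i]))
batches = [[indexed_tokens[i] for i in order[j:j + 16]] for j in range(0, len(order), 16)]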
