Hi!
1) Could you please help me figure out the optimal batch size for evaluating a NextSentencePrediction model, in terms of performance? Is it the same as the one used during pre-training (128)?
2) If I am building a high-performance evaluation backend on CUDA, would it be a good idea to use several threads with a BERT model in each, or is it better to use one thread with proper batching?
For evaluation I would advise the maximum batch size that your GPU allows. You will be able to use it more efficiently this way.
I think you will be better off by using a single thread.
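To illustrate, here is a minimal sketch of what I mean by a single evaluation thread with batching. It assumes BertForNextSentencePrediction from pytorch_pretrained_bert and a bert-base-uncased checkpoint; the evaluate_batches helper and the batch format are just placeholders, not part of any library:

import torch
from pytorch_pretrained_bert import BertForNextSentencePrediction

model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.to('cuda')
model.eval()  # disable dropout for deterministic evaluation

def evaluate_batches(batches):
    # batches: an iterable of (tokens_tensor, segments_tensor) pairs, already padded
    results = []
    with torch.no_grad():  # no gradients are needed for inference
        for tokens_tensor, segments_tensor in batches:
            scores = model(tokens_tensor.to('cuda'), segments_tensor.to('cuda'))
            results.append(scores.cpu())
    return results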
Thanks! How can I figure out the optimal batch size? I want to try a Tesla K80.
You increase it gradually and when the program crashes, it is too big ^^.
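A rough sketch of that trial-and-error search, assuming a make_batch(n) helper (hypothetical, not from any library) that builds an n-example batch at your maximum sequence length:

import torch

def find_max_batch_size(model, make_batch, start=16, limit=4096):
    # Double the batch size until CUDA runs out of memory; the last size
    # that survived is the one to use.
    best = None
    n = start
    while n <= limit:
        try:
            tokens_tensor, segments_tensors = make_batch(n)
            with torch.no_grad():
                model(tokens_tensor.cuda(), segments_tensors.cuda())
            best = n
            n *= 2
        except RuntimeError as error:  # CUDA OOM surfaces as a RuntimeError
            if 'out of memory' in str(error):
                torch.cuda.empty_cache()
                break
            raise
    return best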
Thanks!
Guys, sorry to reopen this issue, but it might be helpful and on topic for evaluation.
I want to load a batch of data into the model for evaluation. The batch has 16 sentences of different lengths.
Code:
tokens_tensor = torch.tensor(indexed_tokens)
segments_tensors = torch.tensor(segments_ids)
predictions = model(tokens_tensor, segments_tensors)
indexed_tokens is an array of size 16 containing arrays of input ids.
I get the error
ValueError: expected sequence of length 121 at dim 1 (got 23)
When I create a tensor from a single element
tokens_tensor = torch.tensor([indexed_tokens[0]])
it works.
What am I doing wrong?
Thanks!
Could you create a minimal program that reproduces your problem (with the code you are using to generate indexed_tokens)?
Hi @Alexadar, you have to batch your examples and pad them indeed. No other way I'm afraid.
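Something along these lines should work for your snippet above. It is only a sketch: it assumes segments_ids is a parallel list of per-sentence segment ids, and that 0 is the [PAD] token id (which it is for the standard BERT vocabularies):

import torch

# Pad every sentence in the batch to the length of the longest one.
max_len = max(len(tokens) for tokens in indexed_tokens)
padded_tokens, padded_segments, attention_mask = [], [], []
for tokens, segments in zip(indexed_tokens, segments_ids):
    pad = max_len - len(tokens)
    padded_tokens.append(tokens + [0] * pad)              # 0 = [PAD] id
    padded_segments.append(segments + [0] * pad)
    attention_mask.append([1] * len(tokens) + [0] * pad)  # 1 = real token, 0 = padding

tokens_tensor = torch.tensor(padded_tokens)               # shape (16, max_len)
segments_tensors = torch.tensor(padded_segments)
mask_tensor = torch.tensor(attention_mask)

predictions = model(tokens_tensor, segments_tensors, attention_mask=mask_tensor)

The attention mask keeps the model from attending to the padding positions, so the padded batch should give the same scores as evaluating the sentences one by one.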
Sorry, I missed your request for an example.
Yes, padding is the only way to batch. It is slower than processing the sentences one by one; I tested it on a GPU.