Allennlp: Is there any option for changing BucketIterator's batch size depending on train or valid data?

Created on 19 Dec 2019 · 5 Comments · Source: allenai/allennlp

Question

I'm currently working on entity linking. When evaluating, I'd like to use more negatives per example than during training, even at the expense of batch size.

Here is some pseudocode:

    from allennlp.data.iterators import BucketIterator
    from allennlp.training.trainer import Trainer

    # One iterator is shared by training and validation, so both use the same batch size.
    iterator = BucketIterator(batch_size=opts.batch_size_for_train,
                              sorting_keys=[('context', 'num_tokens')])
    iterator.index_with(vocab)
    model = EntityLinker(args=opts, word_embeddings=textfieldEmbedder, encoder=mention_encoder,
                         type_out_sz=type_dict_length, vocab=vocab)
    model = model.cuda()
    optimizer = optimizerloader(opts, model)
    trainer = Trainer(model=model, optimizer=optimizer, iterator=iterator,
                      train_dataset=trains, validation_dataset=valids, num_epochs=opts.num_epochs)
    trainer.train()

In this code, the training and validation datasets use the same batch size.
Because of limited GPU resources, I can't run the validation pass that trainer.train() performs after each training epoch: each validation data point has more negatives (say, 10 times as many) than a training data point.
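
The resource constraint above can be made concrete with a little arithmetic. This is a rough sketch, assuming the cost of a batch scales with the number of scored candidates (one gold entity plus the negatives for each data point). The function name and all numbers are illustrative and not taken from the actual setup.

```python
def matched_batch_size(train_batch_size: int,
                       train_negatives: int,
                       valid_negatives: int) -> int:
    """Largest validation batch size whose candidate count per batch
    does not exceed that of a training batch.

    Assumes each data point contributes 1 gold candidate plus its negatives.
    """
    train_candidates = train_batch_size * (1 + train_negatives)
    per_valid_point = 1 + valid_negatives
    # Never go below one data point per batch.
    return max(1, train_candidates // per_valid_point)


# Hypothetical numbers: 32 data points per training batch, 10 negatives each,
# and 10 times as many negatives at validation time.
valid_batch_size = matched_batch_size(32, train_negatives=10, valid_negatives=100)
print(valid_batch_size)
```

With these made-up numbers a training batch holds 32 × 11 = 352 candidates, so roughly 3 validation data points (3 × 101 = 303 candidates) fit in the same budget.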

So my question is: is there a way to use a different batch size for training and validation?
(For test evaluation, yes, we can create another iterator and set its batch size to 2 or some other small value.)
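
To illustrate the "another iterator with a smaller batch size" idea, here is a minimal plain-Python sketch. `bucket_batches` is a hypothetical, simplified stand-in for a bucketing iterator, not AllenNLP's `BucketIterator`: it only sorts instances by length and cuts the result into chunks. The token counts are made up.

```python
from typing import List, Sequence


def bucket_batches(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group instance indices into batches of similar length.

    Sort indices by length, then cut the sorted order into chunks
    of `batch_size` (the last chunk may be smaller).
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


# Hypothetical token counts for six contexts.
lengths = [12, 3, 7, 25, 4, 9]

train_batches = bucket_batches(lengths, batch_size=4)  # larger batches for training
eval_batches = bucket_batches(lengths, batch_size=2)   # smaller batches for evaluation
```

The two calls differ only in `batch_size`, which is the knob the question is about.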

If you know of a solution, I'd appreciate it very much.
Thanks.

izuna385

All 5 comments

I'm not sure I follow why you want to do this. In my experience, the only reason to set a larger batch size during evaluation is performance. Since we don't compute gradients during evaluation, you can get away with a bigger batch size. Is that what you're trying to do?

If you just want more negatives during evaluation, why don't you just make sure that the validation set contains more negatives? You can set a different dataset reader for training and validation, and that's where this difference would be configured.
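
The suggestion of separate readers can be sketched in plain Python. `MentionReader`, its parameters, and the toy entity list below are hypothetical and only illustrate the idea of configuring the number of negatives per reader. This is not AllenNLP's `DatasetReader` API.

```python
import random
from typing import Dict, Iterator, List


class MentionReader:
    """Hypothetical reader: yields each mention with its gold entity and
    `num_negatives` randomly sampled wrong entities."""

    def __init__(self, entities: List[str], num_negatives: int, seed: int = 0) -> None:
        self.entities = entities
        self.num_negatives = num_negatives
        self.rng = random.Random(seed)

    def read(self, mentions: Dict[str, str]) -> Iterator[dict]:
        for mention, gold in mentions.items():
            pool = [e for e in self.entities if e != gold]
            # Cap at the pool size so sampling never fails.
            k = min(self.num_negatives, len(pool))
            yield {"mention": mention, "gold": gold,
                   "negatives": self.rng.sample(pool, k)}


entities = ["Paris", "Paris_Hilton", "Paris_(Texas)", "London", "Berlin", "Rome"]
mentions = {"the French capital": "Paris"}

# Same reader class, different settings for training and validation.
train_reader = MentionReader(entities, num_negatives=1)
valid_reader = MentionReader(entities, num_negatives=4)

train_instance = next(train_reader.read(mentions))
valid_instance = next(valid_reader.read(mentions))
```

In AllenNLP itself, the 0.9-era configuration format appears to support a `validation_dataset_reader` entry for this, and the `Trainer` a `validation_iterator` argument for giving validation its own batch size. Both are worth checking against the installed version.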