AllenNLP: Is there any option for changing BucketIterator's batch size depending on training or validation data?

Created on 19 Dec 2019 · 5 comments · Source: allenai/allennlp

Question

I'm currently working on entity linking. When evaluating entity linking, I'd like to use more negatives than during training, even at the expense of batch size.

Here is pseudocode:

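    from allennlp.data.iterators import BucketIterator
    from allennlp.training.trainer import Trainer

    # EntityLinker, optimizerloader, opts, vocab, trains, valids, etc. are defined elsewhere.
    # One iterator is shared by training and validation, so both use the same batch size.
    iterator = BucketIterator(batch_size=opts.batch_size_for_train, sorting_keys=[('context', 'num_tokens')])
    iterator.index_with(vocab)
    model = EntityLinker(args=opts, word_embeddings=textfieldEmbedder, encoder=mention_encoder,
                         type_out_sz=type_dict_length, vocab=vocab)
    model = model.cuda()
    optimizer = optimizerloader(opts, model)
    # cuda_device=0 so the trainer moves batches to the same GPU as the model.
    trainer = Trainer(model=model, optimizer=optimizer, iterator=iterator,
                      train_dataset=trains, validation_dataset=valids,
                      num_epochs=opts.num_epochs, cuda_device=0)
    trainer.train()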

In this code, the training and validation datasets use the same batch size.
Because of limited GPU memory, I can't evaluate on the validation dataset after each training epoch (inside trainer.train() in the code above): each validation data point has many more negatives than a training data point, say 10 times as many.
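As a rough illustration of how much smaller the validation batch would need to be, here is a sketch that assumes (as a simplification) GPU memory per batch grows linearly with batch size times the number of candidates scored per example. The helper name validation_batch_size is hypothetical, not part of AllenNLP:

```python
def validation_batch_size(train_batch_size: int,
                          train_candidates: int,
                          valid_candidates: int) -> int:
    """Keep the total number of scored candidates per batch roughly constant.

    Assumes memory grows linearly with batch_size * candidates_per_example,
    ignoring sequence-length effects; treat the result as a starting point.
    """
    budget = train_batch_size * train_candidates  # candidates scored per training batch
    return max(1, budget // valid_candidates)      # never drop below one example


# 10x more candidates per validation example -> roughly 10x smaller batches.
print(validation_batch_size(32, 10, 100))  # 3
```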

So my question is: is there a way to use a different batch size for training and validation?
(For test evaluation, we can of course create another iterator and set its batch size to something small, say 2.)

If you know a solution, I'd really appreciate it.
Thanks.

All 5 comments

I'm not sure I follow why you want to do this. In my experience, the only reason to set a larger batch size during evaluation is performance. Since we don't compute gradients during evaluation, you can get away with a bigger batch size. Is that what you're trying to do?

If you just want more negatives during evaluation, why don't you just make sure that the validation set contains more negatives? You can set a different dataset reader for training and validation, and that's where this difference would be configured.
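A minimal sketch of that suggestion in plain Python (not AllenNLP's DatasetReader API): the hypothetical build_candidates function samples a configurable number of negatives, so a training reader and a validation reader could be constructed with different num_negatives values while sharing everything else.

```python
import random
from typing import List, Sequence


def build_candidates(gold: str,
                     candidate_pool: Sequence[str],
                     num_negatives: int,
                     rng: random.Random) -> List[str]:
    """Return the gold entity followed by up to `num_negatives` sampled negatives."""
    negatives = [c for c in candidate_pool if c != gold]
    k = min(num_negatives, len(negatives))  # cannot sample more than the pool holds
    return [gold] + rng.sample(negatives, k)


pool = [f"entity_{i}" for i in range(500)]  # hypothetical candidate pool
train_cands = build_candidates("entity_7", pool, num_negatives=10, rng=random.Random(0))
valid_cands = build_candidates("entity_7", pool, num_negatives=100, rng=random.Random(0))
print(len(train_cands), len(valid_cands))  # 11 101
```

Only the num_negatives setting differs between the two calls, which is the kind of switch a separate validation reader would carry.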

The trainer lets you specify different iterators for train and validation, I think.

> The trainer lets you specify different iterators for train and validation, I think.

It does, though I guess you can't configure that with configuration files, just in code?

Check out https://github.com/allenai/allennlp/blob/master/allennlp/training/trainer.py#L768, which will let you choose a different iterator for training and validation.
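In AllenNLP itself this would mean building a second BucketIterator with a smaller batch_size and passing it to the Trainer as its validation iterator (the constructor in the linked file appears to take a validation_iterator argument; check it against the version you use). The idea can be sketched without AllenNLP: the hypothetical bucket_batches helper below sorts instances by length, loosely analogous to BucketIterator's sorting_keys, and splits them into batches. It omits the shuffling the real iterator does. Training and validation simply call it with different batch sizes.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bucket_batches(instances: Sequence[T],
                   batch_size: int,
                   length_of: Callable[[T], int]) -> List[List[T]]:
    """Sort by length so similar-length items share a batch, then chunk."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(instances, key=length_of)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Toy "instances": token lists of varying length (hypothetical data).
train = [["tok"] * n for n in (5, 3, 8, 1, 4, 7, 2, 6)]
valid = [["tok"] * n for n in (9, 2, 5, 3)]

train_batches = bucket_batches(train, batch_size=4, length_of=len)  # training batch size
valid_batches = bucket_batches(valid, batch_size=2, length_of=len)  # smaller validation batch size
print([len(b) for b in train_batches], [len(b) for b in valid_batches])  # [4, 4] [2, 2]
```

Both calls share the same batching logic; only the batch size differs, which mirrors the thread's conclusion of one iterator for training and another for validation.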

I don't know why I forgot about this. Yes, preparing a separate iterator for the validation dataset solves this.
Thanks for your advice.

